Waveform Similarity Based Overlap-Add (WSOLA)

group PJMED_WSOLA

Time-scale modification of audio without affecting the pitch.

This section describes the Waveform Similarity Based Overlap-Add (WSOLA) implementation in PJMEDIA. The WSOLA API can be used to compress (speed up) or stretch (expand, slow down) audio playback without altering the pitch, or as a means of performing packet loss concealment (PLC).

The WSOLA implementation is used by the Adaptive Delay Buffer and by Packet Loss Concealment (PLC).
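WSOLA is built around a similarity search. Before the next segment is overlap-added, the algorithm searches a small tolerance window for the position whose waveform best matches the audio it will be joined to, so the splice does not introduce phase discontinuities. The sketch below covers only that search step and scores candidates with plain cross-correlation. `find_best_offset()` is a hypothetical helper, not part of the PJMEDIA API, and the PJMEDIA implementation may use a different similarity measure and search strategy.

```c
#include <stdint.h>

/* Hypothetical helper: return the offset in [0, search_range] at which
 * `cand` best matches `ref` over `len` samples, scored by plain
 * cross-correlation. `cand` must hold at least len + search_range samples. */
unsigned find_best_offset(const int16_t *ref, const int16_t *cand,
                          unsigned len, unsigned search_range)
{
    unsigned best = 0;
    int64_t best_corr = INT64_MIN;

    for (unsigned off = 0; off <= search_range; ++off) {
        int64_t corr = 0;
        for (unsigned i = 0; i < len; ++i)
            corr += (int64_t)ref[i] * cand[off + i];
        if (corr > best_corr) {   /* keep the first maximum on ties */
            best_corr = corr;
            best = off;
        }
    }
    return best;
}
```

In a complete time-scale modifier, the returned offset selects the candidate segment that is then cross-faded into the output.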

Typedefs

typedef struct pjmedia_wsola pjmedia_wsola: Opaque declaration for WSOLA structure.

Enums

enum pjmedia_wsola_option

WSOLA options. These can be combined with a bitwise OR.

Values:

enumerator PJMEDIA_WSOLA_NO_HANNING: Disable Hanning window to conserve memory.

enumerator PJMEDIA_WSOLA_NO_PLC: Specify that the WSOLA will not be used for PLC.

enumerator PJMEDIA_WSOLA_NO_DISCARD: Specify that the WSOLA will not be used to discard frames in non-contiguous buffer.

enumerator PJMEDIA_WSOLA_NO_FADING: Disable fade-in and fade-out feature in the transition between actual and synthetic frames in WSOLA. With fade feature enabled, WSOLA will only generate a limited number of synthetic frames (configurable with pjmedia_wsola_set_max_expand()), fading out the volume on every more samples it generates, and when it reaches the limit it will only generate silence.

Functions

pj_status_t pjmedia_wsola_create(pj_pool_t *pool, unsigned clock_rate, unsigned samples_per_frame, unsigned channel_count, unsigned options, pjmedia_wsola **p_wsola)

Create and initialize WSOLA.

Parameters:

pool – Pool to allocate memory for WSOLA.
clock_rate – Sampling rate of audio playback.
samples_per_frame – Number of samples per frame.
channel_count – Number of channels.
options – Option flags, bitmask combination of pjmedia_wsola_option.
p_wsola – Pointer to receive WSOLA structure.

Returns:

PJ_SUCCESS or the appropriate error code.
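The samples_per_frame argument follows from the clock rate and the frame duration. The sketch below assumes that samples_per_frame counts the interleaved samples of all channels, so check this against how your media port is configured. `frame_samples()` is a hypothetical helper.

```c
#include <stdint.h>

/* Hypothetical helper: samples in one frame of `ptime_ms` milliseconds,
 * assuming interleaved samples of every channel are counted. */
unsigned frame_samples(unsigned clock_rate, unsigned ptime_ms,
                       unsigned channel_count)
{
    /* widen before multiplying to avoid overflow at high rates */
    return (unsigned)((uint64_t)clock_rate * ptime_ms * channel_count / 1000u);
}
```

For example, 20 ms frames of mono audio at 8000 Hz give 160 samples per frame. The options argument is a bitwise OR of pjmedia_wsola_option values, such as PJMEDIA_WSOLA_NO_PLC | PJMEDIA_WSOLA_NO_DISCARD, or 0 to set none of them.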

pj_status_t pjmedia_wsola_set_max_expand(pjmedia_wsola *wsola, unsigned msec)

Specify the maximum duration, in milliseconds, of continuous synthetic frames that WSOLA can generate. This setting takes effect only if fading was not disabled with PJMEDIA_WSOLA_NO_FADING when the WSOLA session was created. The default value is PJMEDIA_WSOLA_MAX_EXPAND_MSEC (see the documentation of PJMEDIA_WSOLA_MAX_EXPAND_MSEC for more information).

Parameters:

wsola – WSOLA session.
msec – Maximum duration of continuous synthetic audio, in milliseconds.

Returns:

PJ_SUCCESS normally.
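The limit is given in milliseconds, so the number of synthetic samples or whole frames it allows depends on the clock rate. The sketch below shows the conversion, assuming the duration maps to samples at the session's clock rate. Both helpers are hypothetical and not part of PJMEDIA.

```c
/* Hypothetical helper: synthetic samples allowed by a limit of `msec`
 * milliseconds at `clock_rate` Hz. */
unsigned expand_budget_samples(unsigned clock_rate, unsigned msec)
{
    return (unsigned)((unsigned long long)clock_rate * msec / 1000u);
}

/* Hypothetical helper: whole synthetic frames that fit within that budget. */
unsigned expand_budget_frames(unsigned clock_rate, unsigned msec,
                              unsigned samples_per_frame)
{
    return expand_budget_samples(clock_rate, msec) / samples_per_frame;
}
```

With 160-sample frames at 8000 Hz, a limit of 80 ms allows four whole synthetic frames.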

pj_status_t pjmedia_wsola_destroy(pjmedia_wsola *wsola)

Destroy WSOLA.

Parameters:

wsola – WSOLA session.

Returns:

PJ_SUCCESS normally.

pj_status_t pjmedia_wsola_reset(pjmedia_wsola *wsola, unsigned options)

Reset the buffer contents of WSOLA.

Parameters:

wsola – WSOLA session.
options – Reset options, must be zero for now.

Returns:

PJ_SUCCESS normally.

pj_status_t pjmedia_wsola_save(pjmedia_wsola *wsola, pj_int16_t frm[], pj_bool_t prev_lost)

Give one good frame to WSOLA to be kept as a reference. The application must continuously give WSOLA good frames to keep its session up to date with current playback. Depending on the WSOLA implementation, this function may modify the content of the frame.

Parameters:

wsola – WSOLA session.
frm – The frame, whose length must match the samples-per-frame setting of the WSOLA session.
prev_lost – If the application generated a synthetic frame with pjmedia_wsola_generate() before calling this function, specify whether that was because of packet loss. If so, set this parameter to PJ_TRUE so that WSOLA interpolates this frame with its buffer. Otherwise, if this value is PJ_FALSE, WSOLA simply appends this frame to the end of its buffer.

Returns:

PJ_SUCCESS normally.

pj_status_t pjmedia_wsola_generate(pjmedia_wsola *wsola, pj_int16_t frm[])

Generate one synthetic frame from WSOLA.

Parameters:

wsola – WSOLA session.
frm – Buffer to receive the frame.

Returns:

PJ_SUCCESS normally.
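In PLC use, calls to pjmedia_wsola_save() and pjmedia_wsola_generate() alternate with packet arrival, and prev_lost records whether the previous slot was concealed. The sketch below models only that bookkeeping, with hypothetical action codes standing in for the real calls. It assumes every synthetic frame is generated because of loss. When pjmedia_wsola_generate() is used to stretch playback instead, the following save should pass PJ_FALSE.

```c
#include <stdbool.h>

/* Hypothetical action codes standing in for the WSOLA calls. */
typedef enum plc_action {
    ACT_SAVE,            /* pjmedia_wsola_save(wsola, frm, PJ_FALSE) */
    ACT_SAVE_AFTER_LOSS, /* pjmedia_wsola_save(wsola, frm, PJ_TRUE)  */
    ACT_GENERATE         /* pjmedia_wsola_generate(wsola, frm)       */
} plc_action;

/* Decide which call to make for one playback slot. `*prev_lost` remembers
 * whether the previous slot was concealed because of packet loss. */
plc_action plc_next_action(bool frame_received, bool *prev_lost)
{
    if (!frame_received) {
        *prev_lost = true;          /* conceal this slot */
        return ACT_GENERATE;
    }
    plc_action act = *prev_lost ? ACT_SAVE_AFTER_LOSS : ACT_SAVE;
    *prev_lost = false;             /* good frame resynchronizes the history */
    return act;
}
```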

pj_status_t pjmedia_wsola_discard(pjmedia_wsola *wsola, pj_int16_t buf1[], unsigned buf1_cnt, pj_int16_t buf2[], unsigned buf2_cnt, unsigned *erase_cnt)

Compress (compact) the specified buffer by removing some audio samples from it without altering the pitch. For this function to work, the total length of the buffer must be more than twice erase_cnt.

Parameters:

wsola – WSOLA session.
buf1 – Pointer to buffer.
buf1_cnt – Number of samples in the buffer.
buf2 – Pointer to second buffer, if the buffer is not contiguous. Otherwise this parameter must be NULL.
buf2_cnt – Number of samples in the second buffer, if the buffer is not contiguous. Otherwise this parameter should be zero.
erase_cnt – On input, specify the number of samples to be erased. This function may erase more or less than the requested number, and the actual number of samples erased will be given on this argument upon returning from the function.

Returns:

PJ_SUCCESS if some samples have been erased, PJ_ETOOSMALL if the buffer is too small to be reduced, or PJ_EINVAL if any of the parameters are invalid.
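When samples are held in a ring buffer, the valid region may wrap around the end of the storage. This is the non-contiguous case that buf1 and buf2 describe. The sketch below splits a hypothetical ring buffer into those two segments and checks the documented length precondition before pjmedia_wsola_discard() is called. `ring16`, `ring16_segments()` and `discard_possible()` are illustrative names, not PJMEDIA API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring buffer of 16-bit samples (head < capacity assumed). */
typedef struct ring16 {
    int16_t  *data;
    unsigned  capacity;
    unsigned  head;    /* index of the oldest valid sample */
    unsigned  count;   /* number of valid samples */
} ring16;

/* Split the valid region into at most two contiguous segments, shaped like
 * the buf1/buf2 arguments of pjmedia_wsola_discard(). */
void ring16_segments(const ring16 *rb, int16_t **buf1, unsigned *buf1_cnt,
                     int16_t **buf2, unsigned *buf2_cnt)
{
    unsigned first = rb->capacity - rb->head;  /* room before wrap-around */

    *buf1 = rb->data + rb->head;
    if (rb->count <= first) {
        *buf1_cnt = rb->count;
        *buf2 = NULL;                          /* contiguous: no second part */
        *buf2_cnt = 0;
    } else {
        *buf1_cnt = first;
        *buf2 = rb->data;                      /* wrapped part starts at 0 */
        *buf2_cnt = rb->count - first;
    }
}

/* Documented precondition: total length must exceed twice erase_cnt. */
int discard_possible(unsigned buf1_cnt, unsigned buf2_cnt, unsigned erase_cnt)
{
    return buf1_cnt + buf2_cnt > 2u * erase_cnt;
}
```

Whenever the data is contiguous, the split leaves buf2 NULL with a zero count, which is what the parameter descriptions require.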