Audio/Video Synchronization

Tip

PJSUA-LIB readers — symbol equivalents are listed at the bottom of this page. The page is mostly PJMEDIA-level content (pjmedia_av_sync), which is the same regardless of which higher-level API you use.

When a session carries audio and video together, the two streams travel and decode through independent pipelines (jitter buffer, decoder, renderer, file demuxer, …) and accumulate independent delays. Without intervention, the streams drift apart and the speaker’s mouth no longer matches their voice. PJMEDIA’s inter-media synchronizer (pjmedia_av_sync, in pjmedia/av_sync.h) aligns the presentation timestamps of all media in the same session so they stay in sync.

The synchronizer is timeline-agnostic: it just needs each participating media to report (a) periodic reference points that anchor the media’s local timestamp to a common wall-clock-style time, and (b) per-frame presentation timestamps. The most common source of reference points is RTCP Sender Report (NTP + RTP timestamp pairs), but the same mechanism is used for non-RTP timelines too — see the AVI player below.

Default behaviour in PJSUA-LIB

For PJSUA-LIB applications, a per-call synchronizer is set up automatically and no API calls are required for the standard case:

  • When a call has two or more media streams (typically one audio + one video), PJSUA-LIB creates a pjmedia_av_sync for the call and registers each stream with it. Subsequent re-INVITE/UPDATEs that add streams reuse the same synchronizer; the synchronizer is destroyed when the call ends.

  • The synchronizer is created in streaming mode (pjmedia_av_sync_setting::is_streaming set to PJ_TRUE), which smooths the delay-adjustment values so already-running media don’t see surprise increases in delay.

  • Each stream calls pjmedia_av_sync_update_ref() whenever it receives an RTCP SR packet (the SR’s NTP and RTP timestamps become the reference point), and calls pjmedia_av_sync_update_pts() for every frame it returns to its sink, acting on the adjust_delay output to speed up or slow down.

Net effect: in a typical audio + video call, lipsync just works — no application code is required.

Opting out per call

To disable inter-media synchronization on a specific call, set the PJSUA_CALL_NO_MEDIA_SYNC flag (value 256 in pjsua_call_flag) in CallSetting::flag. PJSUA-LIB will skip synchronizer creation, or destroy an existing one if the flag is set on a re-INVITE/UPDATE.

CallOpParam prm(true);
prm.opt.flag |= PJSUA_CALL_NO_MEDIA_SYNC;

call.makeCall("sip:peer@example.com", prm);

You normally don’t want this — disabling sync trades lipsync for streams running independently at their own rates. Reasons one might flip it on: instrumented testing where you want raw decoder output, or a non-AV use case (e.g. a call carrying two video streams with no audio reference) where the synchronizer’s reasoning doesn’t apply.

AVI playback

The AVI player (pjmedia_avi_player_create_streams()) creates its own synchronizer per file so the audio and video tracks of the file stay aligned during playback. There is no RTP and no RTCP SR involved here; the player anchors each track’s timeline at zero with pjmedia_av_sync_update_ref() (NTP=0, RTP-ts=0) at file open and again on every rewind/EOF, then drives pjmedia_av_sync_update_pts() from each frame’s PTS as the file is read out.

A few differences from the PJSUA-LIB call case:

  • The synchronizer is created with default settings — is_streaming is left PJ_FALSE, since the file itself is the authoritative source of timing and there is no live network jitter to smooth against.

  • Synchronization can be disabled at file-open time by passing the PJMEDIA_AVI_FILE_NO_SYNC option to pjmedia_avi_player_create_streams().

This is invisible to PJSUA-LIB applications using the AVI device — they just see properly lipsynced AVI playback into a call.

How the synchronization works

Conceptually, the synchronizer maintains a per-media estimate of the lag between that stream’s presented frames and the earliest media’s presented frames. The lag is computed from two inputs:

  • Reference points supplied through update_ref(ntp, ts). Each call records that timestamp ts on the media’s local clock corresponds to wall-clock time ntp. RTP streams take the pair from incoming RTCP SR; the AVI player anchors at (0, 0); a custom pipeline can supply whatever pair makes its timeline meaningful.

  • Per-frame presentation timestamps supplied through update_pts(pts). The synchronizer converts pts to wall-clock using the latest reference point and compares against the other media’s most recent presented wall-clock.

When media drift, the synchronizer prefers to speed up the lagging media rather than slow down the leading one (no extra buffering is added). It tries this for up to PJMEDIA_AVSYNC_MAX_SPEEDUP_REQ_CNT requests (default 10). If after that the lag still exceeds PJMEDIA_AVSYNC_MAX_TOLERABLE_LAG_MSEC (default 45 ms), it switches to slowing down the leading media to let the laggard catch up.

The adjust_delay value returned by update_pts is in milliseconds: 0 means in-sync, positive means the stream should add delay, negative means it should drop delay (or skip a frame).

Both constants are compile-time tunables defined in pjmedia/config.h; raise the tolerable-lag threshold if your deployment legitimately runs with larger jitter (e.g. a flaky network plus generous jitter buffers) and the synchronizer is fighting it.

Direct PJMEDIA API (custom pipelines)

Applications that build their own media pipeline (using PJMEDIA directly, without going through PJSUA-LIB or the AVI player) drive the synchronizer themselves. The lifecycle:

  1. Create with pjmedia_av_sync_create(). Call pjmedia_av_sync_setting_default() first to pre-fill defaults, then set is_streaming = PJ_TRUE for live streams (skip it for file/clip playback).

  2. Register each media with pjmedia_av_sync_add_media(), passing a pjmedia_av_sync_media_setting that names the media, its type (PJMEDIA_TYPE_AUDIO or PJMEDIA_TYPE_VIDEO), and its clock rate. Keep the returned pjmedia_av_sync_media handle.

  3. At each new reference point (RTCP SR for RTP, (0,0) for files at open / rewind, etc.), call pjmedia_av_sync_update_ref().

  4. On each frame about to be presented, call pjmedia_av_sync_update_pts() and act on the adjust_delay output (positive: add delay, negative: speed up or skip).

  5. On media removal / session teardown, call pjmedia_av_sync_del_media() then pjmedia_av_sync_destroy().

pjmedia_av_sync_reset() clears the running per-media state without removing the registered media — useful on a re-INVITE/UPDATE that significantly changes the topology, or on a file rewind.

PJSUA-LIB equivalents

Most of this page is PJMEDIA-level (pjmedia_av_sync_*) and applies to both APIs unchanged. The only PJSUA2-specific symbols above are:

PJSUA2

PJSUA-LIB

CallOpParam::opt (CallSetting) — .flag

pjsua_call_setting::flag initialised via pjsua_call_setting_default()

pj::Call::makeCall()

pjsua_call_make_call()