Audio Frame Manipulation
Tip
If you use only PJSUA-LIB or PJMEDIA, see the alternatives listed at the bottom of this page.
Custom audio processing — applying a filter, feeding ML models, muxing into another stream, recording to an unusual format, or just inspecting the PCM passing through — needs application-level access to raw audio frames. PJSIP exposes this access at three layers, with different trade-offs in placement, capability, and ease of use.
If the underlying media-flow model (ports, get_frame() /
put_frame(), conference bridge) is unfamiliar, read
Understanding Audio Media Flow first.
PJSUA2 — AudioMediaPort
The cleanest path for PJSUA2 (and SWIG-bound languages — Java, C#,
Python, Kotlin) is to subclass pj::AudioMediaPort and
override its two callbacks:
pj::AudioMediaPort::onFrameRequested(): invoked when the bridge needs an outbound frame from your port (you fill in the buffer to push data downstream).
pj::AudioMediaPort::onFrameReceived(): invoked when the bridge delivers an inbound frame to your port (you read the buffer to consume data from upstream).
Both callbacks receive a pj::MediaFrame carrying the
frame type (typically PJMEDIA_FRAME_TYPE_AUDIO), a buf
ByteVector, and a size. The port participates in the
conference bridge like any other AudioMedia: register it once,
then startTransmit / stopTransmit to wire it to your call’s
audio media, the sound device, or any other source / sink.
Available since PJSIP 2.14 (#3569).
Defining the port
class MyAudioPort : public AudioMediaPort
{
    virtual void onFrameRequested(MediaFrame &frame) override
    {
        // Fill frame.buf with up to frame.size bytes of audio.
        frame.type = PJMEDIA_FRAME_TYPE_AUDIO;
        // frame.buf.assign(frame.size, '\0'); // example: silence
    }

    virtual void onFrameReceived(MediaFrame &frame) override
    {
        // frame.buf and frame.size carry the inbound audio.
        // Inspect, copy into a queue for an ML model, write to a
        // file, etc. Keep the handler short: this runs on the
        // conference bridge clock thread.
    }
};
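To make the "fill frame.buf" step concrete, here is a minimal sketch of producing one frame of 16-bit PCM in a byte vector. It does not link against PJSIP: `ByteVector` below is a local stand-in that assumes pj::ByteVector is a `std::vector<unsigned char>`, and `fillSineFrame`, the tone frequency, and the amplitude of 8000 are hypothetical names and values chosen for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Local stand-in for pj::ByteVector (assumption: vector of unsigned char).
using ByteVector = std::vector<unsigned char>;

// Hypothetical helper: writes sizeBytes of native-endian 16-bit mono PCM
// containing a sine tone into buf. The phase is carried across calls so
// that consecutive frames join without a discontinuity.
void fillSineFrame(ByteVector &buf, unsigned sizeBytes, unsigned clockRate,
                   double toneHz, double &phase)
{
    assert(sizeBytes % 2 == 0 && clockRate > 0);
    const double kTwoPi = 6.283185307179586;
    const double step = kTwoPi * toneHz / clockRate;

    buf.resize(sizeBytes);
    for (unsigned i = 0; i < sizeBytes; i += 2) {
        // Illustrative amplitude of 8000, well below the int16 limit.
        const int16_t sample =
            static_cast<int16_t>(std::lround(8000.0 * std::sin(phase)));
        std::memcpy(&buf[i], &sample, sizeof(sample));
        phase += step;
        if (phase >= kTwoPi)
            phase -= kTwoPi;
    }
}
```

Inside onFrameRequested() the equivalent call would presumably pass frame.buf and frame.size, with the phase kept as a member of the port class.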
Creating and wiring it
MyAudioPort *port = new MyAudioPort();

MediaFormatAudio fmt;
fmt.init(PJMEDIA_FORMAT_PCM,
         16000,   // clock rate
         1,       // channel count
         20000,   // frame time in microseconds (20 ms)
         16);     // bits per sample
port->createPort("my_audio_port", fmt);

// callAudio is the call's AudioMedia, e.g. from Call::getAudioMedia(-1).
// Wire to it in either or both directions:
port->startTransmit(callAudio);   // we feed the call
callAudio.startTransmit(*port);   // we receive the call's audio
The bridge handles clock-rate / channel-count / frame-size conversion between connected ports, so the port’s format only has to be self-consistent — it doesn’t have to match the call’s codec.
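As a quick check of what "self-consistent" means for the format above, the arithmetic below derives the per-frame sample and byte counts from the four format parameters. This is a sketch assuming uncompressed linear PCM; `FrameSize` and `pcmFrameSize` are hypothetical names, not PJSIP API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: total samples (all channels) and bytes per frame.
struct FrameSize {
    unsigned samples;
    unsigned bytes;
};

// Hypothetical helper: frame size of linear PCM for a given format.
FrameSize pcmFrameSize(unsigned clockRate, unsigned channels,
                       unsigned frameTimeUsec, unsigned bitsPerSample)
{
    assert(bitsPerSample % 8 == 0);
    // 64-bit intermediate so clockRate * frameTimeUsec cannot overflow.
    const uint64_t perChannel =
        static_cast<uint64_t>(clockRate) * frameTimeUsec / 1000000u;

    FrameSize fs;
    fs.samples = static_cast<unsigned>(perChannel * channels);
    fs.bytes = fs.samples * (bitsPerSample / 8);
    return fs;
}
```

For the 16 kHz, mono, 20 ms, 16-bit format used above this gives 320 samples and 640 bytes per frame, which is the buffer size each callback should expect to fill or read.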
Threading
The two callbacks run on the conference bridge’s get-frame thread (typically the sound device thread or a clock thread; see Understanding Audio Media Flow). Keep them short and non-blocking — long handlers stall the entire bridge tick. Pass heavier work (file I/O, network calls, ML inference) to your own thread via a queue.
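One way to hand frames to your own thread is a small bounded queue, sketched below under stated assumptions. `FrameQueue` and `Frame` are hypothetical names, not part of PJSIP. The push side never waits for space: when the queue is full the frame is dropped, which trades occasional loss for a bridge tick that is never stalled by a slow consumer. The push still takes a short mutex lock and copies the frame, so a lock-free ring buffer with preallocated storage may be a better fit where that cost matters.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

using Frame = std::vector<unsigned char>;

// Hypothetical bounded queue between an audio callback and a worker thread.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }

    // Audio-callback side. Never waits for space: returns false (frame
    // dropped) when the queue is full or already closed.
    bool tryPush(const Frame &frame)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (closed_ || queue_.size() >= capacity_)
                return false;
            queue_.push_back(frame);
        }
        cond_.notify_one();
        return true;
    }

    // Worker side. Blocks until a frame is available; returns false once
    // the queue has been closed and fully drained.
    bool pop(Frame &out)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty())
            return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    // Wakes the worker so it can exit after draining remaining frames.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cond_.notify_all();
    }

private:
    std::size_t capacity_;
    std::deque<Frame> queue_;
    std::mutex mutex_;
    std::condition_variable cond_;
    bool closed_ = false;
};
```

In this arrangement onFrameReceived() would only copy the frame bytes into tryPush(), while a separate thread loops on pop() and does the file I/O, network calls, or inference.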
Sample
A working PJSUA2 example lives at
pjsip-apps/src/samples/pjsua2_demo.cpp (search for
MyAudioMediaPort). The Python equivalent is in
pjsip-apps/src/swig/python/test.py.
C / PJSUA-LIB / PJMEDIA alternatives
PJSUA2’s AudioMediaPort was added in 2.14 and is the
recommended path. C-only applications, or applications that pre-date
that release, have three alternatives, listed in order of increasing
setup effort and capability.
PJSUA-LIB sound-device hooks
Two callbacks on pjsua_media_config give read access to
audio frames at the sound-device boundary:
pjsua_media_config::on_aud_prev_rec_frame: every microphone frame, before any media processing (echo canceller, AGC, noise suppression).
pjsua_media_config::on_aud_prev_play_frame: every playback frame, right before it’s queued to the speaker.
Set them in the pjsua_media_config you pass to
pjsua_init():
static void on_rec(pjmedia_frame *frame)
{
    /* frame->buf, frame->size: read-mostly. */
}

pjsua_media_config med_cfg;
pjsua_media_config_default(&med_cfg);
med_cfg.on_aud_prev_rec_frame = &on_rec;
pjsua_init(&ua_cfg, &log_cfg, &med_cfg);
Caveats:
The callbacks fire on the sound-device thread. No blocking, no PJSUA API calls that could lock, no audio-device switching, no pjsua_set_ec().
They expose a single shared point in the audio pipeline: every call’s audio mixes into the same playback stream you see here.
Modifying the audio is not safe when software echo cancellation is active. The EC trains on the unmodified data, so changes would degrade or break it.
Use these when you need cheap, application-wide observation (logging energy levels, dumping raw audio for debug, simple metrics).
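As an illustration of the "logging energy levels" use case, the sketch below computes the RMS level of a buffer of 16-bit PCM. It assumes the frame buffer holds native-endian signed 16-bit samples; `pcm16Rms` is a hypothetical helper, not a PJSIP function. It only reads the buffer, which keeps it compatible with the echo-cancellation caveat above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: RMS level of native-endian signed 16-bit PCM,
// normalized so that a full-scale signal is close to 1.0.
double pcm16Rms(const void *buf, std::size_t sizeBytes)
{
    assert(sizeBytes % sizeof(int16_t) == 0);
    const std::size_t count = sizeBytes / sizeof(int16_t);
    if (buf == nullptr || count == 0)
        return 0.0;

    const unsigned char *bytes = static_cast<const unsigned char *>(buf);
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        int16_t sample;
        // memcpy avoids alignment assumptions about the frame buffer.
        std::memcpy(&sample, bytes + i * sizeof(int16_t), sizeof(sample));
        const double v = sample / 32768.0;
        sumSquares += v * v;
    }
    return std::sqrt(sumSquares / static_cast<double>(count));
}
```

From a hook such as on_rec, the call would presumably be pcm16Rms(frame->buf, frame->size), with the result stored in an atomic or pushed to a queue rather than logged directly on the sound-device thread.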
Custom pjmedia_port
For full bidirectional access at the conference-bridge level
(equivalent to PJSUA2’s AudioMediaPort), implement a
pjmedia_port with your own put_frame / get_frame
function pointers, then register it with
pjsua_conf_add_port():
static pj_status_t my_put_frame(pjmedia_port *this_port, pjmedia_frame *frame)
{
    /* Inbound: bridge wrote a frame into us. */
    return PJ_SUCCESS;
}

static pj_status_t my_get_frame(pjmedia_port *this_port, pjmedia_frame *frame)
{
    /* Outbound: bridge wants a frame from us. Fill frame->buf. */
    frame->type = PJMEDIA_FRAME_TYPE_AUDIO;
    return PJ_SUCCESS;
}

/* pool is an existing pj_pool_t that outlives the port. */
pj_str_t name = pj_str("my_port");
pjmedia_port *port = pj_pool_zalloc(pool, sizeof(pjmedia_port));
pjmedia_port_info_init(&port->info, &name,
                       PJMEDIA_SIG_CLASS_PORT_AUD('m','p'),
                       16000, 1, 16, 320);  /* 320 samples = 20 ms at 16 kHz */
port->put_frame = &my_put_frame;
port->get_frame = &my_get_frame;

pjsua_conf_port_id slot;
pjsua_conf_add_port(pool, port, &slot);
/* slot is now usable like any other bridge port. */
Same threading rules apply — the function pointers run on the bridge clock thread.
Direct PJMEDIA audio device
For applications that do not run a SIP stack at all — pure media
processing — use pjmedia_aud_stream_create() with
pjmedia_aud_rec_cb and pjmedia_aud_play_cb
callbacks. This bypasses both PJSUA-LIB and the conference bridge
and gives raw access to capture / playback frames at the audio
device level.
This is the lowest-level option and the most flexible (no conference-bridge involvement at all), but you take on all buffering, format conversion, and routing yourself.
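To give a sense of the format conversion you would own at this level, here is a minimal sketch of one such step: downmixing interleaved 16-bit stereo to mono by averaging the two channels. `downmixStereoToMono` is a hypothetical helper, not a PJMEDIA function, and a real pipeline would likely also need resampling and buffering between the capture and playback callbacks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: averages each left/right pair of interleaved
// 16-bit stereo samples into a single mono sample.
std::vector<int16_t> downmixStereoToMono(const std::vector<int16_t> &interleaved)
{
    assert(interleaved.size() % 2 == 0);
    std::vector<int16_t> mono(interleaved.size() / 2);
    for (std::size_t i = 0; i < mono.size(); ++i) {
        // Sum in 32 bits so two near-full-scale samples cannot overflow.
        const int32_t sum = static_cast<int32_t>(interleaved[2 * i]) +
                            static_cast<int32_t>(interleaved[2 * i + 1]);
        mono[i] = static_cast<int16_t>(sum / 2);
    }
    return mono;
}
```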
Cross-cutting tools
For interception at a different layer of the stack, see also:
Transport Adapter — wraps the RTP transport so application code can intercept or rewrite packets after encoding (network-side). A different problem space: these are encoded RTP payloads, not raw PCM frames.
PJSUA-LIB / PJMEDIA equivalents
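| PJSUA2 | PJSUA-LIB / PJMEDIA |
|---|---|
| pj::AudioMediaPort | custom pjmedia_port |
| onFrameRequested() | get_frame |
| onFrameReceived() | put_frame |
| (no PJSUA2 equivalent) | pjsua_media_config::on_aud_prev_rec_frame / on_aud_prev_play_frame |
| (no PJSUA2 equivalent) | pjmedia_aud_stream_create() |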