DTMF

Tip

This page is PJSUA2-first: prose, examples, and symbol names use the PJSUA2 (C++) API, with PJSUA-LIB equivalents in the footer table.

Overview

PJSIP supports three DTMF transports for outgoing digits and two for incoming digits:

Method

Send

Receive

RTP events (RFC 4733, originally RFC 2833)

Yes

Yes

SIP INFO

Yes

Yes

Inband audio tones

Yes

No (see Inband detection for a wiring sketch)

RFC 4733 (telephone-event) is the recommended method. It carries events in-band with RTP, doesn’t burden the signalling path, and is the standards-based choice. PJSIP only sends RFC 4733 events when the peer’s SDP advertises the capability with a line like:

a=rtpmap:101 telephone-event/8000

Without that line on the negotiated SDP, attempting to send via RFC 4733 returns PJMEDIA_RTP_EREMNORFC2833 and the call falls back to whatever fallback path the application implements (commonly: SIP INFO).

SIP INFO is non-standard (see RFC 6086#section-2), expensive — every keystroke becomes a SIP transaction — and the body format isn’t universally agreed (some PBXes use application/dtmf, some use application/dtmf-relay; PJSIP sends and parses dtmf-relay). Use only when RFC 4733 is unavailable. See #2036 for background.

Inband audio tones are useful for codecs that don’t preserve DTMF through transcoding (e.g. some legacy gateways), but PJSIP does not implement inband DTMF detection — only generation. See Sending inband DTMF tones and Implementing an inband DTMF detector for details.

Sending DTMF

The modern API is pj::Call::sendDtmf(), which takes a pj::CallSendDtmfParam carrying the method, duration, and digit string. It supersedes the legacy Call::dialDtmf() (the C pjsua_call_dial_dtmf* family), which was RFC 4733 only.

try {
    CallSendDtmfParam prm;
    prm.method   = PJSUA_DTMF_METHOD_RFC2833;
    prm.duration = 200;       // milliseconds; 0 to use default
    prm.digits   = "12345";
    call.sendDtmf(prm);
} catch (Error& err) {
    // PJMEDIA_RTP_EREMNORFC2833 if peer didn't advertise the
    // telephone-event rtpmap — fall back to SIP INFO here.
}

The default duration is PJSUA_CALL_SEND_DTMF_DURATION_DEFAULT, in turn derived from PJMEDIA_DTMF_DURATION_MSEC (200 ms by default; see #3540). Passing 0 selects the default.

When method is PJSUA_DTMF_METHOD_RFC2833, sendDtmf enqueues the digits in the audio stream’s DTMF send queue and returns immediately; the digits go out one by one at the rate dictated by duration plus a configured pause. When method is PJSUA_DTMF_METHOD_SIP_INFO, each digit produces one SIP INFO request immediately.

Method-selection trade-offs:

  • RFC 4733 — recommended. Cheap (in-band with RTP), standards-based, provides start / update / end semantics for the receiver. Requires peer-advertised capability in SDP.

  • SIP INFO — fall-back when RFC 4733 is unavailable. Non-standard, expensive (one SIP transaction per digit), interop-fragile. Per the PJSUA_DTMF_METHOD_SIP_INFO doxygen: if the remote doesn’t support SIP INFO and never replies, the sender treats it as a transaction timeout and disconnects the call.

  • Inband — last-resort for paths that strip RFC 4733 / INFO. Plays audio tones over the existing media channel. See Sending inband DTMF tones.

Send-queue management

For RFC 4733, sendDtmf doesn’t block: digits queue and play out at the configured cadence. To inspect backpressure (e.g. before queuing more digits from a dial-pad), PJSUA-LIB exposes pjsua_call_get_queued_dtmf_digits() (#4645):

unsigned queued = 0;
pjsua_call_get_queued_dtmf_digits(call_id, &queued);

Returns the number of digits still pending transmission. Only available in PJSUA-LIB; can be invoked directly from PJSUA2 C++ apps.

Tuning RFC 4733 transmission

Note

The API in this section lives in PJMEDIA (no PJSUA2 wrapper). The pjmedia_stream * pointer is delivered to the application via pj::Call::onStreamCreated() (PJSUA2) / pjsua_callback::on_stream_created (PJSUA-LIB); stash it when the callback fires and invoke the PJMEDIA functions against it.

For finer control over how RFC 4733 events are emitted on the wire, pjmedia_stream_set_tx_dtmf_options() (#4663) configures the per-stream send parameters:

pjmedia_stream_set_tx_dtmf_options(
    stream,                     /* The audio stream */
    200,                        /* duration_ms (0..1000)              */
    50,                         /* pause_ms between events (0..1000)  */
    101,                        /* payload type                       */
    0,                          /* volume in dBm0 (-63..0; 0 = max)   */
    2                           /* extra end-bit repetitions (0..7)   */
);

Knobs:

  • ``duration_ms`` — single-event duration. Overrides the CallSendDtmfParam::duration and PJMEDIA_DTMF_DURATION_MSEC defaults at the stream level.

  • ``pause_ms`` — gap inserted between consecutive events when the send queue has more than one digit.

  • ``pt`` — RTP payload type. Defaults to whatever the SDP negotiation produced (typically 101). Override only when the peer insists on a non-default PT.

  • ``vol`` — RTP DTMF volume in dBm0; 0 is loudest, -63 quietest.

  • ``ebit_rep_cnt`` — extra repetitions of the end-bit packet (the packet that signals “key released”). The total number of end-bit packets transmitted is 1 + ebit_rep_cnt. RFC 4733 §2.5.1.3 recommends three total; PJSIP defaults to ebit_rep_cnt = 2 (i.e. three end-bit packets) — see DTMF_EBIT_RETRANSMIT_CNT in pjmedia/src/pjmedia/stream.c and #1582. Bump on lossy UDP paths.

DTMF flash (event 16)

RFC 4733 reserves event 16 for the on-hook flash signal. PJSIP encodes flash with the literal character 'R' in the digits string (per the pjsua_call_dial_dtmf doxygen and RFC 4730). Compile-time toggle PJMEDIA_HAS_DTMF_FLASH (default on; #1734) gates the feature.

CallSendDtmfParam prm;
prm.method   = PJSUA_DTMF_METHOD_RFC2833;
prm.duration = 200;
prm.digits   = "R";          // single flash
call.sendDtmf(prm);

Receivers see the flash as digit ASCII 'R' (event code 16).

Receiving DTMF

PJSUA2 exposes two virtual callbacks on pj::Call:

  • pj::Call::onDtmfDigit() — fires for each digit; the pj::OnDtmfDigitParam carries the method (RFC 4733 vs SIP INFO), digit, and total duration in milliseconds (or PJSUA_UNKNOWN_DTMF_DURATION).

  • pj::Call::onDtmfEvent() — fires for every RTP event packet (including update packets while a key is held), giving the application start / update / end semantics. Carries an extra flags field with PJMEDIA_STREAM_DTMF_IS_UPDATE / PJMEDIA_STREAM_DTMF_IS_END bits.

The most-specific implemented callback wins. onDtmfEvent — when implemented — fully suppresses onDtmfDigit, regardless of method (verified at pjsua_aud.c:746-757). Pick the callback that matches what your dial-plan needs:

  • Single shot per digit — implement onDtmfDigit. Easiest for IVR-style dial-pads.

  • Progressive updates (long-press, hold-to-flash, KPML-like UX) — implement onDtmfEvent and watch the flag bits.

class MyCall : public Call {
public:
    using Call::Call;

    void onDtmfDigit(OnDtmfDigitParam &prm) override
    {
        cout << "DTMF digit '" << (char)prm.digit << "'"
             << " via " << (prm.method == PJSUA_DTMF_METHOD_RFC2833
                            ? "RFC 4733" : "SIP INFO")
             << ", duration=" << prm.duration << " ms" << endl;
    }
};

For the event-based callback, distinguish phases by inspecting prm.flags:

void onDtmfEvent(OnDtmfEventParam &prm) override
{
    if (prm.flags & PJMEDIA_STREAM_DTMF_IS_END) {
        // Final indication; prm.duration is the total length (or
        // PJSUA_UNKNOWN_DTMF_DURATION). End indications can be
        // lost — applications should not require one for every
        // event.
    } else if (prm.flags & PJMEDIA_STREAM_DTMF_IS_UPDATE) {
        // Same digit, updated duration so far. The key is still
        // pressed.
    } else {
        // First indication for this digit.
    }
}

For SIP INFO digits, onDtmfEvent fires once per digit with PJMEDIA_STREAM_DTMF_IS_END set (verified at pjsua_call.c:6749-6770) — SIP INFO has no incremental update mechanism.

SDP / capability negotiation

PJSIP advertises telephone-event in outgoing offers when PJMEDIA_RTP_PT_TELEPHONE_EVENTS is non-zero (default 120). A typical offer looks like:

m=audio 4000 RTP/AVP 0 8 120
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:120 telephone-event/8000
a=fmtp:120 0-15

The exact payload type negotiated may differ — many peers (Asterisk, Cisco, FreeSWITCH defaults) use 101, and PJSIP picks whatever PT survives negotiation. When the remote answers with a matching rtpmap, RFC 4733 send is enabled. When the remote omits it, sendDtmf with method PJSUA_DTMF_METHOD_RFC2833 returns PJMEDIA_RTP_EREMNORFC2833.

To detect the negotiated PT (e.g. when a peer uses something other than the default), inspect pjmedia_stream_info::tx_event_pt after the call goes active. Override the per-stream PT via pjmedia_stream_set_tx_dtmf_options() if you need to match a specific peer.

Two compile-time settings shape what PJSIP advertises:

  • ``PJMEDIA_RTP_PT_TELEPHONE_EVENTS`` (default 120) — the starting payload type for telephone-event entries in outgoing SDP. Setting this to 0 disables telephone-event advertisement entirely (i.e. PJSIP will not offer or accept RFC 4733 events on any call).

  • ``PJMEDIA_TELEPHONE_EVENT_ALL_CLOCKRATES`` (default 1; #2088) — when enabled, PJSIP offers one telephone-event/<rate> rtpmap per audio-codec clock rate advertised in the offer (e.g. one for 8 kHz PCMU/PCMA, another for 16 kHz AMR-WB or Speex/16000, another for 48 kHz Opus). When disabled, only a single 8 kHz entry is offered (legacy behaviour before #2088). The all-clockrates default is a safety belt for strict peers that insist on a matching telephone-event clockrate; in practice most endpoints accept RFC 4733 events even when only the 8 kHz entry is offered. Trade-offs of leaving it on: larger SDP, and more dynamic-PT slots (96-127) consumed by telephone-event entries — usually a non-issue, but worth knowing for footprint-constrained deployments offering many codecs.

Sending inband DTMF tones

Below are steps to send inband DTMF tones:

  1. Once the call is established, create an instance of Multi-frequency/DTMF Tone Generator.

  2. Register this tone generator to pjsua’s conference bridge with pjsua_conf_add_port().

  3. Connect the tone generator to the call, with pjsua_conf_connect().

  4. Now instruct the tone generator to play some DTMF digits with pjmedia_tonegen_play_digits(). The digits then will be streamed to the call, and remote endpoint should receive the DTMF tone inband.

Below is the snippet to do it:

struct my_call_data
{
     pj_pool_t          *pool;
     pjmedia_port       *tonegen;
     pjsua_conf_port_id  toneslot;
};

struct my_call_data *call_init_tonegen(pjsua_call_id call_id)
{
     pj_pool_t *pool;
     struct my_call_data *cd;
     pjsua_call_info ci;

     pool = pjsua_pool_create("mycall", 512, 512);
     cd = PJ_POOL_ZALLOC_T(pool, struct my_call_data);
     cd->pool = pool;

     pjmedia_tonegen_create(cd->pool, 8000, 1, 160, 16, 0, &cd->tonegen);
     pjsua_conf_add_port(cd->pool, cd->tonegen, &cd->toneslot);

     pjsua_call_get_info(call_id, &ci);
     pjsua_conf_connect(cd->toneslot, ci.conf_slot);

     pjsua_call_set_user_data(call_id, (void*) cd);

     return cd;
}

void call_play_digit(pjsua_call_id call_id, const char *digits)
{
     pjmedia_tone_digit d[16];
     unsigned i, count = strlen(digits);
     struct my_call_data *cd;

     cd = (struct my_call_data*) pjsua_call_get_user_data(call_id);
     if (!cd)
         cd = call_init_tonegen(call_id);

     if (count > PJ_ARRAY_SIZE(d))
         count = PJ_ARRAY_SIZE(d);

     pj_bzero(d, sizeof(d));
     for (i=0; i<count; ++i) {
        d[i].digit = digits[i];
        d[i].on_msec = 100;
        d[i].off_msec = 200;
        d[i].volume = 0;
     }

     pjmedia_tonegen_play_digits(cd->tonegen, count, d, 0);
}

void call_deinit_tonegen(pjsua_call_id call_id)
{
     struct my_call_data *cd;

     cd = (struct my_call_data*) pjsua_call_get_user_data(call_id);
     if (!cd)
        return;

     pjsua_conf_remove_port(cd->toneslot);
     pjmedia_port_destroy(cd->tonegen);
     pj_pool_release(cd->pool);

     pjsua_call_set_user_data(call_id, NULL);
}

The resources that were allocated above must be released once the call is disconnected, by implementing this in pjsua_callback::on_call_state callback:

static void on_call_state(pjsua_call_id call_id, pjsip_event *e)
{
     pjsua_call_info call_info;

     pjsua_call_get_info(call_id, &call_info);

     if (call_info.state == PJSIP_INV_STATE_DISCONNECTED) {
        call_deinit_tonegen(call_id);
     }
}

With the above snippet, calling call_play_digit() sends inband DTMF digits to the remote party.

Implementing an inband DTMF detector

Currently PJMEDIA lacks built-in tone detection routine. If a tone-detection routine is available, integrating it into the framework is straightforward:

  1. Wrap the routine as PJMEDIA Port so that it can be plugged into the media framework. The implementation would be similar to WAV recorder media port (pjmedia/src/pjmedia/wav_writer.c), but instead of writing to a WAV file, it would monitor the audio signal for tones and call some callback when a tone is detected.

  2. Once the tone-detector media port is implemented, add this media port to the conference bridge with pjsua_conf_add_port(), and connect the audio source to your tone detector with pjsua_conf_connect().

PJSUA-LIB equivalents

PJSUA2

PJSUA-LIB

pj::Call::sendDtmf() / pj::CallSendDtmfParam

pjsua_call_send_dtmf() / pjsua_call_send_dtmf_param

(no equivalent — PJSUA2 standardised on the param-struct form)

Legacy: pjsua_call_dial_dtmf() / pjsua_call_dial_dtmf2() (RFC 4733 only)

pj::Call::onDtmfDigit() / pj::OnDtmfDigitParam

pjsua_callback::on_dtmf_digit2 / pjsua_dtmf_info. Legacy ASCII-only pjsua_callback::on_dtmf_digit is suppressed when on_dtmf_digit2 is set.

pj::Call::onDtmfEvent() / pj::OnDtmfEventParam

pjsua_callback::on_dtmf_event / pjsua_dtmf_event

PJSUA-LIB only (invoke from C++)

pjsua_call_get_queued_dtmf_digits()

PJMEDIA only (invoke from C++)

pjmedia_stream_set_tx_dtmf_options() (per-stream duration / pause / PT / volume / end-bit reps)