Implement DNS SRV failover

Our DNS SRV failover support is only limited to TCP (or TLS) connect failure, which in this case pjsip will automatically retries the next server. But even then, there is no mechanism to flag that a server has been failing, which means that the next request may try the same server again and triggering the failover again.

What we’ve been suggesting is to implement the failover mechanism in the application layer. In this case, the application queries the list of available servers either with gethostbyname, DNS SRV, or by other means. It then specifies which server to use by putting the IP address as proxy parameter (i.e. Route header) in the account config. The mechanism to test the wellness of a server and when to initiate the failover is totally controlled by the application. The application can change which server to use by changing the account proxy setting with pjsua_acc_modify().

Warning

This IP-in-Route approach does not work for TLS. When an IP address is put in the proxy/Route URI, PJSIP uses that same value both as the TLS ClientHello SNI and as the name matched against the server certificate (CN/subjectAltName). Sending an IP literal as SNI is invalid per RFC 6066, and the certificate name check typically fails because certificates usually carry the server hostname, not its IP (unless the cert includes an iPAddress subjectAltName entry). The result is a failed handshake or a rejected certificate.

The underlying reason is that the TLS transport derives the SNI and the certificate-validation name from a single field (the next-hop URI host), so overriding the connect address with an IP also overrides the TLS name. For UDP and TCP there is no such name check, so the IP-in-Route approach above works as-is.

Failover with TLS

To do application-controlled failover over TLS, keep the hostname in the proxy/Route URI (so SNI and certificate validation use the correct name) and select the actual server address separately. A PJSUA application can do this with an external resolver (current releases) or with server affinity (the release after 2.17); an application working directly at the PJSIP layer can use the transport API.

Using an external resolver (PJSUA)

Register an external resolver with pjsip_endpt_set_ext_resolver(). Because the next-hop hostname is recorded before resolution runs, dest_info.name stays the hostname — so SNI and certificate validation remain correct — while your resolver callback decides which address(es) to return for that hostname, in what order, and which failing server to exclude. PJSIP then uses the returned addresses (and retries the next one on connect failure). This works with PJSUA on current releases and keeps all failover policy in the application.

The resolver is just a single resolve callback — typically a thin wrapper over pj_getaddrinfo() (or your own server table) that returns the addresses in the order your failover logic prefers:

/* Turn the next-hop hostname into an address list, in your preferred
 * order. dest_info.name already holds the hostname, so SNI/cert are
 * unaffected by what is returned here.
 */
static void my_resolve(pjsip_resolver_t *res, pj_pool_t *pool,
                       const pjsip_host_info *target, void *token,
                       pjsip_resolver_callback *cb)
{
    pjsip_server_addresses svr;
    pj_addrinfo ai[8];
    unsigned i, cnt = PJ_ARRAY_SIZE(ai);
    pj_status_t status;

    pj_bzero(&svr, sizeof(svr));

    /* Resolve the hostname (getaddrinfo/DNS SRV), or consult your own
     * server table here and drop/reorder the servers you think are down.
     */
    status = pj_getaddrinfo(pj_AF_UNSPEC(), &target->addr.host, &cnt, ai);
    if (status != PJ_SUCCESS) {
        cb(status, token, &svr);   /* svr.count == 0: nothing resolved */
        return;
    }

    for (i = 0; i < cnt; ++i) {
        pj_sockaddr_cp(&svr.entry[i].addr, &ai[i].ai_addr);
        pj_sockaddr_set_port(&svr.entry[i].addr,
                             (pj_uint16_t)(target->addr.port ?
                                           target->addr.port : 5061));
        svr.entry[i].addr_len = pj_sockaddr_get_len(&svr.entry[i].addr);
        svr.entry[i].type     = target->type;   /* preserve transport (TLS) */
        svr.entry[i].name     = target->addr.host;
    }
    svr.count = cnt;

    cb(status, token, &svr);   /* hand the ordered list back to PJSIP */
}

/* Install once, e.g. right after pjsua_init(). */
pjsip_ext_resolver ext;
pj_bzero(&ext, sizeof(ext));
ext.resolve = &my_resolve;
pjsip_endpt_set_ext_resolver(pjsua_get_pjsip_endpt(), &ext);

Note

The callback does not have to do the lookup itself with pj_getaddrinfo(). It can instead drive PJSIP’s built-in DNS resolver — obtained via pjsip_endpt_get_resolver() (a pj_dns_resolver) — to perform the actual SRV/A resolution, then reorder or filter the results before calling cb. That keeps PJSIP’s DNS handling (SRV priority/weight, caching) while still letting the application control failover order. Note the resolution is asynchronous in that case: call cb from the DNS query completion, not inline.

Using server affinity (PJSUA)

Note

Server affinity is not yet part of a released PJSIP version; it will ship in the next release after 2.17. On current releases use the external resolver above. See Account-Scoped Server Affinity for the full feature guide.

Enable pjsua_acc_config::server_affinity on the account and pin the chosen server address with pjsua_acc_set_affinity_addr(). The hostname in proxy[0] (or reg_uri) is used for SNI and certificate validation, while the address you pass is used as the actual connect target:

pjsua_acc_config cfg;
pjsua_acc_config_default(&cfg);

/* Hostname here drives SNI + cert validation, NOT the connect target. */
cfg.proxy_cnt = 1;
cfg.proxy[0]  = pj_str("sip:sip.example.com;transport=tls;lr");
cfg.reg_uri   = pj_str("sip:example.com");

/* Enable affinity so the pinned transport is reused across requests.
 * Leave cfg.transport_id == PJSUA_INVALID_ID: a fixed transport_id
 * bypasses affinity.
 */
cfg.server_affinity = PJSUA_SERVER_AFFINITY_ENABLED;

pjsua_acc_add(&cfg, PJ_TRUE, &acc_id);

/* Pick the server IP yourself (your failover decision) and pin it.
 * Connects to 198.51.100.10:5061 but presents/validates the TLS name
 * as "sip.example.com".
 */
pj_sockaddr addr;
pj_sockaddr_init(pj_AF_INET(), &addr, NULL, 5061);
pj_inet_pton(pj_AF_INET(), &pj_str("198.51.100.10"),
             &addr.ipv4.sin_addr);

pjsua_acc_set_affinity_addr(acc_id, &addr);

On failover, call pjsua_acc_set_affinity_addr() again with the next server’s address; the hostname (and therefore SNI and certificate validation) stays correct. Note that on transport reuse the per-request hostname recheck is skipped (trust is asserted at handshake), which is safe here because the handshake used the correct hostname.

Note

Server affinity is not TLS-specific: it pins the chosen address for UDP and TCP as well (TCP/TLS via the transport selector, UDP via a hidden Route header). It is the recommended way to do application-controlled failover for any transport; for TLS it additionally keeps the SNI and certificate name correct.

Using the transport API directly (PJSIP layer)

At the PJSIP level the connect address and the TLS name are independent inputs to pjsip_endpt_acquire_transport2(): the addr argument is the socket connect target, while tdata->dest_info.name is the name used for SNI and certificate validation. Acquire the transport with the IP as addr and the hostname in dest_info.name. The call only reads dest_info from the tdata (to learn the connect target and the TLS name) and does not retain or reference-count it, so a throwaway zero-initialized pjsip_tx_data is sufficient here — this mirrors the pattern PJSIP uses internally:

pjsip_tx_data dummy;
pj_bzero(&dummy, sizeof(dummy));
pj_strdup2(pool, &dummy.dest_info.name, "sip.example.com");

pjsip_endpt_acquire_transport2(endpt, PJSIP_TRANSPORT_TLS,
                               &ip_addr, addr_len,  /* connect target */
                               tp_sel, &dummy, &tp);

To actually send on that connection, set a transport selector of type PJSIP_TPSELECTOR_TRANSPORT (sel->u.transport = the acquired transport) on the outgoing pjsip_tx_data before sending.

Note

This is a PJSIP-layer technique and is not reachable from a plain PJSUA application. PJSUA’s per-account transport binding (pjsua_acc_config::transport_id / pjsua_acc_set_transport()) selects only the local listener for connection-oriented transports (TCP/TLS), not a specific connected transport, so it cannot pin the destination. Use this approach when your application manages its own dialogs and transactions and can set tdata->tp_sel directly. PJSUA applications should use the external resolver (current releases) or server affinity (the release after 2.17) above instead.