AI Media Port

group PJMEDIA_AI_PORT

Media port that bridges conference bridge audio to real-time AI services over WebSocket.

This is an experimental media port that connects the PJMEDIA conference bridge to real-time AI services (e.g. OpenAI Realtime API) via WebSocket. The vendor-specific protocol is handled by a pluggable backend, because each vendor defines its own proprietary JSON events on top of WebSocket (RFC 6455).

Audio from the conference bridge (delivered through put_frame) is encoded into the backend’s wire format (e.g. base64 audio in a JSON message) and sent over WebSocket. Audio received from the AI service is decoded and returned through get_frame. The port operates at the backend’s native clock rate; the conference bridge handles any resampling.

Typedefs

typedef struct pjmedia_ai_port pjmedia_ai_port: Forward declaration of the AI port type.

Enums

enum pjmedia_ai_event_type

AI port event types.

Values:

enumerator PJMEDIA_AI_EVENT_CONNECTED: AI service connection established.

enumerator PJMEDIA_AI_EVENT_DISCONNECTED: AI service connection lost or closed.

enumerator PJMEDIA_AI_EVENT_TRANSCRIPT: Transcript text received from the AI service.

enumerator PJMEDIA_AI_EVENT_RESPONSE_START: AI response generation started.

enumerator PJMEDIA_AI_EVENT_RESPONSE_DONE: AI response generation completed.

enumerator PJMEDIA_AI_EVENT_SPEECH_STARTED: Speech detected in input audio (VAD).

enumerator PJMEDIA_AI_EVENT_SPEECH_STOPPED: End of speech detected in input audio (VAD).

Functions

void pjmedia_ai_port_param_default(pjmedia_ai_port_param *param)

Initialize AI port parameters with default values.

Parameters: param – The parameters to initialize.

pj_status_t pjmedia_ai_port_create(pj_pool_t *pool, const pjmedia_ai_port_param *param, pjmedia_ai_port **p_ai_port)

Create an AI media port.

Parameters:

pool – Pool for initial allocations. The AI port will create its own internal pool.
param – Creation parameters.
p_ai_port – Pointer to receive the AI port instance.

Returns:

PJ_SUCCESS on success.

pjmedia_port *pjmedia_ai_port_get_port(pjmedia_ai_port *ai_port)

Get the pjmedia_port interface for connecting to the conference bridge.

Parameters: ai_port – The AI port instance.
Returns: The media port, or NULL on error.

pj_status_t pjmedia_ai_port_connect(pjmedia_ai_port *ai_port, const pj_str_t *url, const pj_str_t *auth_token)

Connect to the AI service asynchronously. When the connection attempt completes, the on_event callback is called with PJMEDIA_AI_EVENT_CONNECTED or PJMEDIA_AI_EVENT_DISCONNECTED.

Parameters:

ai_port – The AI port instance.
url – WebSocket URL (ws:// or wss://).
auth_token – Authentication token (e.g. API key).

Returns:

PJ_SUCCESS if the connection attempt was initiated. The outcome is reported through the on_event callback.

pj_status_t pjmedia_ai_port_disconnect(pjmedia_ai_port *ai_port)

Disconnect from the AI service gracefully.

Parameters: ai_port – The AI port instance.
Returns: PJ_SUCCESS on success.

void *pjmedia_ai_port_get_user_data(pjmedia_ai_port *ai_port)

Get the user data associated with the AI port.

Parameters: ai_port – The AI port instance.
Returns: The user data pointer.

void pjmedia_ai_port_set_user_data(pjmedia_ai_port *ai_port, void *user_data)

Set the user data associated with the AI port. If callbacks may be running concurrently, the caller should hold the port’s grp_lock (the grp_lock member of the port returned by pjmedia_ai_port_get_port()) while calling this function.

Parameters:

ai_port – The AI port instance.
user_data – The user data pointer.

pj_status_t pjmedia_ai_openai_backend_create(pj_pool_t *pool, pjmedia_ai_backend **p_backend)

Create an OpenAI Realtime API backend.

Parameters:

pool – Pool for allocations.
p_backend – Pointer to receive the backend instance.

Returns:

PJ_SUCCESS on success.

struct pjmedia_ai_event

#include <ai_port.h>

AI port event data delivered to the application callback.

Public Members

pjmedia_ai_event_type type: The event type.

pj_status_t status: The status code: PJ_SUCCESS for informational events, or an error code for DISCONNECTED events.

pj_str_t text: Text payload (transcript, etc). Only valid for TRANSCRIPT events. The text pointer is valid only for the duration of the callback, so the application must copy the text if it needs it afterwards.

struct pjmedia_ai_port_cb

#include <ai_port.h>

AI port application callback.

Public Members

void (*on_event)(pjmedia_ai_port *ai_port, const pjmedia_ai_event *event)

Called when an AI event occurs. This may be called from the ioqueue worker thread.

Param ai_port: The AI port instance.
Param event: The event data.
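The sketch below illustrates one way an on_event handler might dispatch on the event type and copy the transcript text, since the text is valid only during the callback. It is self-contained and does not include the PJMEDIA headers: the sk_* types are hypothetical stand-ins that mirror pjmedia_ai_event_type, pjmedia_ai_event and pj_str_t, and the application state is passed directly instead of being fetched with pjmedia_ai_port_get_user_data().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins mirroring the documented types (not the real headers). */
typedef enum sk_event_type {
    SK_EVENT_CONNECTED,
    SK_EVENT_DISCONNECTED,
    SK_EVENT_TRANSCRIPT,
    SK_EVENT_RESPONSE_START,
    SK_EVENT_RESPONSE_DONE,
    SK_EVENT_SPEECH_STARTED,
    SK_EVENT_SPEECH_STOPPED
} sk_event_type;

typedef struct sk_str {
    const char *ptr;   /* not NUL-terminated, like pj_str_t */
    long        slen;
} sk_str;

typedef struct sk_event {
    sk_event_type type;
    int           status;
    sk_str        text;
} sk_event;

/* Hypothetical application state, e.g. what user_data could point to. */
typedef struct app_state {
    int  connected;
    int  last_status;
    char transcript[256];
} app_state;

/* Dispatch on the event type. The transcript is copied because the
 * event's text pointer is only valid while the callback runs. */
static void sk_on_event(app_state *app, const sk_event *ev)
{
    assert(app != NULL && ev != NULL);

    switch (ev->type) {
    case SK_EVENT_CONNECTED:
        app->connected = 1;
        break;
    case SK_EVENT_DISCONNECTED:
        app->connected = 0;
        app->last_status = ev->status;  /* error code, if any */
        break;
    case SK_EVENT_TRANSCRIPT: {
        size_t n = ev->text.slen > 0 ? (size_t)ev->text.slen : 0;
        if (n >= sizeof(app->transcript))
            n = sizeof(app->transcript) - 1;  /* truncate to fit */
        if (n > 0)
            memcpy(app->transcript, ev->text.ptr, n);
        app->transcript[n] = '\0';
        break;
    }
    default:
        /* RESPONSE_* and SPEECH_* events are ignored in this sketch. */
        break;
    }
}
```

Because the real callback may run on the ioqueue worker thread, an application sharing this state with other threads would also need its own synchronization, which the sketch leaves out.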

struct pjmedia_ai_port_param

#include <ai_port.h>

AI port creation parameters.

Public Members

pj_ioqueue_t *ioqueue: Specify the ioqueue to use for WebSocket async I/O. Required.

pj_timer_heap_t *timer_heap: Specify the timer heap to use for WebSocket timers. Required.

pjmedia_ai_port_cb cb: Specify the application callback, see pjmedia_ai_port_cb.

void *user_data: Specify application user data.

pjmedia_ai_backend *backend

Specify the AI backend instance. Required. Created by a backend factory (e.g. pjmedia_ai_openai_backend_create()). The port clock rate, channel count, and bits per sample are taken from the backend’s native settings.

The AI port takes ownership of the backend. The backend will be destroyed when the port is destroyed via pjmedia_port_destroy(). The caller must not use or destroy the backend after passing it to pjmedia_ai_port_create().

unsigned ptime_msec

Specify the ptime in milliseconds.

Default value is 20.
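As a worked example of what the ptime implies for frame size, the sketch below computes samples and bytes per frame from the clock rate, channel count, and ptime. The helper names are hypothetical and not part of the API; the 24000 Hz figure comes from the OpenAI example in this reference, while mono 16-bit audio is an assumption made for the arithmetic.

```c
#include <assert.h>

/* Samples per frame = clock_rate * ptime_msec / 1000 * channel_count. */
static unsigned sk_samples_per_frame(unsigned clock_rate,
                                     unsigned channel_count,
                                     unsigned ptime_msec)
{
    return clock_rate * ptime_msec / 1000 * channel_count;
}

/* Bytes per frame for PCM with a whole number of bytes per sample. */
static unsigned sk_frame_bytes(unsigned samples, unsigned bits_per_sample)
{
    assert(bits_per_sample % 8 == 0);
    return samples * (bits_per_sample / 8);
}
```

With these assumptions, sk_samples_per_frame(24000, 1, 20) gives 480 samples, and sk_frame_bytes(480, 16) gives 960 bytes of raw PCM per 20 ms frame, before any base64 or JSON overhead.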

pj_bool_t vad_enabled

Specify whether to enable client-side VAD on the TX (microphone) path. When enabled, silence frames are not sent over WebSocket, reducing bandwidth. The AI service’s server-side VAD (if any) still handles turn detection independently.

Default value is PJ_FALSE.

pj_ssl_sock_param *ssl_param

Specify the SSL/TLS parameters for wss:// connections. Set to NULL to use defaults. Ignored for ws:// connections.

The AI port makes an internal copy of this structure, so the caller’s pointer does not need to remain valid after pjmedia_ai_port_create() returns.

struct pjmedia_ai_backend_op

#include <ai_port.h>

Backend operation vtable.

Public Members

pj_status_t (*prepare_connect)(pjmedia_ai_backend *be, pj_pool_t *pool, const pj_str_t *auth_token, pj_websock_connect_param *cparam)

Prepare connect parameters (extra headers, subprotocol).

Param be: The backend instance.
Param pool: Pool for allocations.
Param auth_token: Authentication token (e.g. API key).
Param cparam: Connect parameters to fill.
Return: PJ_SUCCESS on success.

pj_status_t (*on_ws_connected)(pjmedia_ai_backend *be, pj_websock *ws)

Called when WebSocket connection is established. The backend should send any session initialization messages.

Param be: The backend instance.
Param ws: The connected WebSocket.
Return: PJ_SUCCESS on success.

pj_status_t (*encode_audio)(pjmedia_ai_backend *be, const pj_int16_t *samples, unsigned sample_count, char *buf, int *buf_len)

Encode PCM audio samples into the backend’s wire format (e.g. base64 JSON). The output is a text message to be sent via WebSocket.

Param be: The backend instance.
Param samples: PCM samples at backend native rate.
Param sample_count: Number of samples.
Param buf: Output buffer for the encoded message.
Param buf_len: On entry, buffer capacity. On return, actual message length.
Return: PJ_SUCCESS on success.
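To make the buf/buf_len contract concrete, here is a minimal self-contained sketch of an encode_audio-style function that serializes 16-bit PCM as little-endian bytes, base64-encodes them, and wraps the result in a JSON text message. It is not the library's implementation. The function names, the SK_MAX_SAMPLES limit, the 0/-1 return codes, and the exact JSON shape (modeled on an input_audio_buffer.append style event) are assumptions; check the vendor's protocol documentation for the real message format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_MAX_SAMPLES 960  /* hypothetical per-call limit for this sketch */

/* Standard base64 alphabet (RFC 4648). */
static const char SK_B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes; out must hold 4*((len+2)/3) chars. Returns chars written. */
static int sk_b64_encode(const unsigned char *in, int len, char *out)
{
    int i, o = 0;
    for (i = 0; i + 2 < len; i += 3) {
        out[o++] = SK_B64[in[i] >> 2];
        out[o++] = SK_B64[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
        out[o++] = SK_B64[((in[i + 1] & 0x0F) << 2) | (in[i + 2] >> 6)];
        out[o++] = SK_B64[in[i + 2] & 0x3F];
    }
    if (i < len) {                      /* 1 or 2 trailing bytes: pad with '=' */
        out[o++] = SK_B64[in[i] >> 2];
        if (i + 1 < len) {
            out[o++] = SK_B64[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
            out[o++] = SK_B64[(in[i + 1] & 0x0F) << 2];
        } else {
            out[o++] = SK_B64[(in[i] & 0x03) << 4];
            out[o++] = '=';
        }
        out[o++] = '=';
    }
    return o;
}

/* On entry *buf_len is the capacity of buf; on success it is set to the
 * message length (excluding the terminating NUL). Returns 0 on success,
 * -1 if the input is too large or the buffer is too small. */
static int sk_encode_audio(const int16_t *samples, unsigned sample_count,
                           char *buf, int *buf_len)
{
    static const char PREFIX[] =
        "{\"type\":\"input_audio_buffer.append\",\"audio\":\"";
    static const char SUFFIX[] = "\"}";
    unsigned char pcm[2 * SK_MAX_SAMPLES];
    int needed, pos;
    unsigned i;

    assert(samples != NULL && buf != NULL && buf_len != NULL);
    if (sample_count > SK_MAX_SAMPLES)
        return -1;

    needed = (int)strlen(PREFIX) + 4 * (((int)sample_count * 2 + 2) / 3)
             + (int)strlen(SUFFIX);
    if (needed + 1 > *buf_len)          /* +1 for the terminating NUL */
        return -1;

    for (i = 0; i < sample_count; i++) {  /* explicit little-endian bytes */
        uint16_t u = (uint16_t)samples[i];
        pcm[2 * i] = (unsigned char)(u & 0xFF);
        pcm[2 * i + 1] = (unsigned char)(u >> 8);
    }

    pos = (int)strlen(PREFIX);
    memcpy(buf, PREFIX, (size_t)pos);
    pos += sk_b64_encode(pcm, (int)(2 * sample_count), buf + pos);
    memcpy(buf + pos, SUFFIX, strlen(SUFFIX) + 1);
    pos += (int)strlen(SUFFIX);

    *buf_len = pos;
    return 0;
}
```

Base64 expands the payload by a factor of 4/3, so a caller sizing buf needs to account for that growth plus the JSON envelope, not just the raw PCM size.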

pj_status_t (*on_rx_msg)(pjmedia_ai_backend *be, pj_pool_t *pool, const void *data, pj_size_t len, pj_int16_t *audio_out, unsigned *sample_count, pjmedia_ai_event *event, const char **reply, pj_size_t *reply_len)

Parse a received WebSocket message. Extract audio samples and/or events. Optionally return a reply message to send back.

Param be: The backend instance.
Param pool: Temporary pool for parsing.
Param data: Received message data.
Param len: Message length.
Param audio_out: Buffer for decoded PCM samples (may be NULL if no audio in this message).
Param sample_count: On entry, capacity in samples. On return, number of decoded samples.
Param event: Filled with event data if a non-audio event was received. The backend sets the event type to -1 if the message carried no event.
Param reply: If not NULL and the backend sets *reply to a non-NULL value, the AI port will send this string as a text WebSocket message. The data must remain valid until on_rx_msg returns.
Param reply_len: Length of the reply message. Set to 0 if no reply is needed.
Return: PJ_SUCCESS on success.

pj_status_t (*destroy)(pjmedia_ai_backend *be)

Destroy the backend and release resources.

Param be: The backend instance.
Return: PJ_SUCCESS on success.

struct pjmedia_ai_backend

#include <ai_port.h>

AI backend base structure. Backend implementations embed this as the first member.
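The "embed as the first member" convention can be sketched as follows. The block is self-contained and uses hypothetical sk_* stand-ins instead of pjmedia_ai_backend and pjmedia_ai_backend_op; only a destroy operation is modeled, and malloc-style allocation replaces the pool allocation a real backend would use. The mono 16-bit settings are assumptions, while 24000 Hz is the rate this reference gives for OpenAI.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the backend base structure and its vtable. */
typedef struct sk_backend sk_backend;

typedef struct sk_backend_op {
    int (*destroy)(sk_backend *be);
} sk_backend_op;

struct sk_backend {
    const sk_backend_op *op;
    unsigned native_clock_rate;
    unsigned native_channel_count;
    unsigned native_bits_per_sample;
};

/* A concrete backend embeds the base as its FIRST member, so a pointer
 * to the base and a pointer to the whole object share one address. */
typedef struct my_backend {
    sk_backend base;        /* must stay first */
    unsigned   frames_sent; /* example of backend-private state */
} my_backend;

static int my_destroy(sk_backend *be)
{
    my_backend *me = (my_backend *)be;  /* valid because base is first */
    free(me);
    return 0;
}

static const sk_backend_op my_op = { my_destroy };

/* Factory returning the base pointer, as a backend factory would. */
static sk_backend *my_backend_create(void)
{
    my_backend *me = (my_backend *)calloc(1, sizeof(*me));
    if (me == NULL)
        return NULL;
    me->base.op = &my_op;
    me->base.native_clock_rate = 24000;
    me->base.native_channel_count = 1;     /* assumed mono */
    me->base.native_bits_per_sample = 16;  /* assumed 16-bit PCM */
    return &me->base;
}
```

The owner of the backend only ever sees the base pointer and calls through op, which is what allows the port to destroy a backend without knowing its concrete type.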

Public Members

const pjmedia_ai_backend_op *op: Specify the backend operation vtable.

unsigned native_clock_rate: Specify the native clock rate of the AI service (e.g. 24000 for OpenAI).

unsigned native_channel_count: Specify the native channel count.

unsigned native_bits_per_sample: Specify the native bits per sample.

void *backend_data: Specify opaque backend-specific data.