Video Components and Backends

To enable end-to-end video in PJSIP, four pluggable components must be present in the build:

  1. Capture device — sources raw frames from a camera (or a virtual source such as an AVI file).

  2. Renderer device — displays decoded frames on screen.

  3. Video codec — encodes/decodes for transmission over RTP.

  4. Format converter — performs colour-space conversion and scaling between formats produced by capture devices, expected by codecs, and accepted by renderers.

In addition, several pieces of glue are always built-in and require no backend choice:

  • pjmedia_vid_port — connects a video device to a media port.

  • pjmedia_vid_conf — the video conference bridge. It mixes and routes multiple video sources, and it is the routing fabric that PJSUA-LIB uses to wire capture, codec, and renderer slots together.

  • pjmedia_event — media event framework. Applications subscribe to it to receive video notifications such as format/resolution change, window resize/close, keyframe found/missing, capture orientation change, incoming RTCP feedback, and video device errors. See Media events below.

  • pjmedia_av_sync — inter-media synchronizer. Keeps audio and video in lipsync within a session by comparing NTP/RTP timestamps from RTCP SR reports and asking lagging or leading streams to adjust their delay.

  • RTP packetizers for H.263, H.264, and VPX (VP8/VP9).
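As a rough illustration of the timestamp comparison behind the inter-media synchronizer described above, the sketch below maps RTP timestamps to sender wallclock time using the NTP/RTP pair from an RTCP SR, then compares an audio and a video stream. It is not the pjmedia_av_sync implementation: the struct and function names (sr_mapping, ntp_to_us, rtp_to_wallclock_us, av_skew_us) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the last RTCP Sender Report seen for one stream. */
typedef struct sr_mapping {
    uint64_t ntp_us;      /* SR NTP time, converted to microseconds */
    uint32_t rtp_ts;      /* RTP timestamp carried in the same SR */
    uint32_t clock_rate;  /* RTP clock rate in Hz, e.g. 8000 or 90000 */
} sr_mapping;

/* Convert a 64-bit NTP timestamp (32.32 fixed point) to microseconds. */
static uint64_t ntp_to_us(uint32_t sec, uint32_t frac)
{
    return (uint64_t)sec * 1000000u + (((uint64_t)frac * 1000000u) >> 32);
}

/* Map an RTP timestamp to sender wallclock time in microseconds. */
static int64_t rtp_to_wallclock_us(const sr_mapping *m, uint32_t rtp_ts)
{
    int32_t delta;

    assert(m != NULL && m->clock_rate != 0);
    /* Signed 32-bit difference keeps the result correct across RTP wraparound. */
    delta = (int32_t)(rtp_ts - m->rtp_ts);
    return (int64_t)m->ntp_us +
           (int64_t)delta * 1000000 / (int64_t)m->clock_rate;
}

/* Capture-time difference (video minus audio) in microseconds.
 * A positive result means the video frame was captured later than the
 * audio frame it is being compared with. */
static int64_t av_skew_us(const sr_mapping *audio, uint32_t audio_rtp,
                          const sr_mapping *video, uint32_t video_rtp)
{
    return rtp_to_wallclock_us(video, video_rtp) -
           rtp_to_wallclock_us(audio, audio_rtp);
}
```

A synchronizer built on this idea would compare the skew of the frames currently being played and ask the stream that is ahead to add delay. How pjmedia_av_sync chooses which stream to adjust, and by how much, is not covered by this sketch.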

Each of the four pluggable components is described below, with the available backends, their platform support, and the build flags that control them.

In the tables, ✓ means the backend is available on that platform, ✗ means it is not, and a backend in bold is the default on that platform when video is enabled.

Note: Availability also depends on whether the relevant third-party SDK (e.g. SDL, FFmpeg, OpenH264, libvpx) is installed and detected by the build system. See the build instructions linked at the end of this page.

Capture devices

Backend | Windows | macOS | Linux | iOS | Android
AVFoundation
Android Camera2
DirectShow
Video4Linux2
FFmpeg
AVI virtual device
Colorbar virtual

The AVI and Colorbar virtual devices are bundled and enabled by default; they are useful for testing without a real camera.

The AVI virtual device reads media from an AVI file on disk and presents it as a capture source. The underlying file reader (pjmedia_avi_player_create_streams()) accepts:

  • Video: raw I420 / IYUV / UYVY / YUY2 / DIB / RGB24 / RGB32, plus the compressed formats MJPEG, H.264, and MPEG-4 (also recognised under the FourCC aliases XVID / xvid / DIVX / FMP4 / DX50). Compressed AVIs require a matching codec for downstream processing — the conference bridge itself only handles raw video, so you must connect the player to a codec port (or send the bitstream directly into a call’s encoder slot) rather than to a renderer.

  • Audio: 16-bit linear PCM, A-law (G.711a, PCMA), and μ-law (G.711u, PCMU). Other compressed audio codings (Opus, AMR, etc.) in an AVI file will be rejected by the reader with an “Unsupported audio stream” warning.
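The raw/compressed distinction above decides whether an AVI video stream can go straight to the conference bridge or needs a codec first. The sketch below shows one way an application could classify a stream's FourCC before wiring it up. It is not the reader's own code: video_kind, classify_fourcc and needs_codec are hypothetical names, and the "MJPG" and "H264" spellings are assumed here as the usual FourCCs for MJPEG and H.264. DIB, RGB24 and RGB32 are left out because the text does not say how they are tagged in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of an AVI video stream. */
typedef enum video_kind {
    VIDEO_UNKNOWN = 0,
    VIDEO_RAW,
    VIDEO_MJPEG,
    VIDEO_H264,
    VIDEO_MPEG4
} video_kind;

typedef struct fourcc_entry {
    const char *fourcc;
    video_kind  kind;
} fourcc_entry;

static const fourcc_entry fourcc_table[] = {
    /* Raw formats listed in the text. */
    { "I420", VIDEO_RAW }, { "IYUV", VIDEO_RAW },
    { "UYVY", VIDEO_RAW }, { "YUY2", VIDEO_RAW },
    /* Assumed FourCC spellings for MJPEG and H.264. */
    { "MJPG", VIDEO_MJPEG }, { "H264", VIDEO_H264 },
    /* MPEG-4 aliases listed in the text. */
    { "XVID", VIDEO_MPEG4 }, { "xvid", VIDEO_MPEG4 },
    { "DIVX", VIDEO_MPEG4 }, { "FMP4", VIDEO_MPEG4 },
    { "DX50", VIDEO_MPEG4 }
};

/* Look up a four-character code (case-sensitive). */
static video_kind classify_fourcc(const char *fourcc)
{
    size_t i;

    assert(fourcc != NULL);
    for (i = 0; i < sizeof(fourcc_table) / sizeof(fourcc_table[0]); ++i) {
        if (strncmp(fourcc, fourcc_table[i].fourcc, 4) == 0)
            return fourcc_table[i].kind;
    }
    return VIDEO_UNKNOWN;
}

/* Compressed streams need a matching codec before they can be rendered. */
static int needs_codec(video_kind kind)
{
    return kind == VIDEO_MJPEG || kind == VIDEO_H264 || kind == VIDEO_MPEG4;
}
```

An application could call classify_fourcc() on the stream's FourCC and connect the player to a codec port only when needs_codec() returns non-zero.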

Build flags:

  • CMake: PJMEDIA_WITH_VIDEODEV_DSHOW, PJMEDIA_WITH_VIDEODEV_V4L2, PJMEDIA_WITH_VIDEODEV_FFMPEG, PJMEDIA_WITH_VIDEODEV_AVI. AVFoundation (Apple) and Android Camera2 are auto-enabled by platform.

  • GNU autoconf: --disable-v4l2 to drop V4L2; FFmpeg capture rides on FFmpeg detection (see --with-ffmpeg); DirectShow on MinGW is enabled with --enable-video=yes. AVFoundation and Android Camera2 are auto-enabled when building for Apple/Android targets.

  • Visual Studio: set the corresponding macros in config_site.h, e.g. PJMEDIA_VIDEO_DEV_HAS_DSHOW=1.

Renderer devices

Backend | Windows | macOS | Linux | iOS | Android
OpenGL / OpenGL ES
Metal
SDL
UIView (iOS)

An additional renderer-side option exists at the port level (not as a registered video device), available on every platform when video is enabled:

  • AVI writer (pjmedia_avi_writer_create_streams()) — writes the incoming video (and optionally audio) frames to an AVI file. Useful for recording a call or a local preview to disk.

    The writer does not encode — it copies whatever bytes its put_frame() receives straight into the file. In a typical PJSIP pipeline that means uncompressed video (raw I420 from a call’s decoder, or whatever raw format the source port emits) and 16-bit linear PCM audio only. Feeding the writer G.711 (PCMA / PCMU) audio is not supported and will produce a file that doesn’t play back correctly.
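Because the writer stores uncompressed frames, recordings grow quickly. The sketch below estimates the payload data rate for raw I420 video plus 16-bit linear PCM audio. It is a back-of-the-envelope calculation that ignores AVI container overhead; the function names are hypothetical and not part of PJMEDIA.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one I420 frame: a full-resolution Y plane followed by U and V
 * planes subsampled by two in each direction (rounded up for odd sizes). */
static uint64_t i420_frame_bytes(uint32_t width, uint32_t height)
{
    uint64_t cw = ((uint64_t)width + 1u) / 2u;
    uint64_t ch = ((uint64_t)height + 1u) / 2u;

    return (uint64_t)width * height + 2u * cw * ch;
}

/* Approximate bytes written per second of recording:
 * raw I420 video at the given frame rate plus 16-bit PCM audio. */
static uint64_t raw_recording_bytes_per_sec(uint32_t width, uint32_t height,
                                            uint32_t fps,
                                            uint32_t audio_rate,
                                            uint32_t audio_channels)
{
    uint64_t video = i420_frame_bytes(width, height) * fps;
    uint64_t audio = (uint64_t)audio_rate * audio_channels * 2u;

    assert(fps > 0);
    return video + audio;
}
```

For 640x480 at 30 fps with 16 kHz mono audio this gives roughly 13.9 MB per second, which is worth keeping in mind when recording long calls.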

Build flags:

  • CMake: PJMEDIA_WITH_VIDEODEV_OPENGL, PJMEDIA_WITH_VIDEODEV_SDL, PJMEDIA_WITH_VIDEODEV_METAL.

  • GNU autoconf: SDL is auto-detected (override with --with-sdl=DIR, disable with --disable-sdl). OpenGL/OpenGL ES, Metal, and UIView are auto-enabled for Apple/Android targets when their frameworks are detected.

  • Visual Studio: set PJMEDIA_VIDEO_DEV_HAS_OPENGL=1 / PJMEDIA_VIDEO_DEV_HAS_SDL=1 in config_site.h.

Video codecs

Backend | Windows | macOS | Linux | iOS | Android
OpenH264 (H.264)
libvpx (VP8, VP9)
FFmpeg (H.261/263/263P, H.264, MJPEG, VP8, VP9)
Android MediaCodec (H.264, VP8, VP9 — native/HW)
VideoToolbox (H.264 — native/HW)

There is no built-in default video codec; at least one of the above must be present (and detected) for video calls to negotiate a codec.

Build flags:

  • CMake: PJMEDIA_WITH_OPEN_H264_CODEC, PJMEDIA_WITH_VPX_CODEC, PJMEDIA_WITH_FFMPEG, PJMEDIA_WITH_ANDROID_MEDIACODEC_CODEC.

  • GNU autoconf: --with-openh264=DIR / --disable-openh264, --with-vpx=DIR / --disable-vpx, --with-ffmpeg / --disable-ffmpeg. Android MediaCodec is auto-enabled when building for Android targets. VideoToolbox is not auto-enabled on Apple targets — set PJMEDIA_HAS_VID_TOOLBOX_CODEC=1 in config_site.h to enable it.

  • Visual Studio / config_site.h: set PJMEDIA_HAS_OPENH264_CODEC, PJMEDIA_HAS_VPX_CODEC, PJMEDIA_HAS_FFMPEG, or (on Apple targets) PJMEDIA_HAS_VID_TOOLBOX_CODEC, and link the matching libraries.
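For builds driven by config_site.h (for example Visual Studio), the macros named on this page could be combined as in the fragment below. This is a sketch for a hypothetical Windows build with DirectShow capture, SDL rendering, and the OpenH264 and libvpx codecs. PJMEDIA_HAS_VIDEO does not appear in the text above and is assumed here to be the switch that turns video support on; the value 1 for the codec macros is likewise an assumption based on the device macro example. The matching third-party libraries still have to be installed and linked.

```c
/* config_site.h (illustrative fragment, not a complete configuration) */

/* Assumed master switch for video support. */
#define PJMEDIA_HAS_VIDEO               1

/* Capture and renderer devices. */
#define PJMEDIA_VIDEO_DEV_HAS_DSHOW     1
#define PJMEDIA_VIDEO_DEV_HAS_SDL       1

/* Video codecs: at least one is needed for calls to negotiate video. */
#define PJMEDIA_HAS_OPENH264_CODEC      1
#define PJMEDIA_HAS_VPX_CODEC           1
```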

Format converter

The format converter handles colour-space conversion (e.g. NV12 → I420) and scaling between the format produced by the capture device, the format expected by the codec, and the format accepted by the renderer.

Two backends are available:

Backend | Windows | macOS | Linux | iOS | Android
libyuv (bundled, default)
libswscale (from FFmpeg)

libyuv is shipped with PJPROJECT in third_party/yuv and is built and enabled by default on every platform when video is enabled. libswscale is registered in addition to libyuv when FFmpeg is enabled, and it acts as a fallback for format/size combinations that libyuv does not support.
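To make the NV12 → I420 example concrete, the sketch below shows what such a colour-format conversion does at the byte level: the Y plane is copied unchanged and the interleaved UV plane is split into separate U and V planes. This is a plain C illustration, not how libyuv or libswscale implement it (both are far more optimised and also handle strides and scaling). The function name nv12_to_i420 and the assumption of tightly packed buffers with even dimensions are specific to this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert a tightly packed NV12 frame to I420.
 * Both buffers must hold width * height * 3 / 2 bytes.
 * Returns 0 on success, -1 if the dimensions are zero or odd. */
static int nv12_to_i420(const uint8_t *src, uint8_t *dst,
                        size_t width, size_t height)
{
    size_t y_size, c_size, i;
    const uint8_t *src_uv;
    uint8_t *dst_u, *dst_v;

    assert(src != NULL && dst != NULL);
    if (width == 0 || height == 0 || (width & 1) || (height & 1))
        return -1;

    y_size = width * height;   /* luma plane */
    c_size = y_size / 4;       /* each chroma plane in I420 */
    src_uv = src + y_size;     /* NV12: U and V interleaved as UVUV... */
    dst_u  = dst + y_size;     /* I420: all U samples, then all V samples */
    dst_v  = dst_u + c_size;

    memcpy(dst, src, y_size);  /* Y plane is identical in both formats */
    for (i = 0; i < c_size; ++i) {
        dst_u[i] = src_uv[2 * i];
        dst_v[i] = src_uv[2 * i + 1];
    }
    return 0;
}
```

A capture device that delivers NV12 feeding a codec that expects I420 is the kind of mismatch the format converter resolves automatically, so application code rarely needs a routine like this.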

Build flags:

  • CMake: PJMEDIA_WITH_LIBYUV (default ON); PJMEDIA_WITH_FFMPEG_SWSCALE (rides on PJMEDIA_WITH_FFMPEG).

  • GNU autoconf: --disable-libyuv to drop libyuv; --with-external-libyuv to use a system-installed libyuv instead of the bundled one. libswscale rides on FFmpeg detection.

  • Visual Studio: PJMEDIA_HAS_LIBYUV is set automatically when the bundled libyuv project is included; PJMEDIA_HAS_LIBSWSCALE rides on PJMEDIA_HAS_FFMPEG.

Media events

PJMEDIA emits asynchronous events through pjmedia_event (see pjmedia/event.h). A video application typically needs to subscribe and handle the following:

  • PJMEDIA_EVENT_FMT_CHANGED — the negotiated stream format has changed (commonly a resolution change after re-INVITE or a peer-side codec reconfiguration). The application must reconfigure the renderer to the new size; otherwise the output will be wrong or blank.

  • PJMEDIA_EVENT_KEYFRAME_FOUND / PJMEDIA_EVENT_KEYFRAME_MISSING — a keyframe was decoded, or the decoder cannot proceed because a keyframe is missing. Apps may use these to update UI state or to trigger an explicit keyframe request to the peer (FIR/PLI).

  • PJMEDIA_EVENT_ORIENT_CHANGED — the capture device’s physical orientation changed. The app should signal the new orientation to the remote peer; the capture device handles its own rotation locally.

  • PJMEDIA_EVENT_WND_CLOSING / PJMEDIA_EVENT_WND_CLOSED / PJMEDIA_EVENT_WND_RESIZED — the renderer’s window was closed or resized by the user. Apps typically tear down or reconfigure the call’s video on these.

  • PJMEDIA_EVENT_MOUSE_BTN_DOWN — the user clicked inside the video window. Available where the renderer surfaces it (e.g. SDL).

  • PJMEDIA_EVENT_RX_RTCP_FB — incoming RTCP feedback (e.g. PLI/FIR) was received. Apps that drive their own keyframe-on-demand logic can hook this.

  • PJMEDIA_EVENT_VID_DEV_ERROR — a video device stopped because of an error (e.g. camera unplugged, permission revoked). The app should surface the error and recover.
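The event list above amounts to a mapping from event type to application reaction. The sketch below captures that mapping as a single switch so the policy can be reviewed in one place. It deliberately does not use the PJMEDIA headers: demo_event and demo_action are hypothetical stand-ins that mirror the event names in the list, and a real handler would receive a pjmedia_event in its callback and read the type from it.

```c
#include <assert.h>

/* Hypothetical stand-ins for the PJMEDIA event types listed above. */
typedef enum demo_event {
    DEMO_EVENT_FMT_CHANGED,
    DEMO_EVENT_KEYFRAME_FOUND,
    DEMO_EVENT_KEYFRAME_MISSING,
    DEMO_EVENT_ORIENT_CHANGED,
    DEMO_EVENT_WND_CLOSING,
    DEMO_EVENT_WND_CLOSED,
    DEMO_EVENT_WND_RESIZED,
    DEMO_EVENT_MOUSE_BTN_DOWN,
    DEMO_EVENT_RX_RTCP_FB,
    DEMO_EVENT_VID_DEV_ERROR
} demo_event;

/* Hypothetical application reactions. */
typedef enum demo_action {
    DEMO_ACTION_NONE,
    DEMO_ACTION_RESIZE_RENDERER,
    DEMO_ACTION_REQUEST_KEYFRAME,
    DEMO_ACTION_SEND_KEYFRAME,
    DEMO_ACTION_SIGNAL_ORIENTATION,
    DEMO_ACTION_STOP_VIDEO,
    DEMO_ACTION_REPORT_ERROR
} demo_action;

/* Decide what the application should do for a given event. */
static demo_action action_for_event(demo_event ev)
{
    switch (ev) {
    case DEMO_EVENT_FMT_CHANGED:
    case DEMO_EVENT_WND_RESIZED:
        return DEMO_ACTION_RESIZE_RENDERER;    /* reconfigure output size */
    case DEMO_EVENT_KEYFRAME_MISSING:
        return DEMO_ACTION_REQUEST_KEYFRAME;   /* ask the peer (FIR/PLI) */
    case DEMO_EVENT_RX_RTCP_FB:
        return DEMO_ACTION_SEND_KEYFRAME;      /* peer asked us for one */
    case DEMO_EVENT_ORIENT_CHANGED:
        return DEMO_ACTION_SIGNAL_ORIENTATION; /* tell the remote side */
    case DEMO_EVENT_WND_CLOSING:
    case DEMO_EVENT_WND_CLOSED:
        return DEMO_ACTION_STOP_VIDEO;
    case DEMO_EVENT_VID_DEV_ERROR:
        return DEMO_ACTION_REPORT_ERROR;
    case DEMO_EVENT_KEYFRAME_FOUND:
    case DEMO_EVENT_MOUSE_BTN_DOWN:
        return DEMO_ACTION_NONE;               /* UI-only, nothing to drive */
    default:
        assert(!"unhandled event type");
        return DEMO_ACTION_NONE;
    }
}
```

Keeping the decision separate from the callback makes it easy to hand the resulting action to the application's own thread instead of doing the work inside the event callback.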

Where to look next