Optimize Video Encoding and Decoding#

The terms encode and decode refer to video encoding and decoding operations, i.e., converting between compressed video formats (such as H.264, H.265, or VP9) and the raw video frames used for AI inference or image processing in pipelines.

Encode and decode in the DLStreamer context#

Decode#

The purpose of the decode phase is to convert compressed video streams (e.g., from .mp4, .h264, or .mkv files, or from RTSP network streams) into raw frames that can be processed by neural networks or computer vision functions.

In GStreamer/DLStreamer terms:

  • Performed by elements like decodebin, vaapidecodebin, or dlstreamer::vaapidecode.

  • The output is usually in a raw format like BGRx, NV12, or I420.

  • Hardware-accelerated decoding is often used (via VAAPI, oneVPL, or other GPU acceleration APIs).

Example:

filesrc location=input.mp4 ! decodebin ! videoconvert ! video/x-raw,format=BGRx ! appsink

This “decode” stage makes the video usable for inference with DLStreamer’s AI plugins (like gvadetect, gvaclassify, etc.).
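
For comparison, a hardware-accelerated variant of the same stage can swap the software decoder for the VAAPI path mentioned above. The following is a minimal sketch, assuming an H.264 stream inside input.mp4 and a platform with the gstreamer-vaapi plugins installed; the qtdemux and h264parse elements are assumptions for this container/codec combination, and exact element availability depends on the driver stack.

# Demux the MP4 container, parse the H.264 stream, decode on the GPU via VAAPI
filesrc location=input.mp4 ! qtdemux ! h264parse ! vaapidecodebin ! video/x-raw,format=NV12 ! appsink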

Encode#

The purpose of the encode phase is to take raw video frames (possibly carrying inference overlays, metadata, or other post-processing results) and compress them back into a standard video format for storage, display, or streaming.

In GStreamer/DLStreamer terms:

  • Performed by elements like x264enc, vaapih264enc, or dlstreamer::vaapiencode.

  • The input is raw frames (video/x-raw), and the output is a compressed stream (video/x-h264, etc.).

  • Often followed by a mux element (mp4mux, matroskamux) and a file sink or network sink.

Example:

... ! videoconvert ! vaapih264enc ! mp4mux ! filesink location=output.mp4
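
Putting decode, inference, and encode together, an end-to-end pipeline might look like the sketch below. The model path model.xml and the device=GPU selection are placeholders, not part of the examples above; gvawatermark is DLStreamer's element for drawing detection overlays onto frames before re-encoding.

# Decode, run detection, draw overlays, then re-encode and mux to MP4
gst-launch-1.0 filesrc location=input.mp4 ! decodebin ! videoconvert ! \
  gvadetect model=model.xml device=GPU ! \
  gvawatermark ! videoconvert ! \
  vaapih264enc ! h264parse ! mp4mux ! filesink location=output.mp4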

Goals for Encode/Decode optimizations#

  • Maximize concurrent streams: Allow the system to decode as many video streams as possible simultaneously (a sketch follows this list).

  • Balance latency vs. throughput:

    • Low latency: Needed for real-time analytics or monitoring.

    • High throughput: Needed for batch video ingestion or archival processing.

  • Efficient resource usage: Minimize CPU/GPU load, memory bandwidth, and unnecessary copying.
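
To illustrate the first goal, gst-launch-1.0 accepts several unconnected branches in a single command, and each branch decodes its own stream concurrently within one process. In this sketch, stream1.mp4 and stream2.mp4 are placeholder inputs, and fakesink sync=false discards decoded frames as fast as they arrive so that decode capacity, rather than display timing, is what gets exercised.

# Two independent decode branches running concurrently in one process
gst-launch-1.0 \
  filesrc location=stream1.mp4 ! decodebin ! fakesink sync=false \
  filesrc location=stream2.mp4 ! decodebin ! fakesink sync=false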

Latency vs. Throughput Trade-Off#

| Focus | Optimization Strategy | Pros | Cons |
|-------|-----------------------|------|------|
| Low latency | Decode frames one-by-one, minimal buffering | Fast per-frame response | GPU underutilized, lower throughput |
| High throughput | Batch decode multiple frames or streams, pipeline parallelism | Maximum hardware utilization, many streams | Increased per-frame latency (frames wait for a batch) |

In summary:

  • For real-time analytics, use small batches or per-frame decode.

  • For archival ingestion or high-throughput pipelines, use larger batch sizes to fully utilize hardware, as in the sketch below.
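
One concrete lever for this trade-off is the batch-size property exposed by DLStreamer's inference elements such as gvadetect. The sketch below contrasts the two settings; model.xml is a placeholder path, and the batch value of 32 is illustrative rather than a recommendation.

# Low latency: each frame is inferred as soon as it is decoded
gst-launch-1.0 filesrc location=input.mp4 ! decodebin ! videoconvert ! \
  gvadetect model=model.xml device=GPU batch-size=1 ! fakesink sync=false

# High throughput: frames are grouped into batches before inference,
# improving device utilization at the cost of per-frame waiting time
gst-launch-1.0 filesrc location=input.mp4 ! decodebin ! videoconvert ! \
  gvadetect model=model.xml device=GPU batch-size=32 ! fakesink sync=false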