DL Streamer Coding Agent — User Guide#

Overview#

The DL Streamer Coding Agent is a Claude Code skill that turns plain-English descriptions into working video-analytics applications. You describe what you want — it generates, builds, and validates a complete DL Streamer app.

Skill location: <dlstreamer-repo>/.github/skills/dlstreamer-coding-agent/ (where <dlstreamer-repo> is your local clone of the DL Streamer repository)

Quick Start#

Open Claude Code inside your local dlstreamer repository
Describe the video AI pipeline you want to build
The agent asks clarifying questions if needed, then generates and runs the app

That’s it. No boilerplate, no manual GStreamer wiring.

What to Include in Your Prompt#

A good prompt answers these questions:

What	Why	Example
Video source	Where to read frames from	A Pexels URL, local file, or `rtsp://...`
AI model(s)	What intelligence to apply	“YOLOv11 for detection”
What to output	What the app should produce	“Annotated video + JSON with detections”
Target hardware	Which accelerator to use	“Intel Core Ultra 3, prefer GPU”
App language	Python, C++, or shell script	“Python application”
Where to save	Output directory name	“Save in `my_app/`”

If you skip any of these, the agent will ask before proceeding.

Example Prompts#

Simple — detection + tracking (shell script):

Create a bash script that detects and tracks people using YOLO26m and Mars-Small-128.
Input: https://videos.pexels.com/video-files/18552655/18552655-hd_1280_720_30fps.mp4
Output: annotated video file.
Optimized for Intel Core Ultra 3. Save in people_tracking/.

Medium — license plate OCR (Python):

Build a Python app for license plate recognition:
- YOLOv11 for plate detection, PaddleOCR for text
- Input: video file or RTSP camera
- Output: annotated video + JSON with plate text
Save in license_plate_recognition/. Include README.

Advanced — event-based recording:

Python app that records video clips only when people are detected:
- Input: RTSP camera (or file for testing)
- Detect people, start recording on detection, stop when gone
- Output: sequence of clips (save-1.mp4, save-2.mp4, ...)
Save in smart_nvr/.

Conversion — DeepStream to DL Streamer:

Convert this DeepStream app to DL Streamer: [paste code or path]
Keep the same detection + classification + JSON output.

What You Get#

The agent generates a ready-to-run project:

my_app/
├── my_app.py            # Main application (or .sh / .cpp)
├── export_models.py     # Downloads and converts AI models
├── requirements.txt     # Python dependencies
├── README.md            # How to set up and run
└── results/             # Output goes here at runtime

It also:

Pulls the intel/dlstreamer:latest Docker image
Downloads and converts models to OpenVINO format
Downloads your test video
Runs the app and checks that output is valid

Supported Use Cases#

Category	Examples
Detection	YOLO (v8/v11/v26), SSD, RTDETR
Tracking	DeepSORT, SORT with re-ID models
Text/OCR	PaddleOCR for license plates, signs
GenAI/VLM	InternVL, MiniCPM, Qwen2.5-VL, SmolVLM
Multi-camera	Shared models, cross-stream batching
Mosaic	Composite 2x2 / 3x3 grid views
Smart recording	Event-triggered start/stop clips
Streaming	WebRTC output, RTSP input
Conversion	DeepStream Python/C++ → DL Streamer

Tips#

Name exact models — “YOLOv11n” works better than “an object detector”
Provide a test video — the agent validates the pipeline end-to-end
Say “run and check output” — triggers automatic validation
Ask for README — ensures you get setup docs with the code

Troubleshooting#

Problem	Solution
Docker pull fails	Check network and `docker login`
Model export runs out of memory	Use a smaller model variant
Output video won’t play	Usually fixed by agent (EOS handling); re-run if needed
Very slow first run (5-10 min)	Normal — GPU compiles shaders on first inference
NPU inference fails	Not all models support NPU; agent falls back to GPU

More Examples#

The skill includes additional example prompts at: <dlstreamer-repo>/.github/skills/dlstreamer-coding-agent/examples/

People detection + tracking
License plate recognition
Event-based smart NVR
Multi-stream mosaic
Pose estimation
Safety compliance checks
DeepStream conversion (Python and C++)

Prerequisites#

Docker installed and running
Python 3.10+
Network access (Docker images, model downloads, test videos)
Intel hardware with GPU recommended