# Metro Vision AI SDK - Tutorial 2

This tutorial demonstrates advanced video processing capabilities using Intel's hardware-accelerated video decoding and composition. You'll learn to decode multiple video streams simultaneously and display them in a tiled layout on a 4K monitor using VAAPI (Video Acceleration API) and GStreamer.

## Overview

Multi-stream video processing is essential for applications like video surveillance, broadcasting, and media production. This tutorial showcases how the **Intel® integrated GPU (iGPU)** can efficiently decode and composite 16 simultaneous video streams into a single 4K display output, demonstrating the power of Intel® Quick Sync Video technology. The entire media pipeline (decode, scale, compose, and display) runs on the iGPU.

> **Recommended Device: Integrated GPU (iGPU)**
>
> Media pipelines achieve the best throughput and lowest latency when running on the Intel® integrated GPU. The iGPU provides dedicated hardware-accelerated media decode/encode engines and parallel compute units purpose-built for real-time video processing. **CPU and NPU are not recommended** for media-intensive pipelines like multi-stream video decode and composition.

> **Platform Compatibility**
>
> This tutorial requires Intel® Core™ or Intel® Core™ Ultra processors with integrated graphics. Intel® Xeon® processors without integrated graphics are not supported for this specific use case.

## Time to Complete

**Estimated Duration:** 15-20 minutes

## Learning Objectives

Upon completion of this tutorial, you will be able to:

- Configure hardware-accelerated video decoding with VAAPI
- Create complex GStreamer pipelines for multi-stream processing
- Implement tiled video composition for 4K display output
- Monitor video decoding performance and frame rates
- Understand Intel® Quick Sync Video acceleration benefits
- Deploy containerized video processing applications

## Prerequisites

Before starting this tutorial, ensure you have:

- Metro Vision AI SDK installed and configured
- Intel® Core™ or Intel® Core™ Ultra processor with integrated graphics
- 4K monitor or display capable of 3840x2160 resolution
- Docker installed and running on your system
- X11 display server configured
- Basic familiarity with GStreamer concepts

## System Requirements

- **Operating System:** Ubuntu 22.04 LTS or Ubuntu 24.04 LTS (Desktop edition required)
- **Processor:** Intel® Core™ or Intel® Core™ Ultra with integrated graphics
- **Memory:** Minimum 8GB RAM (16GB recommended for smooth performance)
- **Display:** 4K monitor (3840x2160) or compatible display
- **Storage:** 2GB free disk space for video files
- **Graphics:** Intel® integrated GPU (iGPU) with VAAPI support, **required** (this pipeline cannot run on CPU or NPU)

**Important Display Requirements**

This tutorial requires **Ubuntu Desktop** with a **local physical display** and active graphical session. It will **not work properly** with:

- Ubuntu Server (no GUI)
- Remote SSH sessions (even with X11 forwarding)
- Remote Desktop/VNC connections
- Headless systems

**Why Remote Connections Don't Work:**

Sending a 4K composite of 16 simultaneous video streams to a remote client requires extremely high bandwidth (~150-200 Mbps) and low latency.
Remote display paths (SSH X11 forwarding, VNC, RDP) either compress video heavily or introduce significant latency, resulting in:

- Severe frame drops and stuttering
- Poor visual quality due to compression artifacts
- Inability to accurately measure hardware acceleration performance
- Network congestion and timeouts

**You must be physically logged into a local desktop session with a directly connected monitor** to experience proper performance and validate hardware acceleration capabilities.

## Tutorial Steps

### Step 1: Verify Intel Integrated GPU Availability

Before proceeding, verify that your system has an Intel integrated GPU and that VAAPI support is properly configured:

```bash
# Check for Intel GPU device
lspci | grep -i "VGA.*Intel"

# Expected output should show Intel graphics, for example:
# 00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [Iris Xe Graphics]

# Verify VAAPI device availability
ls -la /dev/dri/

# Expected output should show renderD128 (or similar):
# drwxr-xr-x  3 root root        100 Dec  2 10:00 .
# drwxr-xr-x 20 root root       4420 Dec  2 10:00 ..
# drwxr-xr-x  2 root root         80 Dec  2 10:00 by-path
# crw-rw----  1 root video  226,   0 Dec  2 10:00 card0
# crw-rw----  1 root render 226, 128 Dec  2 10:00 renderD128

# Check VAAPI driver information
vainfo

# Expected output should show Intel iHD or i965 driver with supported profiles
```

**Troubleshooting:**

- If `lspci` shows no Intel graphics, this tutorial cannot proceed on your system
- If `/dev/dri/renderD128` is missing, install drivers: `sudo apt install intel-media-va-driver-non-free`
- If `vainfo` command is not found: `sudo apt install vainfo`
- Ensure your user is in the `video` and `render` groups: `sudo usermod -aG video,render $USER` (requires logout/login)

### Step 2: Create Working Directory and Download Video Content

Create a dedicated workspace and download the sample video for multi-stream processing:

```bash
# Create working directory structure
mkdir -p ~/metro/metro-vision-tutorial-2/videos/
cd ~/metro/metro-vision-tutorial-2

# Download Big Buck Bunny sample video (Creative Commons licensed)
wget -O videos/Big_Buck_Bunny.mp4 "https://archive.org/download/BigBuckBunny_124/Content/big_buck_bunny_720p_surround.mp4"
```

### Step 3: Create Multi-Stream Video Processing Script

Create a GStreamer pipeline script that will decode and compose 16 video streams into a 4x4 tiled display:

```bash
# Create the decode script
cat > decode.sh << 'EOF'
#!/bin/bash

# Video input file path (relative to the directory the script is run from)
VIDEO_IN=videos/Big_Buck_Bunny.mp4

# Verify video file exists
if [ ! -f "$VIDEO_IN" ]; then
    echo "Error: Video file $VIDEO_IN not found!"
    exit 1
fi

echo "Starting 4x4 tiled video decode pipeline..."
echo "Video source: $VIDEO_IN"
echo "Target resolution: 3840x2160 (4K)"
echo "Individual tile size: 960x540"

# GStreamer pipeline for 4x4 tiled video composition
gst-launch-1.0 \
    vacompositor name=comp0 \
    sink_1::xpos=0 sink_1::ypos=0 sink_1::alpha=1 \
    sink_2::xpos=960 sink_2::ypos=0 sink_2::alpha=1 \
    sink_3::xpos=1920 sink_3::ypos=0 sink_3::alpha=1 \
    sink_4::xpos=2880 sink_4::ypos=0 sink_4::alpha=1 \
    sink_5::xpos=0 sink_5::ypos=540 sink_5::alpha=1 \
    sink_6::xpos=960 sink_6::ypos=540 sink_6::alpha=1 \
    sink_7::xpos=1920 sink_7::ypos=540 sink_7::alpha=1 \
    sink_8::xpos=2880 sink_8::ypos=540 sink_8::alpha=1 \
    sink_9::xpos=0 sink_9::ypos=1080 sink_9::alpha=1 \
    sink_10::xpos=960 sink_10::ypos=1080 sink_10::alpha=1 \
    sink_11::xpos=1920 sink_11::ypos=1080 sink_11::alpha=1 \
    sink_12::xpos=2880 sink_12::ypos=1080 sink_12::alpha=1 \
    sink_13::xpos=0 sink_13::ypos=1620 sink_13::alpha=1 \
    sink_14::xpos=960 sink_14::ypos=1620 sink_14::alpha=1 \
    sink_15::xpos=1920 sink_15::ypos=1620 sink_15::alpha=1 \
    sink_16::xpos=2880 sink_16::ypos=1620 sink_16::alpha=1 \
    ! vapostproc ! xvimagesink display=$DISPLAY sync=false \
    \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_1 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_2 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_3 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_4 \
    \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_5 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_6 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_7 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_8 \
    \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_9 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_10 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_11 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_12 \
    \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_13 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_14 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_15 \
    filesrc location=${VIDEO_IN} ! qtdemux ! vah264dec ! gvafpscounter ! vapostproc scale-method=fast ! video/x-raw,width=960,height=540 ! comp0.sink_16
EOF
```

### Understanding the GStreamer Pipeline

The script creates a complex pipeline with these key components:

**Pipeline Architecture:**

- **Input Sources**: 16 branches that each read and demux the same video file
- **Decoder**: `vah264dec` - Hardware-accelerated H.264 decoding using VAAPI
- **Frame Rate Measurement**: `gvafpscounter` - DL Streamer element that reports the frame rate of each stream
- **Scaling**: `vapostproc` - Hardware-accelerated video post-processing and scaling
- **Composition**: `vacompositor` - Hardware-accelerated video composition
- **Output**: `xvimagesink` - X11-based video display

**Tiled Layout Configuration:**

```text
┌─────────┬─────────┬─────────┬─────────┐
│ Stream1 │ Stream2 │ Stream3 │ Stream4 │ ← Row 1 (y=0)
│ 0,0     │ 960,0   │1920,0   │2880,0   │
├─────────┼─────────┼─────────┼─────────┤
│ Stream5 │ Stream6 │ Stream7 │ Stream8 │ ← Row 2 (y=540)
│ 0,540   │ 960,540 │1920,540 │2880,540 │
├─────────┼─────────┼─────────┼─────────┤
│ Stream9 │Stream10 │Stream11 │Stream12 │ ← Row 3 (y=1080)
│ 0,1080  │960,1080 │1920,1080│2880,1080│
├─────────┼─────────┼─────────┼─────────┤
│Stream13 │Stream14 │Stream15 │Stream16 │ ← Row 4 (y=1620)
│ 0,1620  │960,1620 │1920,1620│2880,1620│
└─────────┴─────────┴─────────┴─────────┘
```

**Performance Optimizations:**

- **VAAPI Acceleration**: Hardware-accelerated decoding, scaling, and composition
- **Fast Scaling**: `scale-method=fast` favors scaling speed over quality
- **Unsynchronized Display**: `sync=false` renders each frame as soon as it is ready instead of waiting on the pipeline clock, so the sink does not drop late frames

### Step 4: Prepare Environment and Permissions

Configure the execution environment for the containerized video processing:

```bash
# Make the script executable
chmod +x decode.sh

# Allow local Docker containers to connect to the X server
xhost +local:docker

# Verify GPU device availability
ls -la /dev/dri/
```

### Step 5: Execute Multi-Stream Video Processing

Launch the containerized multi-stream decode and composition pipeline.
The `--device /dev/dri` flag gives the container access to the Intel® integrated GPU, which handles all decode, scaling, and composition in hardware.

> **Running on iGPU:** Every stage of this pipeline, H.264 decode (`vah264dec`), scaling (`vapostproc`), and composition (`vacompositor`), executes on the integrated GPU via VAAPI. The CPU only orchestrates the pipeline; all heavy media processing is offloaded to the iGPU.

```bash
# Set up GPU device access
export DEVICE=/dev/dri/renderD128
# Resolve the numeric GID of the group that owns the render node
export DEVICE_GRP=$(ls -g $DEVICE | awk '{print $3}' | xargs getent group | awk -F: '{print $3}')

# Execute the multi-stream video processing
docker run -it --rm --net=host \
    -e no_proxy=$no_proxy \
    -e https_proxy=$https_proxy \
    -e socks_proxy=$socks_proxy \
    -e http_proxy=$http_proxy \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    --device /dev/dri --group-add ${DEVICE_GRP} \
    -e DISPLAY=$DISPLAY --ipc=host \
    -v $HOME/.Xauthority:/home/dlstreamer/.Xauthority:ro \
    -v $PWD/videos:/home/dlstreamer/videos:ro \
    -v $PWD/decode.sh:/home/dlstreamer/decode.sh:ro \
    intel/dlstreamer:2025.1.2-ubuntu24 \
    /home/dlstreamer/decode.sh
```

### Step 6: Monitor Performance and Results

The application will display a 4x4 tiled video composition on your 4K monitor.
You should see:

![4x4 Video Streaming Result](images/intel-edge-ai-box-4x4-video-streaming.png)

**Performance Monitoring:**

The `gvafpscounter` elements in the pipeline report frame rates in the terminal running the container. Monitor system resources during playback:

```bash
# In a separate terminal, monitor GPU utilization
sudo intel_gpu_top
```

```bash
# Monitor CPU and memory usage
htop
```

### Step 7: Stop the Application

To stop the video processing pipeline:

```bash
# Press Ctrl+C in the terminal running the Docker container

# Or use Docker commands to stop
docker ps                      # Find the container ID
docker stop <container-id>     # Replace with the ID reported by docker ps
```

Clean up the environment:

```bash
# Restore X11 security (optional)
xhost -local:docker

# Optional: remove stopped containers, unused networks, dangling images,
# and build cache (this affects all of Docker, not only this tutorial)
docker system prune -f
```

## Understanding the Technology

### Intel® Quick Sync Video Technology

This tutorial leverages the Intel® integrated GPU's hardware-accelerated video processing capabilities. Media pipelines like this one are best run on the iGPU, **not on CPU or NPU**, because the iGPU contains dedicated fixed-function media engines designed specifically for video decode, encode, and processing.

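
One way to see which codecs the fixed-function engines expose on a particular system is to filter the `vainfo` output from Step 1 for decode entry points. The following is a minimal sketch, assuming `vainfo` prints its usual `VAProfile... : VAEntrypoint...` lines, where `VAEntrypointVLD` marks hardware decode; `list_decode_profiles` is a hypothetical helper name, not part of the SDK.

```bash
# Sketch: list the codec profiles for which the VAAPI driver advertises
# hardware decode. Reads vainfo-style text on stdin.
# Assumption: decode support appears as "VAProfileX : VAEntrypointVLD".
list_decode_profiles() {
    awk '/VAEntrypointVLD/ { sub(/:$/, "", $1); print $1 }' | sort -u
}

# Run against the local driver only when vainfo is installed.
if command -v vainfo >/dev/null 2>&1; then
    vainfo 2>/dev/null | list_decode_profiles
fi
```

If the list contains no `VAProfileH264...` entry, the `vah264dec` element used in this tutorial is unlikely to work on that system.
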
**Why iGPU for Media Pipelines?**

- **Dedicated Video Engines**: The iGPU contains separate silicon (multi-format codec engines) for video decode/encode operations that far exceed CPU software decode performance
- **CPU Offloading**: Running the media pipeline on the iGPU frees CPU cores for other computational tasks such as application logic or AI post-processing
- **Power Efficiency**: Hardware media engines consume significantly less power than CPU-based software decoding
- **Parallel Processing**: Multiple decode engines on the iGPU can process many streams simultaneously
- **Not suitable for CPU/NPU**: CPU software decode lacks the throughput for multi-stream real-time 4K composition; NPU is designed for AI inference workloads, not media decode/encode

### VAAPI Integration

**Video Acceleration API (VAAPI)** provides:

- **Hardware Abstraction**: Unified interface across Intel graphics generations
- **Pipeline Optimization**: Direct GPU memory access without CPU copies
- **Format Support**: Hardware acceleration for H.264, H.265, VP9, and AV1 codecs (which codecs are available depends on the GPU generation; check `vainfo`)
- **Scaling Operations**: Hardware-accelerated resize and format conversion

### GStreamer Pipeline Architecture

The tutorial demonstrates advanced GStreamer concepts:

**Element Types:**

- **Source Elements**: `filesrc` - File input
- **Demuxer Elements**: `qtdemux` - Container format parsing
- **Decoder Elements**: `vah264dec` - Hardware-accelerated decoding
- **Transform Elements**: `vapostproc` - Hardware scaling and format conversion
- **Compositor Elements**: `vacompositor` - Multi-stream composition
- **Sink Elements**: `xvimagesink` - Display output

**Pipeline Benefits:**

- **Zero-Copy Operations**: Direct GPU memory transfers
- **Parallel Processing**: Concurrent decode of multiple streams
- **Dynamic Reconfiguration**: Runtime pipeline modifications
- **Error Recovery**: Robust handling of stream issues
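
The tile coordinates passed to `vacompositor` in Step 3 follow a simple grid rule: for a 1-based stream index, the column is `(index - 1) % 4` and the row is `(index - 1) / 4`, each multiplied by the 960x540 tile size. The sketch below derives the same pad properties in a loop rather than writing all 16 by hand. The function names `tile_position` and `pad_properties` are hypothetical and used here only for illustration; the script only prints the properties and does not launch a pipeline.

```bash
# Sketch: compute the 4x4 tile layout used in decode.sh from a stream index.
# COLS, TILE_W and TILE_H mirror the values in the tutorial script.
COLS=4
TILE_W=960
TILE_H=540

# Print "xpos ypos" for a 1-based stream index (sink_1 ... sink_16).
tile_position() {
    index=$1
    col=$(( (index - 1) % COLS ))
    row=$(( (index - 1) / COLS ))
    echo "$(( col * TILE_W )) $(( row * TILE_H ))"
}

# Print the vacompositor pad properties for one stream.
pad_properties() {
    pos=$(tile_position "$1")
    xpos=${pos% *}   # text before the space
    ypos=${pos#* }   # text after the space
    echo "sink_$1::xpos=${xpos} sink_$1::ypos=${ypos} sink_$1::alpha=1"
}

# Show the computed layout for all 16 streams.
for i in $(seq 1 16); do
    pad_properties "$i"
done
```

Changing `COLS`, `TILE_W`, and `TILE_H` (for example to 3, 1280, and 720 with nine streams) would produce a different grid for the same 3840x2160 output, provided the pipeline branches are adjusted to match.
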