# Edge Workloads and Benchmarks Guide ## Overview The [Edge Workloads and Benchmarks](https://github.com/open-edge-platform/edge-workloads-and-benchmarks) suite validates Intel edge platform performance across four workload categories: vision AI inference, hardware-accelerated media processing, end-to-end video analytics pipelines, and generative AI. It measures throughput, latency, power consumption, and power efficiency across CPU, GPU, and NPU devices. Use this guide after provisioning an edge node with the Infrastructure Blueprint to quantify platform performance and validate hardware acceleration readiness. ## Benchmark Categories | Category | What It Measures | Devices | Backend | |----------|-----------------|---------|---------| | **Vision Benchmarks** | AI model inference (detection, classification) | CPU, GPU, NPU | OpenVINO benchmark_app | | **Media Benchmarks** | Hardware video decode throughput and stream density | GPU (VA-API) | GStreamer + VA-API | | **Edge AI Pipelines** | End-to-end video analytics (decode + detect + track + classify) | GPU, NPU, GPU+NPU | DL Streamer | | **GenAI Benchmarks** | LLM/VLM token generation (1st token latency, throughput) | CPU, GPU, NPU | OpenVINO GenAI | ## Prerequisites - [Edge Node Infrastructure Blueprint](https://github.com/open-edge-platform/edge-node-infrastructure-blueprint) deployed. - During target system installation, set `host_type=container` in the `config-file`. - Network connectivity for model and media downloads. ### Verify Hardware Readiness Confirm GPU and NPU are visible before proceeding: ```bash # GPU — should list Intel render nodes ls /dev/dri/render* # NPU — present only on supported platforms ls /dev/accel/accel* # VA-API codec support vainfo 2>/dev/null | grep -i "profile" ``` ## Setup ### Clone the Repository ```bash git clone https://github.com/open-edge-platform/edge-workloads-and-benchmarks.git cd edge-workloads-and-benchmarks ``` ### Install Prerequisites and Download Collateral GPU and NPU drivers are already installed by the Infrastructure Blueprint provisioning step, so disable their installation by passing `INCLUDE_GPU=False INCLUDE_NPU=False` to `make prereqs`. Then download the benchmark collateral and validate the environment: ```bash make prereqs INCLUDE_GPU=False INCLUDE_NPU=False make collateral INCLUDE_GENAI=True make check ``` The `make collateral` step downloads AI models and media files. Key variables: | Variable | Default | Description | |----------|---------|-------------| | `INCLUDE_GPU` | `True` | Install GPU compute drivers | | `INCLUDE_NPU` | `True` | Install NPU drivers | | `INCLUDE_VISION` | `True` | Download vision models (YOLOv11, ResNet-50, MobileNet-v2) | | `INCLUDE_MEDIA` | `True` | Download and encode media files (H.264, H.265, 1080p, 4K) | | `INCLUDE_GENAI` | `False` | Download GenAI models (requires Hugging Face token) | > **Note:** GenAI models require significant storage for original Hugging Face weights plus INT8/INT4 quantized artifacts. After quantization, reclaim space by removing `~/.cache/huggingface/hub/` and temporary venvs in `tools/genai-downloader/`. ### Hugging Face Token (GenAI only) Some GenAI models require authentication: ```bash export HF_TOKEN= ``` ## Running Benchmarks ### Vision Benchmarks Measures inference throughput (FPS), latency, and power efficiency for detection models (YOLOv11n/m, YOLOv5m) and classification models (ResNet-50, MobileNet-v2), all at INT8 precision. ```bash cd workloads/vision-benchmarks && make benchmarks cd ../.. ``` Execution modes: `tput` (maximum throughput) and `latency` (single-inference). Batch sizes: 1, 8, 16. Supports GPU+NPU concurrent mode for aggregate platform throughput. ### Media Benchmarks Measures hardware-accelerated video decode performance using VA-API across H.265 and H.264 codecs at 1080p and 4K, scaling from 1 to 8 parallel streams. ```bash cd workloads/media-benchmarks && make benchmarks cd ../.. ``` Key metrics: decode throughput (FPS), maximum stream density at 30 FPS target, power consumption. ### Edge AI Pipelines Measures end-to-end video analytics pipeline performance using DL Streamer. Each pipeline chains media decode, preprocessing, object detection, tracking, and classification over 1080p HEVC input. ```bash cd workloads/edge-ai-pipelines && make benchmarks cd ../.. ``` Three intensity levels (Light, Medium, Heavy) with increasing model complexity. Device placement modes: GPU-only, NPU-only, GPU+NPU split, and GPU+NPU concurrent. ### GenAI Benchmarks Measures generative AI inference for LLMs (Llama 3.2 3B, DeepSeek-R1-1.5B, Mistral 7B) and VLMs (Phi-4 Multimodal, Gemma 3 4B, MiniCPM-V 2.6) at INT8_ASYM and INT4_SYM_CW precisions. ```bash cd workloads/genai-benchmarks && make benchmarks cd ../.. ``` Key metrics: 1st token latency (ms), 2nd token throughput (tokens/s), power consumption (W), and power efficiency (tokens/s/W). ## Benchmark Execution Options Common parameters available across all workload categories: | Parameter | Description | |-----------|-------------| | `DRY_RUN=True` | List all test configurations without executing | | `RESUME=True` | Skip tests that already have results | | `DURATION=` | Set test duration (default: 60-120s) | | `POWER=True` | Enable power measurement (requires sudo) | | `CORES=pcore` | Pin execution to performance cores | | `CORES=ecore` | Pin execution to efficiency cores | | `CORES=0-11` | Pin to specific core range | | `CLEAR=True` | Remove previous results before running | Example — dry run to preview vision test matrix: ```bash cd workloads/vision-benchmarks && make benchmarks DRY_RUN=True ``` Example — run media benchmarks with power measurement, resuming from prior results: ```bash cd workloads/media-benchmarks && make benchmarks POWER=True RESUME=True ``` ## Generating Reports After running benchmarks, generate an interactive HTML dashboard: ```bash make report make serve ``` The report is accessible at `http://localhost:8000` and includes per-model throughput and latency charts, device comparisons, power efficiency rankings, and stream density results. Check which configurations have completed: ```bash make status ``` Results are stored under each workload directory at `workloads//results/` in JSON format, organized by model, device, mode, and batch size. ## Cleanup ```bash make clean-results # Remove benchmark results only make clean-all # Remove all generated content (models, media, results) ``` ## Troubleshooting | Problem | Solution | |---------|----------| | `make check` reports missing GPU | Verify drivers: `sudo apt install intel-opencl-icd intel-media-va-driver-non-free` | | NPU not detected | Check kernel module: `lsmod \| grep intel_vpu` and device nodes: `ls /dev/accel/` | | GenAI download fails | Verify `HF_TOKEN` is set and has access to gated models | | Low GPU throughput | Ensure no other workloads are using the GPU; check `intel_gpu_top` | | Power measurement fails | `POWER=True` requires sudo access for RAPL/hwmon readings | | Docker permission denied | Add user to docker group: `sudo usermod -aG docker $USER` and re-login | | Insufficient storage | Run without GenAI (`INCLUDE_GENAI=False`) or remove HF cache after conversion | ## References - [Edge Workloads and Benchmarks Repository](https://github.com/open-edge-platform/edge-workloads-and-benchmarks) - [OpenVINO Toolkit](https://docs.openvino.ai/) - [DL Streamer Documentation](https://github.com/open-edge-platform/dlstreamer) - [Container Device Interface Guide](container-device-interface-guide.md) — CDI setup for GPU/NPU access - [DL Streamer Pipelines Guide](dlstreamer-pipelines-guide.md) — Building custom pipelines - [Platform Capabilities](platform-capabilities.md) — Hardware and software stack details