# System Requirements This page provides detailed hardware, software, platform requirements, and supported models to help you set up and run the application efficiently. ## Software and Hardware Requirements - **OS**: Windows 11 - **Recommended processor**: Intel® Core Ultra Series 1, 2, and 3 Processors (with integrated GPU support) - **Memory**: 32 GB RAM (minimum recommended) - **Storage**: At least 50 GB free (for models and logs) - **GPU/Accelerator**: Intel® iGPU (Core Ultra Series 1, Arc GPU, or higher) for summarization acceleration - **NPU**: Intel® NPU (Core Ultra Series 1 or higher) for Video pipelines - **NPU Driver**: Please download and install the latest version from [Intel NPU Driver Download Page](https://www.intel.com/content/www/us/en/download/794734/intel-npu-driver-windows.html) - **Python**: 3.12 - **Node.js**: v18+ (for frontend) ## Audio Pipeline Supported Models ### ASR (Automatic Speech Recognition) - **Whisper (all models supported)** - Recommended: `whisper-small` or lower for CPU efficiency - Runs on **CPU** (Whisper is CPU-centric) - **FunASR (Paraformer)** - Recommended for **Chinese transcription** (`paraformer-zh`) - Supports transcription of .mp3/.wav audio files up to 45 minutes long. ### Summarization (LLMs) - **Qwen Models (OpenVINO / IPEX)** - `Qwen2.0-7B-Instruct` - `Qwen2.5-7B-Instruct` - Summarization supports up to 7,500 tokens (≈ 45 minutes of audio) on GPU - Run summarization on **GPU** (Intel® iGPU / Arc GPU) for faster performance. ### Content Segmentation and Topic Search - **Embedding Model**: `BAAI/bge-large-en-v1.5` for semantic topic indexing and search - **Vector Store**: FAISS (IndexFlatIP with cosine similarity) - Content segmentation uses the same LLM as summarization (e.g., Qwen2.5-7B-Instruct) ### Supported Weight Formats - **int8** → Recommended for lower-end CPUs (fast + efficient) - **fp16** → Recommended for higher-end systems (better accuracy, GPU acceleration) - **int4** → Supported, but may reduce accuracy (use only if memory-constrained) ## Video Analytics Pipeline - Supports 3 concurrent video pipelines (front, back, content) up to 45 minutes - Supports .mp4 format and RTSP streams - Outputs processed video via RTSP and HLS/WebRTC streaming (MediaMTX) For pipeline architecture and processing stages, see [How It Works](../how-it-works.md#video-analytics-pipeline). ### Supported Models | Model | Format | Used In | Purpose | | ----- | ------ | ------- | ------- | | **YOLOv8m-pose** | OpenVINO IR | Front pipeline | Person detection + 17-keypoint pose estimation | | **YOLOv8s-pose** | OpenVINO IR | Back pipeline | Lightweight person detection + pose estimation | | **ResNet-18** | OpenVINO IR | Front, Back, Content | Activity/action classification | | **MobileNet-V2** | OpenVINO IR | Front pipeline | Lightweight classification | | **Person-ReID-retail-0288** | OpenVINO IR | Front pipeline | Person re-identification and tracking | - All models run in OpenVINO Intermediate Representation (IR) format - Inference supported on **CPU**, **GPU**, and **NPU** (configurable per pipeline) - Default inference device: **NPU** (recommended for best performance on Intel® Core Ultra) ## Content Search Pipeline ### Content Search Supported Models | Model | Purpose | Device | | ----- | ------- | ------ | | **Qwen2.5-VL-3B-Instruct** | Vision Language Model for video summarization | GPU | | **xlm-roberta-base-ViT-B-32** (CLIP) | Visual embedding for images and video frames | CPU | | **BAAI/bge-small-en-v1.5** | Text embedding for document chunks | CPU | | **BAAI/bge-reranker-large** | Cross-encoder reranking for search results | GPU | ### Supported File Formats | Category | Extensions | | -------- | ---------- | | **Video** | `.mp4` | | **Document** | `.txt`, `.pdf`, `.docx`, `.doc`, `.pptx`, `.ppt`, `.xlsx`, `.xls`, `.html`, `.htm`, `.xml`, `.md` | | **Image** | `.jpg`, `.jpeg`, `.png` |