System Requirements

This page lists the hardware, software, and platform requirements, along with the supported models, that you need to set up and run the application efficiently.

Software and Hardware Requirements

OS: Windows 11
Recommended processor: Intel® Core Ultra Series 1, 2, and 3 Processors (with integrated GPU support)
Memory: 32 GB RAM (minimum recommended)
Storage: At least 50 GB free (for models and logs)
GPU/Accelerator: Intel® integrated GPU (Core Ultra Series 1 or higher) or Intel® Arc GPU, for summarization acceleration
NPU: Intel® NPU (Core Ultra Series 1 or higher) for video pipelines
NPU Driver: Download and install the latest version from the Intel NPU Driver download page
Python: 3.12
Node.js: v18+ (for frontend)
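
The Python and storage requirements above can be checked before installation. The following is a minimal sketch, not part of the application: the function names are hypothetical, and it assumes that "Python: 3.12" means the interpreter's major.minor version must be exactly 3.12.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 12)  # from the requirements list
MIN_FREE_GB = 50           # storage for models and logs


def python_ok(version=None):
    """Return True when major.minor equals 3.12 (assumed to be an exact match)."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) == REQUIRED_PYTHON


def free_space_gb(path="."):
    """Free space at `path` in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def storage_ok(free_gb):
    """Return True when at least 50 GB is free."""
    return free_gb >= MIN_FREE_GB


if __name__ == "__main__":
    print("Python 3.12:", python_ok())
    print("Free space >= 50 GB:", storage_ok(free_space_gb()))
```

Run the script from the drive where the models and logs will be stored so that the free-space figure refers to the right volume.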

Whisper (all models supported)
- Recommended: whisper-small or lower for CPU efficiency
- Runs on CPU (Whisper is CPU-centric)
FunASR (Paraformer)
- Recommended for Chinese transcription (paraformer-zh)
Transcription supports .mp3 and .wav audio files up to 45 minutes long.
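
As an illustration of the format and length limits, the sketch below checks a file's extension and reads the duration of a WAV file from its header using only the Python standard library. The helper names are hypothetical, and the standard library cannot measure .mp3 duration, so that case is left out.

```python
import wave
from pathlib import Path

MAX_MINUTES = 45                   # documented upper limit
SUPPORTED_AUDIO = {".mp3", ".wav"}


def is_supported_audio(path):
    """Check the extension only (case-insensitive matching is an assumption)."""
    return Path(path).suffix.lower() in SUPPORTED_AUDIO


def wav_duration_minutes(path):
    """Duration of a PCM WAV file in minutes, computed from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate() / 60


def within_limit(minutes):
    """Return True when the clip is at most 45 minutes long."""
    return minutes <= MAX_MINUTES
```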

Qwen Models (OpenVINO / IPEX)
- Qwen2.0-7B-Instruct
- Qwen2.5-7B-Instruct
Summarization supports up to 7,500 tokens (≈ 45 minutes of audio) on GPU
Run summarization on GPU (Intel® iGPU / Arc GPU) for faster performance.
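
The two figures above imply an average of roughly 167 tokens per minute of audio (7,500 / 45). The sketch below turns that into a rough budget check. The rate is derived only from the stated limits; the real token count depends on speech rate and on the tokenizer, so treat the estimate as approximate. The function names are hypothetical.

```python
MAX_SUMMARY_TOKENS = 7_500
MAX_AUDIO_MINUTES = 45

# Average rate implied by the documented limits (about 166.7 tokens/minute).
TOKENS_PER_MINUTE = MAX_SUMMARY_TOKENS / MAX_AUDIO_MINUTES


def estimated_tokens(audio_minutes):
    """Rough transcript size for a recording of the given length."""
    return round(audio_minutes * TOKENS_PER_MINUTE)


def fits_budget(token_count):
    """Return True when a transcript fits the summarization limit."""
    return token_count <= MAX_SUMMARY_TOKENS
```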

Embedding Model: BAAI/bge-large-en-v1.5 for semantic topic indexing and search
Vector Store: FAISS (IndexFlatIP; the inner product equals cosine similarity when embeddings are L2-normalized)
Content segmentation uses the same LLM as summarization (e.g., Qwen2.5-7B-Instruct)
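
An inner-product index such as IndexFlatIP returns cosine similarity only when the vectors are L2-normalized first. The pure-Python sketch below illustrates that relationship on toy 2-D vectors. It does not use FAISS or the application's code, and the function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query, corpus, k=1):
    """Indices of the k corpus vectors with the highest cosine similarity."""
    q = l2_normalize(query)
    scores = [inner_product(q, l2_normalize(v)) for v in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
```

For the query `[1, 0]` and the corpus `[[1, 0], [5, 5]]`, the raw inner product would favor the longer vector `[5, 5]`, while the normalized search ranks `[1, 0]` first because it points in the same direction as the query.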

int8 → Recommended for lower-end CPUs (fast + efficient)
fp16 → Recommended for higher-end systems (better accuracy, GPU acceleration)
int4 → Supported, but may reduce accuracy (use only if memory-constrained)
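
The guidance above can be written as a small decision helper. This is only a sketch: the two boolean inputs are hypothetical, the page does not define what counts as "lower-end" or "memory-constrained", and the order of the checks is an assumption.

```python
def recommended_precision(has_gpu_acceleration, memory_constrained):
    """Map the documented guidance to a weight format name."""
    if memory_constrained:
        return "int4"   # supported, but may reduce accuracy
    if has_gpu_acceleration:
        return "fp16"   # higher-end systems, better accuracy
    return "int8"       # lower-end CPUs, fast and efficient
```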

For pipeline architecture and processing stages, see How It Works.

Model	Format	Used In	Purpose
YOLOv8m-pose	OpenVINO IR	Front pipeline	Person detection + 17-keypoint pose estimation
YOLOv8s-pose	OpenVINO IR	Back pipeline	Lightweight person detection + pose estimation
ResNet-18	OpenVINO IR	Front, Back, Content	Activity/action classification
MobileNet-V2	OpenVINO IR	Front pipeline	Lightweight classification
Person-ReID-retail-0288	OpenVINO IR	Front pipeline	Person re-identification and tracking

All models run in OpenVINO Intermediate Representation (IR) format
Inference supported on CPU, GPU, and NPU (configurable per pipeline)
Default inference device: NPU (recommended for best performance on Intel® Core Ultra)
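
Because the device is configurable per pipeline and NPU is the default, a launcher may want to fall back when the preferred device is missing. The sketch below shows one way to do that. The fallback order NPU, GPU, CPU and the function name are assumptions, not documented behavior.

```python
DEFAULT_DEVICE = "NPU"                  # documented default
FALLBACK_ORDER = ("NPU", "GPU", "CPU")  # assumed order, not from the docs


def pick_device(available, preferred=DEFAULT_DEVICE):
    """Return the first usable device name, trying `preferred` first."""
    candidates = (preferred,) + tuple(d for d in FALLBACK_ORDER if d != preferred)
    for want in candidates:
        for name in available:
            # Device lists may contain numbered entries such as "GPU.0".
            if name == want or name.startswith(want + "."):
                return name
    raise RuntimeError(f"No supported device in {list(available)}")
```

With OpenVINO installed, the list of available devices can be obtained from `openvino.Core().available_devices` and passed to `pick_device`.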

Model	Purpose	Device
Qwen2.5-VL-3B-Instruct	Vision Language Model for video summarization	GPU
xlm-roberta-base-ViT-B-32 (CLIP)	Visual embedding for images and video frames	CPU
BAAI/bge-small-en-v1.5	Text embedding for document chunks	CPU
BAAI/bge-reranker-large	Cross-encoder reranking for search results	GPU

Category	Extensions
Video	`.mp4`
Document	`.txt`, `.pdf`, `.docx`, `.doc`, `.pptx`, `.ppt`, `.xlsx`, `.xls`, `.html`, `.htm`, `.xml`, `.md`
Image	`.jpg`, `.jpeg`, `.png`
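
The table above translates directly into a lookup that classifies an upload by its extension. This is a minimal sketch with a hypothetical function name. Case-insensitive matching is an assumption, since the table lists lowercase extensions only.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {
    "video": {".mp4"},
    "document": {".txt", ".pdf", ".docx", ".doc", ".pptx", ".ppt",
                 ".xlsx", ".xls", ".html", ".htm", ".xml", ".md"},
    "image": {".jpg", ".jpeg", ".png"},
}


def categorize(filename):
    """Return 'video', 'document', 'image', or None if unsupported."""
    ext = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if ext in extensions:
            return category
    return None
```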