System Requirements#

This section lists the hardware, software, and network requirements for running the Store-wide Loss Prevention application.

Host Operating System#

  • Ubuntu 22.04 LTS (recommended and validated).

  • Other recent 64-bit Linux distributions may work, but are not fully validated.

Hardware Requirements#

  • CPU:

    • 8 physical cores (16 threads) or more recommended.

    • x86_64 architecture with support for AVX2.

  • System Memory (RAM):

    • Minimum: 16 GB.

    • Recommended: 32 GB or more for smoother multi-service operation and headroom for the VLM.

  • Storage:

    • Minimum free disk space: 30 GB.

    • Recommended: 60 GB+ to accommodate Docker images, OpenVINO™ models, the VLM weights (Qwen2.5-VL is several GB), sample video, and frame storage for behavioral analysis.

  • Graphics / Accelerators:

    • Required: Intel CPU.

    • Optional (recommended for full experience):

      • Intel integrated or discrete GPU supported by Intel® Graphics Compute Runtime — used for person detection, re-identification, pose estimation, and VLM inference.

      • Intel NPU supported by the linux-npu-driver stack — recommended for VLM inference (see Release Notes for a known issue on systems without NPU).

    • The host must expose GPU and NPU devices to Docker, for example:

      • /dev/dri (GPU)

      • /dev/accel/accel0 (NPU)

    • Cameras: at least one RTSP source or a sample video file replayed via the bundled lp-video container.

Software Requirements#

  • Docker and Container Runtime:

    • Docker Engine 24.x or newer.

    • Docker Compose v2 (integrated as docker compose) or compatible compose plugin.

    • Ability to run containers with:

      • Device mappings for GPU/NPU (for the swlp-service, behavioral-analysis, and DL Streamer pipeline server).

      • Bind mounts for sample video and generated TLS certificates.

  • Python (for helper scripts and tools):

    • Python 3.10 or newer recommended.

    • Used primarily for asset preparation scripts (download_models) and local tooling; application containers include their own Python runtimes.

  • Git and Make:

    • git for cloning the repository.

    • make to run provided automation targets (for example, make demo, make download-models, make clean).

AI Models and Workloads#

The application bundles several AI workloads, each with its own models and inputs or outputs:

  • Person Detection (SceneScape DL Streamer pipeline):

    • Model: person-detection-retail-0013 from Open Model Zoo, converted to OpenVINO IR.

    • Input: Camera frames (BGR) from the RTSP source or replayed video.

    • Output: Per-frame bounding boxes used by SceneScape’s tracker.

    • Target devices: Intel CPU or GPU via OpenVINO (DETECTION_DEVICE).

  • Person Re-Identification (SceneScape DL Streamer pipeline):

    • Model: person-reidentification-retail-0277 from Open Model Zoo, converted to OpenVINO IR.

    • Input: Cropped person patches from the detector.

    • Output: Embedding vectors used by SceneScape’s controller to assign persistent object_id across cameras and time.

    • Target devices: Intel CPU or GPU via OpenVINO (REID_DEVICE).

  • Pose Estimation (Behavioral Analysis pre-filter):

    • Model: YOLO pose model, converted to OpenVINO IR (/models/yolo_models/).

    • Input: Cropped person frames from a HIGH_VALUE-zone visit.

    • Output: 2D keypoints used to detect hand-near-body or pocket-region interactions; non-suspicious frames short-circuit and emit a no_match result without invoking the VLM.

    • Target devices: Intel CPU or GPU via OpenVINO (POSE_DEVICE).

  • VLM Concealment Confirmation (Behavioral Analysis):

    • Model: Qwen/Qwen2.5-VL-7B-Instruct Vision Language Model (/models/vlm_models/).

    • Input: A small batch of cropped person frames flagged as candidates by the pose pre-filter, with a structured prompt.

    • Output: A natural-language justification, a status (suspicious, no_match, or no_enough_data), a confidence, and a last_frame_ts. Published on the ba/results MQTT topic.

    • Target devices: Intel CPU, GPU, or NPU via OpenVINO (VLM_DEVICE). NPU is recommended where available.

Network and Proxy#

  • Network Access:

    • Local network connectivity to access the LP REST API (http://<HOST_IP>:8082), the Gradio dashboard (http://<HOST_IP>:7860), and the SceneScape UI (https://<HOST_IP>).

    • Optional outbound internet access to download Docker base images, OpenVINO models, and Qwen2.5-VL weights (if not pre-cached).

  • Proxy Support (optional):

    • If your environment uses HTTP/HTTPS proxies, configure:

      • HTTP_PROXY, HTTPS_PROXY, NO_PROXY in the shell before running make.

Permissions#

  • Ability to run Docker as a user in the docker group or with sudo.

  • Sufficient permissions to access device nodes for GPU and NPU (typically via membership in groups such as video or render, or via explicit devices configuration in Docker Compose).

Browser Requirements#

  • Modern web browser (Chrome, Edge, or Firefox) to access the Gradio dashboard and SceneScape UI.

  • JavaScript enabled.

These requirements are intended for development and evaluation environments. For any production-like deployment, you should also consider additional factors such as security hardening, monitoring, backup, retention of evidence frames in object storage, and resource isolation.