Get Started#

Win Vision AI is a Python application for running concurrent GStreamer inference pipelines on Intel hardware (CPU / GPU / NPU) on Windows 11.


Prerequisites#

Install Python and Git#

Install Python 3.12 or higher from the official Python website. Install Git for Windows from the official Git website.

Set Proxies (Optional)#

Go to the target directory of your choice, open PowerShell, and run all the terminal commands below.

$env:http_proxy  = "<proxy-url>"  # example: http://proxy.example.com:891
$env:https_proxy = "<proxy-url>"  # example: http://proxy.example.com:891
$env:no_proxy    = "localhost,127.0.0.1"

Install Intel DL Streamer#

Download the latest dlstreamer-<version>-win64.exe from the Intel DL Streamer releases page and follow the Windows installation guide.

Note: By default, DL Streamer installs to C:\Program Files\Intel\dlstreamer.


Set Up the Application#

Clone the Suite#

To learn more about partial cloning, see the Repository Cloning guide.

git clone --filter=blob:none --sparse --branch release-2026.1.0 https://github.com/open-edge-platform/edge-ai-suites.git
cd edge-ai-suites
git sparse-checkout set manufacturing-ai-suite
cd manufacturing-ai-suite/industrial-edge-insights-vision/win-vision-ai

Install Python Dependencies#

python -m venv venv
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
venv\Scripts\Activate.ps1
pip install -r requirements.txt

Set Environment Variables#

First, find the gstreamer-python install location:

pip show gstreamer-python

Note the Location field from the output (e.g., C:\Users\<username>\AppData\Local\Programs\Python\Python312\Lib\site-packages), then set PYTHONPATH using that path:

$env:PYTHONPATH="<gstreamer-python-location>\gstreamer_python\Lib\site-packages"
$env:PYGI_DLL_DIRS="C:\Program Files\gstreamer\1.0\msvc_x86_64\bin"
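
The PYTHONPATH value above is the Location reported by pip show with gstreamer_python\Lib\site-packages appended. A minimal Python sketch of that derivation follows; the helper name pythonpath_from_pip_show and the sample path are illustrative and are not part of Win Vision AI.

```python
from pathlib import PureWindowsPath


def pythonpath_from_pip_show(pip_show_output: str) -> str:
    """Derive the PYTHONPATH value from the text printed by `pip show gstreamer-python`."""
    for line in pip_show_output.splitlines():
        if line.startswith("Location:"):
            # Split on the first colon only, so the drive letter colon is preserved.
            location = line.split(":", 1)[1].strip()
            return str(PureWindowsPath(location, "gstreamer_python", "Lib", "site-packages"))
    raise ValueError("no Location field found in pip show output")


# Illustrative input; the real Location depends on your Python installation.
sample = "Name: gstreamer-python\nLocation: C:\\Py312\\Lib\\site-packages\n"
print(pythonpath_from_pip_show(sample))
```

Paste the printed value into $env:PYTHONPATH as shown above.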

Verify GStreamer and DL Streamer plugins loaded correctly:

gst-inspect-1.0 gvadetect
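
The same check can be scripted, for example before launching the app from automation. The sketch below assumes that gst-inspect-1.0 is on PATH and that it exits with a non-zero status when the element is not found; gst_element_available is a hypothetical helper, not something the app provides.

```python
import subprocess


def gst_element_available(element: str, run=subprocess.run) -> bool:
    """Return True if `gst-inspect-1.0 <element>` exits with status 0."""
    try:
        result = run(["gst-inspect-1.0", element], capture_output=True, text=True)
    except FileNotFoundError:
        # gst-inspect-1.0 itself is not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for name in ("gvadetect", "gencamsrc"):
        print(name, "available" if gst_element_available(name) else "missing")
```

The run parameter exists only so the function can be exercised without GStreamer installed.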

Camera Input (Optional)#

To use a GenICam-compatible camera (e.g., Basler, Balluff, HikRobot), download the GenICam runtime DLLs and set the required environment variables.

The gstgencamsrc.dll plugin is pre-built and included in the bin\ folder — no build step is required. If you prefer to build the plugin from source yourself, see the src-gst-gencamsrc README (Windows).

Download GenICam Runtime DLLs#

Run this once to download the EMVA GenICam v3.1 VC120 runtime DLLs into bin\Win64_x64\:

.\src\setup_genicam_runtime.ps1

Set Camera Environment Variables#

# Path to your win-vision-ai clone root
$repoRoot = "<path-to-win-vision-ai-clone>"

# GenICam runtime DLLs (downloaded by setup_genicam_runtime.ps1 into bin\Win64_x64\)
$genicamRuntime = "$repoRoot\bin\Win64_x64"

# Add gstgencamsrc.dll plugin directory to GStreamer plugin search path
$env:GST_PLUGIN_PATH = "C:\Program Files\Intel\dlstreamer\bin;$repoRoot\bin"

# GenICam transport layer — set to your camera vendor's GenTL producer path, for example:
#   Basler pylon:           C:\Program Files\Basler\pylon\Runtime\x64
#   Balluff Impact Acquire: C:\Program Files\Balluff\ImpactAcquire\bin\x64
#   HikRobot MVS:           C:\Program Files (x86)\Common Files\MVS\Runtime\Win64_x64
$env:GENICAM_GENTL64_PATH = "C:\Program Files\Basler\pylon\Runtime\x64"

# Extend PATH with GenICam runtime DLLs (do NOT overwrite existing PATH)
$env:PATH = "$genicamRuntime;$env:PATH"

# Always clear the GStreamer plugin registry cache before testing with a new plugin
Remove-Item "C:\Temp\gst-registry-clean.bin" -ErrorAction SilentlyContinue
$env:GST_REGISTRY_1_0 = "C:\Temp\gst-registry-clean.bin"

Verify the camera plugin loaded correctly:

gst-inspect-1.0 gencamsrc

Download MediaMTX (for RTSP / WebRTC streaming)#

Required when any pipeline uses RTSP or WebRTC frame output.

Create a new directory where MediaMTX will be downloaded, then run the setup script pointing to that directory:

New-Item -ItemType Directory -Path "<mediamtx_dir>"
python src/setup_mediamtx.py --dir <mediamtx_dir> --version v1.18.1
$env:MEDIAMTX_PATH = "<mediamtx_dir>\mediamtx.exe"

Download a Model#

If you want to download YOLO models, you can refer to the DL Streamer download scripts.

pip install ultralytics
# FP32 (default)
python src/download_models.py --model yolo11n --outdir C:/Users/<username>/models
# FP16
python src/download_models.py --model yolo11n --outdir C:/Users/<username>/models --half
# INT8
python src/download_models.py --model yolo11n --outdir C:/Users/<username>/models --int8

Use the exported .xml path in config.yaml.


Configure config.yaml#

Note: The config.yaml file is located in the win-vision-ai directory of your clone (i.e., edge-ai-suites/manufacturing-ai-suite/industrial-edge-insights-vision/win-vision-ai/config.yaml).

Note: Use forward slashes in all YAML paths to avoid escape issues.
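
If you copy paths from Explorer or PowerShell, they arrive with backslashes. One way to convert them is Python's pathlib, as in this small sketch; to_yaml_path is an illustrative name and the model path is a placeholder.

```python
from pathlib import PureWindowsPath


def to_yaml_path(windows_path: str) -> str:
    """Convert a Windows path to the forward-slash form recommended for config.yaml."""
    return PureWindowsPath(windows_path).as_posix()


print(to_yaml_path(r"C:\Users\me\models\yolo11n.xml"))  # C:/Users/me/models/yolo11n.xml
```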

Metrics#

Controls per-pipeline FPS and latency reporting.

metrics:
  enabled: false # false = only frame count logged
  export_interval_s: 5.0
  prometheus:
    enabled: false
    port: 8000

When enabled, each pipeline logs a full stats line every interval:

state=PLAYING     fps_avg=30.6    fps_now=31.6    lat_avg=3.01 ms  frames=1047

When disabled, only the frame count is shown:

state=PLAYING     frames=121
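
Both stats lines are whitespace-separated key=value tokens, so they are easy to post-process. The sketch below turns one line into a dict; it assumes the format shown above and simply skips tokens without "=", such as the ms unit. parse_stats_line is a hypothetical helper.

```python
def parse_stats_line(line: str) -> dict:
    """Parse key=value tokens from a pipeline stats line into a dict."""
    stats = {}
    for token in line.split():
        if "=" not in token:
            continue  # unit tokens such as "ms" carry no key
        key, value = token.split("=", 1)
        try:
            stats[key] = float(value) if "." in value else int(value)
        except ValueError:
            stats[key] = value  # non-numeric value, e.g. state=PLAYING
    return stats


line = "state=PLAYING     fps_avg=30.6    fps_now=31.6    lat_avg=3.01 ms  frames=1047"
print(parse_stats_line(line))
```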

Prometheus#

When metrics.enabled: true and metrics.prometheus.enabled: true, the app starts an HTTP server and exposes a /metrics endpoint that Prometheus can scrape.

Install the client library:

pip install prometheus_client

Enable in config:

metrics:
  enabled: true
  export_interval_s: 5.0
  prometheus:
    enabled: true
    port: 8000 # /metrics served at http://localhost:8000/metrics

Exposed gauges (all labelled by pipeline_id):

| Metric | Description |
| --- | --- |
| pipeline_avg_fps | Rolling average FPS |
| pipeline_current_fps | Instantaneous FPS |
| pipeline_avg_latency_ms | Rolling average inference latency (ms) |
| pipeline_frame_count | Total frames processed |
| pipeline_running | 1 if PLAYING, 0 otherwise |
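
To read these gauges without a Prometheus server, you can fetch /metrics and parse the text yourself. The sketch below assumes the standard Prometheus text exposition format with pipeline_id as the only label on each sample; the sample text is made up for illustration, and parse_pipeline_gauges is a hypothetical helper.

```python
import re

# Matches lines such as: pipeline_avg_fps{pipeline_id="front"} 30.6
_SAMPLE = re.compile(r'^(\w+)\{pipeline_id="([^"]*)"\}\s+(\S+)$')


def parse_pipeline_gauges(text: str) -> dict:
    """Map pipeline_id -> {metric_name: value} from Prometheus text exposition."""
    gauges = {}
    for raw in text.splitlines():
        match = _SAMPLE.match(raw.strip())
        if match:  # "# HELP" / "# TYPE" comments and other samples do not match
            name, pipeline_id, value = match.groups()
            gauges.setdefault(pipeline_id, {})[name] = float(value)
    return gauges


# Illustrative scrape output, not captured from a real run.
scrape = """# TYPE pipeline_avg_fps gauge
pipeline_avg_fps{pipeline_id="front"} 30.6
pipeline_running{pipeline_id="front"} 1.0
"""
print(parse_pipeline_gauges(scrape))
```

With the app running, the text could come from urllib.request.urlopen("http://localhost:8000/metrics").read().decode().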

Models#

models:
  inst0:
    type: detection # detection | classification
    model: "C:/Users/path/to/model.xml"
    device: CPU # CPU | GPU | NPU
    properties:
      batch_size: 1
      threshold: 0.4

Input source#

input:
  type: file # file | rtsp | camera
  url: "C:/Users/path/to/video"

For an RTSP source, MediaMTX must be installed and the RTSP server must be running before you start the app:

input:
  type: rtsp # file | rtsp | camera
  url: "rtsp://<ip>:<port>/live.sdp"

For a camera source, first set the camera environment variables described in Set Camera Environment Variables.

serial, pixel-format, width, and height are all required fields. Any additional properties are passed verbatim to the gencamsrc GStreamer element — add as many as your camera/driver/gencamsrc support.

input:
  type: camera
  serial: <camera_serial_number> # required — camera serial number
  pixel-format: mono8 # required — e.g. mono8
  width: 1280 # required — frame width in pixels
  height: 720 # required — frame height in pixels
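
The split between required fields and pass-through properties can be sketched as follows. This is not the app's own validation, only an illustration of the rule described above; check_camera_input is a hypothetical helper and exposure-time is a made-up property name.

```python
REQUIRED_CAMERA_FIELDS = ("serial", "pixel-format", "width", "height")


def check_camera_input(input_cfg: dict):
    """Return (missing required fields, properties that would be passed to gencamsrc)."""
    missing = [field for field in REQUIRED_CAMERA_FIELDS if field not in input_cfg]
    passthrough = {
        key: value
        for key, value in input_cfg.items()
        if key != "type" and key not in REQUIRED_CAMERA_FIELDS
    }
    return missing, passthrough


# "exposure-time" is a hypothetical extra property used only for illustration.
example = {"type": "camera", "serial": "12345678", "pixel-format": "mono8",
           "width": 1280, "exposure-time": 5000}
missing, passthrough = check_camera_input(example)
print(missing)      # ['height']
print(passthrough)  # {'exposure-time': 5000}
```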

Frame Output#

WebRTC: streams to http://localhost:8889/front. Open the URL in a browser.

output:
  frame:
    - type: webrtc
      peer_id: front

RTSP: streams to rtsp://localhost:8554/front. Open the URL in VLC.

output:
  frame:
    - type: rtsp
      path: /front

WebRTC and RTSP: streams to both http://localhost:8889/front and rtsp://localhost:8554/front simultaneously.

output:
  frame:
    - type: webrtc
      peer_id: front
    - type: rtsp
      path: /front
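
The examples above suggest a simple mapping from a frame output entry to its viewer URL: port 8554 plus path for RTSP, port 8889 plus peer_id for WebRTC. The sketch below encodes that mapping under the assumption that these default ports are in use; viewer_urls is a hypothetical helper, not the app's implementation.

```python
def viewer_urls(frame_outputs, host="localhost"):
    """Build viewer URLs for a list of output.frame entries."""
    urls = []
    for out in frame_outputs:
        if out["type"] == "rtsp":
            # path is written with a leading slash in config.yaml
            urls.append(f"rtsp://{host}:8554/{out['path'].lstrip('/')}")
        elif out["type"] == "webrtc":
            urls.append(f"http://{host}:8889/{out['peer_id']}")
    return urls


frame = [{"type": "webrtc", "peer_id": "front"}, {"type": "rtsp", "path": "/front"}]
print(viewer_urls(frame))
```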

Metadata Output#

Publishes inference results to an MQTT broker. This requires Mosquitto running on port 1883: download the Mosquitto Windows installer from the official Mosquitto website and install it. The default install path is C:\Program Files\mosquitto\.

output:
  metadata:
    - type: mqtt
      topic: inference/front
      port: 1883

Start the broker before running the app:

# Terminal 1 — start broker
cd "C:\Program Files\mosquitto"
.\mosquitto.exe -v

# Terminal 2 — subscribe to verify
# The topic passed to -t must match the topic value set in config.yaml (e.g. inference/front)
& "C:\Program Files\mosquitto\mosquitto_sub.exe" -h localhost -t inference/front -v

Writes inference results as JSON Lines to a local file inside the output directory.

output:
  metadata:
    - type: file
      path: "output/front-inference.jsonl"
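
A JSON Lines file holds one JSON value per line, so it can be read incrementally. The sketch below only parses the lines; the fields inside each record are not documented here, so it makes no assumption about them. parse_jsonl and read_jsonl are illustrative names.

```python
import json


def parse_jsonl(lines):
    """Yield one parsed JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_jsonl(path):
    """Read a whole JSON Lines file, e.g. output/front-inference.jsonl."""
    with open(path, encoding="utf-8") as handle:
        return list(parse_jsonl(handle))


# Placeholder records, not real inference output.
records = list(parse_jsonl(['{"a": 1}\n', "\n", '{"b": [2]}\n']))
print(len(records))  # 2
```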

Full Pipeline Example#

logging:
  level: INFO
  file: null

metrics:
  enabled: false

models:
  inst0:
    type: detection
    model: "C:/Users/path/to/model.xml"
    device: CPU
    properties:
      batch_size: 1
      threshold: 0.4

pipelines:
  front:
    input:
      type: file
      url: "C:/Users/path/to/video.avi"
    inference:
      model_id: inst0
    output:
      frame:
        - type: rtsp
          path: /front
      metadata:
        - type: mqtt
          topic: inference/front
          port: 1883
  back:
    input:
      type: file
      url: "C:/Users/path/to/video.avi"
    inference:
      model_id: inst0
    output:
      frame:
        - type: webrtc
          peer_id: back
      metadata:
        - type: file
          path: "output/back-inference.jsonl"

For detection models, set model_id to inst0; for classification models, set model_id to inst1.
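
Each pipeline's inference.model_id has to name an entry under models. One way to catch a mismatch before launching is sketched below; it works on a plain dict such as the one a YAML parser would return, and unknown_model_refs is a hypothetical helper rather than something the app provides.

```python
def unknown_model_refs(config: dict) -> dict:
    """Map pipeline name -> model_id for pipelines whose model_id is not under models."""
    models = config.get("models") or {}
    bad = {}
    for name, pipeline in (config.get("pipelines") or {}).items():
        model_id = (pipeline.get("inference") or {}).get("model_id")
        if model_id not in models:
            bad[name] = model_id
    return bad


# Trimmed-down version of the example above, with one deliberate mistake.
config = {
    "models": {"inst0": {"type": "detection"}},
    "pipelines": {
        "front": {"inference": {"model_id": "inst0"}},
        "back": {"inference": {"model_id": "inst1"}},  # inst1 is not defined
    },
}
print(unknown_model_refs(config))  # {'back': 'inst1'}
```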


Supported Pipeline Combinations#

The following combinations are supported in basic configuration mode.

Important: input and inference are mandatory for all pipeline combinations below.

| Frame Output | Metadata Output |
| --- | --- |
| RTSP | MQTT |
| WebRTC | MQTT |
| RTSP + WebRTC | MQTT |
| RTSP | File |
| WebRTC | File |
| RTSP + WebRTC | File |
| RTSP | MQTT + File |
| WebRTC | MQTT + File |
| RTSP + WebRTC | MQTT + File |
| RTSP | None |
| WebRTC | None |
| RTSP + WebRTC | None |
| None | MQTT |
| None | File |
| None | MQTT + File |
| None | None |

Notes:

  • A single pipeline can output to both RTSP and WebRTC simultaneously using a GStreamer tee.

  • Multiple metadata outputs (MQTT + File) can be combined on the same pipeline.

  • When no frame output is configured, the pipeline renders locally using d3d11videosink.

For custom element chains or combinations not listed above, use Raw Pipeline Mode.
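
The supported combinations listed above amount to every pairing of the four frame-output choices with the four metadata-output choices. As a quick cross-check, this sketch enumerates them; the label strings are display names copied from the table, not config values.

```python
from itertools import product

FRAME_OUTPUTS = ["RTSP", "WebRTC", "RTSP + WebRTC", "None"]
METADATA_OUTPUTS = ["MQTT", "File", "MQTT + File", "None"]

# Every (frame, metadata) pairing; order here differs from the table's row order.
combinations = list(product(FRAME_OUTPUTS, METADATA_OUTPUTS))

print(len(combinations))  # 16
```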


Run the App#

python app.py config.yaml

On startup the app loads the config, starts MediaMTX, launches all pipelines, and prints viewer URLs:

[front] RTSP stream:   rtsp://localhost:8554/front
[back]  WebRTC stream: http://localhost:8889/back

Press Ctrl+C if you need to forcefully stop the application.


Advanced: Raw Pipeline Mode#

Pass complete GStreamer strings directly — models and pipelines sections are ignored:

raw_pipelines:
  front: "filesrc location=\"C:/Users/path/to/video\" ! decodebin3 name=src ! gvadetect model=\"C:/Users/path/to/detection/model.xml\" device=GPU pre-process-backend=d3d11 name=detection model-instance-id=inst0 threshold=0.4 batch-size=1 ! queue ! gvawatermark ! d3d11convert ! gvafpscounter  ! d3d11videosink name=sink"
  back: "filesrc location=\"C:/Users/path/to/video.avi\" ! decodebin3 name=src ! gvadetect model=\"C:/Users/path/to/detection/model.xml\" device=GPU pre-process-backend=d3d11 name=detection model-instance-id=inst0 threshold=0.4 batch-size=1 ! queue ! gvawatermark ! d3d11convert ! gvafpscounter  ! identity name=sink ! mfh264enc bitrate=2000 gop-size=15 ! h264parse ! rtspclientsink location=rtsp://localhost:8554/back"
  right: "filesrc location=\"C:/Users/path/to/video.avi\" ! decodebin3 name=src ! gvadetect model=\"C:/Users/path/to/detection/model.xml\" device=GPU pre-process-backend=d3d11 name=detection model-instance-id=inst0 threshold=0.4 batch-size=1 ! queue ! gvawatermark ! d3d11convert ! gvafpscounter ! identity name=sink ! mfh264enc bitrate=2000 gop-size=15 ! h264parse ! whipclientsink signaller::whip-endpoint=http://localhost:8889/front/whip"
  left: "filesrc location=\"C:/Users/path/to/video.avi\" ! decodebin3 name=src ! gvadetect model=\"C:/Users/path/to/detection/model.xml\" device=GPU pre-process-backend=d3d11 name=detection model-instance-id=inst0 threshold=0.4 batch-size=1 ! queue ! gvametaconvert add-empty-results=true ! gvametapublish method=mqtt topic=inference/back address=tcp://localhost:1883 ! queue ! gvawatermark ! d3d11convert ! gvafpscounter ! d3d11videosink name=sink"
  camera: "gencamsrc serial=12345678 pixel-format=mono8 name=src ! videoscale ! video/x-raw, width=1920,height=1080 ! videoconvert ! queue ! d3d12videosink name=sink"

The pipelines above are examples only. Adapt the sink element (WebRTC, RTSP, or any other sink) to your use case.

MediaMTX starts automatically when rtspclientsink or whipclientsink appears in a string.
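
The rule above can be approximated with a substring check, which is handy for predicting whether a given raw pipeline will trigger MediaMTX. This is a sketch of the documented behavior, not the app's actual detection code; needs_mediamtx is a hypothetical helper.

```python
STREAMING_SINKS = ("rtspclientsink", "whipclientsink")


def needs_mediamtx(pipeline: str) -> bool:
    """Return True if the raw pipeline string uses an RTSP or WHIP client sink."""
    return any(sink in pipeline for sink in STREAMING_SINKS)


local = "videotestsrc ! d3d11videosink name=sink"
rtsp = "videotestsrc ! mfh264enc ! h264parse ! rtspclientsink location=rtsp://localhost:8554/back"
print(needs_mediamtx(local), needs_mediamtx(rtsp))  # False True
```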


Troubleshooting#

Inference on NPU fails with Failed to construct OpenVINOImageInference error#

To resolve this error, install the latest supported Intel® NPU Driver for Windows for Intel® Core™ Ultra processors from the official Intel website.