How It Works#

This document provides an overview of the architecture and components of Win Vision AI.

Architecture#

Win Vision AI Architecture

Inputs#

Video file — local video file playback
RTSP camera — network camera stream
GenICam camera — industrial camera via GenICam SDK

Application#

Config Loader — loads and validates YAML configuration; defines models and pipelines
Pipeline Manager — manages N parallel GStreamer pipelines with FPS and latency probes
Media Manager — manages the embedded MediaMTX server for RTSP and WebRTC output
Metrics Collector — exports pipeline metrics to log or Prometheus

Inference#

Intel DL Streamer — runs object detection and classification inference using OpenVINO™ on:
- CPU: runs inference using the OpenVINO™ runtime on system memory
- GPU: runs inference using the D3D11 OpenVINO™ plugin with D3D11 shared memory
- NPU: runs inference using the OpenVINO™ runtime on the neural engine

Outputs#

MediaMTX — re-streams encoded video over RTSP (port 8554) and WebRTC (port 8889)
MQTT broker — receives structured inference metadata over TCP (port 1883)
JSON file — writes inference metadata to a file using the DL Streamer gvametapublish element (for information on the element, see DL Streamer Documentation)

Viewers#

Browser / VLC — consume the live stream over WebRTC or RTSP
MQTT subscriber — consumes inference metadata published to the MQTT broker

Supporting Resources#

DL Streamer Documentation
- DL Streamer Supported Models
- DL Streamer Model Conversion Scripts README