How It Works
This document provides an overview of the architecture and components of Win Vision AI.
Architecture
Inputs
Video file — local video file playback
RTSP camera — network camera stream
GenICam camera — industrial camera via GenICam SDK
Application
Config Loader — loads and validates YAML configuration; defines models and pipelines
Pipeline Manager — manages N parallel GStreamer pipelines with FPS and latency probes
Media Manager — manages the embedded MediaMTX server for RTSP and WebRTC output
Metrics Collector — exports pipeline metrics to a log or to Prometheus
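As a rough illustration of what the Config Loader's validation step might involve, the sketch below checks an already-parsed configuration dictionary. The keys (`models`, `pipelines`, `path`, `device`, `source`, `model`) and the function name `validate_config` are hypothetical; the actual Win Vision AI schema may differ.

```python
# Hypothetical schema: the real Win Vision AI configuration keys may differ.
ALLOWED_DEVICES = {"CPU", "GPU", "NPU"}


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    models = config.get("models", {})
    pipelines = config.get("pipelines", [])

    # Every model needs a file path and a supported inference device.
    for name, model in models.items():
        if not model.get("path"):
            errors.append(f"model '{name}': missing 'path'")
        device = model.get("device", "CPU")
        if device not in ALLOWED_DEVICES:
            errors.append(f"model '{name}': unknown device '{device}'")

    # Every pipeline needs a source and must reference a defined model.
    if not pipelines:
        errors.append("no pipelines defined")
    for index, pipeline in enumerate(pipelines):
        label = pipeline.get("name", f"#{index}")
        if not pipeline.get("source"):
            errors.append(f"pipeline '{label}': missing 'source'")
        if pipeline.get("model") not in models:
            errors.append(f"pipeline '{label}': references undefined model")
    return errors


# Example input, as it might look after the YAML file has been parsed.
example = {
    "models": {"detector": {"path": "models/detector.xml", "device": "GPU"}},
    "pipelines": [
        {"name": "cam0", "source": "rtsp://192.0.2.10/stream", "model": "detector"},
        {"name": "cam1", "source": "videos/sample.mp4", "model": "classifier"},
    ],
}
problems = validate_config(example)
print(problems)
```

Collecting all problems in a list, instead of raising on the first one, lets a user fix a configuration file in a single pass.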
Inference
Intel DL Streamer — runs object detection and classification inference using OpenVINO™ on:
CPU: runs inference using the OpenVINO™ runtime on system memory
GPU: runs inference using the D3D11 OpenVINO™ plugin with D3D11 shared memory
NPU: runs inference using the OpenVINO™ runtime on the neural processing unit
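The device choice above typically surfaces as the `device` property of the DL Streamer inference elements. The sketch below builds the inference part of a GStreamer pipeline description from a device name. The helper `inference_fragment` and the model paths are hypothetical, and the application may set additional element properties that are not shown here.

```python
SUPPORTED_DEVICES = ("CPU", "GPU", "NPU")


def inference_fragment(detect_model: str, device: str = "CPU",
                       classify_model: str = "") -> str:
    """Build a detection (and optional classification) pipeline fragment.

    Uses the DL Streamer elements gvadetect and gvaclassify with their
    'model' and 'device' properties. Paths containing spaces would need
    quoting, which this sketch does not handle.
    """
    device = device.upper()
    if device not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported device: {device}")

    parts = [f"gvadetect model={detect_model} device={device}"]
    if classify_model:
        # Classification runs on the regions produced by the detector.
        parts.append(f"gvaclassify model={classify_model} device={device}")
    return " ! ".join(parts)


# Hypothetical model paths, for illustration only.
fragment = inference_fragment("models/detector.xml", "gpu",
                              classify_model="models/classifier.xml")
print(fragment)
```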
Outputs
MediaMTX — re-streams encoded video over RTSP (port 8554) and WebRTC (port 8889)
MQTT broker — receives structured inference metadata over TCP (port 1883)
JSON file — writes inference metadata to a file using the DL Streamer gvametapublish element (for information on the element, see DL Streamer Documentation)
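The two metadata outputs map onto the `method` property of `gvametapublish`. The sketch below shows how the corresponding pipeline fragments could be assembled. The helper `publish_fragment`, the file name, and the topic are hypothetical; placing `gvametaconvert` before `gvametapublish` follows the usual DL Streamer pattern for producing JSON, but the exact properties this application sets are an assumption.

```python
def publish_fragment(method: str, target: str, topic: str = "") -> str:
    """Build a metadata publishing fragment for a GStreamer pipeline.

    method: "file" (target is a file path) or
            "mqtt" (target is a broker address such as "localhost:1883").
    """
    if method == "file":
        return f"gvametaconvert ! gvametapublish method=file file-path={target}"
    if method == "mqtt":
        if not topic:
            raise ValueError("an MQTT topic is required")
        return (f"gvametaconvert ! gvametapublish method=mqtt "
                f"address={target} topic={topic}")
    raise ValueError(f"unsupported method: {method}")


# Hypothetical file name and topic, for illustration only.
to_file = publish_fragment("file", "results.json")
to_mqtt = publish_fragment("mqtt", "localhost:1883", topic="winvision/cam0")
print(to_file)
print(to_mqtt)
```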
Viewers
Browser / VLC — consume the live stream over WebRTC or RTSP
MQTT subscriber — consumes inference metadata published to the MQTT broker
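On the viewer side, a client needs the stream URL and a way to interpret each metadata message. The sketch below derives both URLs from the ports listed under Outputs and extracts labels from one JSON payload. The stream path `cam0` and the helper names are hypothetical. The message layout (an `objects` list whose entries carry a `detection` with `label` and `confidence`) is assumed from typical DL Streamer JSON output and should be checked against real messages.

```python
import json


def stream_urls(host: str, path: str) -> dict:
    """Return the RTSP and WebRTC URLs served by MediaMTX for one stream."""
    return {
        "rtsp": f"rtsp://{host}:8554/{path}",    # open in VLC
        "webrtc": f"http://{host}:8889/{path}",  # open in a browser
    }


def summarize_detections(payload, min_confidence: float = 0.5) -> list:
    """Extract (label, confidence) pairs from one metadata message.

    Assumed layout: {"objects": [{"detection": {"label": ..., "confidence": ...}}]}
    """
    message = json.loads(payload)
    results = []
    for obj in message.get("objects", []):
        detection = obj.get("detection", {})
        confidence = detection.get("confidence", 0.0)
        if confidence >= min_confidence:
            results.append((detection.get("label", "unknown"), confidence))
    return results


# Hypothetical stream path and sample payload, for illustration only.
urls = stream_urls("localhost", "cam0")
sample = (b'{"objects": [{"detection": {"label": "person", "confidence": 0.91}},'
          b' {"detection": {"label": "car", "confidence": 0.32}}]}')
detections = summarize_detections(sample)
print(urls)
print(detections)
```

In a real subscriber, `summarize_detections` would be called from the message callback of whichever MQTT client library is in use.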