# How It Works
This document provides a comprehensive technical overview of the system architecture, component interactions, data flows, and design decisions.
## System Architecture
### High-Level Architecture
```mermaid
flowchart TB
subgraph SYS["Take-Away Order Accuracy"]
direction LR
rtsp["RTSP Video Streams
(GStreamer)"] --> oas["Order Accuracy Service
(EasyOCR)"]
oas --> minio["MinIO
(Frame Storage)"]
minio --> selector["Frame Selector
(YOLO11n-CPU)"]
selector -->|top 3 frames| scheduler["VLM Scheduler
(ThreadPool)"]
selector -->|top 3 frames| validation["Validation Agent"]
scheduler --> ovms["OVMS VLM
(Qwen2.5-VL, GPU-INT8)"]
validation --> semantic["Semantic Service"]
end
ovms --> ui["Gradio UI
(Interface)"]
semantic --> ui
```
### Component Summary
| Component | Technology | Purpose |
| ---------------------- | ---------------------------------------- | ------------------------------- |
| Order Accuracy Service | Python, FastAPI | Core orchestration and API |
| Station Workers | GStreamer (DL Streamer), multiprocessing | RTSP video processing |
| VLM Scheduler | Threading, queue batching | Request optimization |
| Frame Selector | YOLO, OpenCV | Optimal frame detection |
| OVMS VLM | OpenVINO Model Server | Vision Language Model inference |
| Semantic Service | FastAPI | Text semantic matching |
| Gradio UI | Gradio | Web interface |
| MinIO | S3-compatible storage | Frame and result storage |
---
## Service Modes
The system supports two operational modes, each optimized for different deployment scenarios.
### Single Worker Mode
```text
┌──────────────────────────────────────────────────────────────┐
│ SINGLE WORKER MODE │
│ │
│ ┌────────────┐ ┌────────────────┐ ┌────────────┐ │
│ │ Gradio UI │────▶│ FastAPI REST │────▶│ VLM │ │
│ │ │ │ /upload-video │ │ Service │ │
│ └────────────┘ └────────────────┘ └────────────┘ │
│ │
│ Characteristics: │
│ • Sequential video processing │
│ • Direct VLM calls (no batching) │
│ • Best for: Development, testing, demos │
└──────────────────────────────────────────────────────────────┘
```
**Configuration:**
```bash
SERVICE_MODE=single
WORKERS=0
```
**Features:**
- Video upload via REST API
- Single order at a time
- Gradio UI integration
- FastAPI Swagger documentation
### Parallel Worker Mode
```text
┌─────────────────────────────────────────────────────────────────────────────┐
│ PARALLEL WORKER MODE │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Station │ │ Station │ │ Station │ (independent, │
│ │ Worker 1 │ │ Worker 2 │ │ Worker N │ each with GStreamer │
│ │ (GStr+OCR) │ │ (GStr+OCR) │ │ (GStr+OCR) │ + EasyOCR) │
│ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────┐ │
│ │ MinIO (Frame Storage) │ │
│ │ station_1/ station_2/ station_N/ │ │
│ └─────────────────────┬────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Frame Selector (YOLO11n - CPU) │ │
│ │ Select top 3 frames per order │ │
│ └─────────────────────┬────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────┐ │
│ │ VLM Scheduler (ThreadPoolExecutor) │ │
│ │ Parallel requests to OVMS │ │
│ └─────────────────────┬────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ OVMS VLM (GPU) │ │
│ │ Qwen2.5-VL-7B / Continuous Batching │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ Characteristics: │
│ • Independent RTSP stream per station │
│ • Shared EasyOCR, YOLO, and VLM models │
│ • Parallel VLM requests via ThreadPoolExecutor │
│ • OVMS continuous batching on GPU │
│ • Best for: Production, multi-camera deployments │
└─────────────────────────────────────────────────────────────────────────────┘
```
**Configuration:**
```bash
SERVICE_MODE=parallel
WORKERS=N
SCALING_MODE=fixed # or 'auto'
```
**Features:**
- Independent GStreamer pipeline per station
- Shared EasyOCR, YOLO, and VLM models across stations
- Parallel VLM requests via ThreadPoolExecutor
- OVMS continuous batching on GPU
- Circuit breaker pattern with exponential backoff
---
## Core Components
### 1. Main Entry Point (`src/main.py`)
The unified service entry point manages mode selection and service initialization.
```python
# Mode Selection Logic
SERVICE_MODE = os.getenv("SERVICE_MODE", "single")
if SERVICE_MODE == "single":
# Start FastAPI with REST endpoints
run_single_mode()
elif SERVICE_MODE == "parallel":
# Start multi-worker orchestration
run_parallel_mode()
```
**Responsibilities:**
- Environment configuration loading
- Mode-based service initialization
- Signal handling for graceful shutdown
- Worker process spawning (parallel mode)
- Hostname-based station detection
### 2. Station Worker (`src/parallel/station_worker.py`)
Production-ready worker process for single camera stream processing.
```text
┌─────────────────────────────────────────────────────────────┐
│ STATION WORKER LIFECYCLE │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Initialize│──▶│Wait RTSP │──▶│ Start │──▶│ Monitor │ │
│ │ │ │ │ │ Pipeline │ │ Health │ │
│ └──────────┘ └──────────┘ └──────────┘ └────┬─────┘ │
│ │ │
│ ┌──────────────────┬──────────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Circuit │ │ Backoff │ │ Verify │ │
│ │ Breaker │─────▶│ & Retry │─────▶│ RTSP │──▶Restart │
│ │ Check │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
```
**Key Features:**
| Feature | Implementation |
| ------------------- | -------------------------------------------------- |
| GStreamer Pipeline | RTSP → H.264 decode → `gvapython` frame processing |
| Circuit Breaker | 5 failures in 120s (2 min) → 10s cooldown |
| Exponential Backoff | 1s → 2s → 4s → ... → 15s max |
| Stall Detection | No EOS markers for 120s triggers restart |
| Health Monitoring | Frame rate, pipeline state tracking |
**Configuration (`src/parallel/station_worker.py`):**
```python
@dataclass
class PipelineConfig:
rtsp_latency_ms: int = 0 # Zero buffering
rtsp_retry_count: int = 50
rtsp_timeout_us: int = 2000000 # 2 seconds
restart_base_delay_sec: float = 1.0
restart_max_delay_sec: float = 15.0
circuit_breaker_max_failures: int = 5
circuit_breaker_window_sec: float = 120.0 # 2 minutes
circuit_breaker_cooldown_sec: float = 10.0
stall_detection_timeout_sec: float = 120.0 # 2 minutes
```
### 3. VLM Scheduler (`src/parallel/vlm_scheduler.py`)
Request batching scheduler optimizing OVMS throughput.
```text
┌─────────────────────────────────────────────────────────────────────────┐
│ VLM SCHEDULER ARCHITECTURE │
│ │
│ Worker 1 ──┐ │
│ │ ┌─────────────┐ ┌───────────────┐ │
│ Worker 2 ──┼────▶│ Collector │────▶│ │ │
│ │ │ Thread │ │ Batch │ ┌─────────┐ │
│ Worker N ──┘ │ │ │ Buffer │───▶│ OVMS │ │
│ └─────────────┘ │ (50-100ms) │ │ VLM │ │
│ │ │ └────┬────┘ │
│ ┌─────────────┐ └───────────────┘ │ │
│ Response ◀───────│ Response │◀──────────────────────────────┘ │
│ Routing │ Router │ │
│ └─────────────┘ │
│ │
│ Time-window batching: Collect requests for 50-100ms, send as batch │
│ Fair scheduling: Round-robin across workers │
│ Backpressure: Queue limits prevent memory exhaustion │
└─────────────────────────────────────────────────────────────────────────┘
```
**Batching Strategy:**
- **Time Window**: 50-100ms collection period
- **Max Batch Size**: Configurable (default: 16)
- **Fair Scheduling**: Round-robin request servicing
- **Response Routing**: Match responses to original requesters
### 4. VLM Component (`src/core/vlm_service.py`)
Vision Language Model processing with inventory detection and order validation.
**Responsibilities:**
- Process selected frames through VLM
- Generate item detection prompts
- Parse VLM responses into structured data
- Coordinate with validation agent
### 5. OVMS VLM Client (`src/core/ovms_client.py`)
OpenVINO Model Server client with OpenAI-compatible API.
```text
┌─────────────────────────────────────────────────────────────┐
│ OVMS CLIENT │
│ │
│ ┌────────────┐ ┌────────────────┐ ┌────────────┐ │
│ │ Image │────▶│ Base64 │────▶│ POST │ │
│ │ (numpy) │ │ Encoding │ │/v3/chat/ │ │
│ └────────────┘ └────────────────┘ │completions │ │
│ └─────┬──────┘ │
│ │ │
│ ┌────────────┐ ┌────────────────┐ │ │
│ │ Metrics │◀────│ Response │◀──────────┘ │
│ │ Logging │ │ Parsing │ │
│ └────────────┘ └────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
**API Integration:**
```python
# OpenAI-compatible chat/completions endpoint
response = requests.post(
f"{OVMS_ENDPOINT}/v3/chat/completions",
json={
"model": OVMS_MODEL_NAME,
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": prompt},
{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}}
]
}
],
"max_completion_tokens": 100
}
)
```
---
## Data Flow Pipeline
### Complete Request Flow
1. **Video Capture**:
RTSP Camera → GStreamer Pipeline → Frame Buffer
2. **Frame Selection**:
- Frame Selector (YOLO):
- Object detection on raw frames
- Score frames by item visibility
- Select top K frames per order
- Store selected frames in MinIO
3. **VLM Processing**:
- VLM Scheduler → OVMS (Qwen2.5-VL):
- Batch frames by time window
- Send to OVMS with detection prompt
- Parse structured item response
4. **Order Validation**:
- Validation Agent:
- Compare detected items with expected order
- Exact match → Semantic match → Flag mismatch
- Generate validation result
5. **Result Output**:
- { "matched": [...], "missing": [...], "extra": [...] }
### State Transitions
```mermaid
flowchart LR
subgraph OUTER[ ]
direction TB
subgraph INNER[WORKER STATE MACHINE]
direction LR
STOPPED[STOPPED] --> STARTING[STARTING] --> RUNNING[RUNNING] --> STALLED[STALLED]
RUNNING --> RESTARTING[RESTARTING]
RUNNING --> CIRCUIT_OPEN[CIRCUIT_OPEN]
STARTING --> CIRCUIT_OPEN
STALLED --> RESTARTING
STOPPED --> SHUTTING_DOWN[SHUTTING_DOWN]
CIRCUIT_OPEN --> SHUTTING_DOWN
RESTARTING --> SHUTTING_DOWN
end
end
style OUTER fill:#f7f9fc,stroke:#4a5568,stroke-width:2px,color:#111827
style INNER fill:#ffffff,stroke:#6b7280,stroke-width:1px,color:#111827
```
---
## Video Processing Architecture
### GStreamer Pipeline
The pipeline is built in `src/parallel/station_worker.py` using DL Streamer's `gvapython` plugin for per-frame processing:
```
rtspsrc location= latency=0 buffer-mode=0 protocols=tcp ntp-sync=false do-rtcp=false retry=5
! rtph264depay
! avdec_h264
! videoconvert
! video/x-raw,format=BGR
! videorate ! video/x-raw,framerate=/1
! queue max-size-buffers=200 leaky=no
! gvapython module=frame_pipeline function=process_frame
! fakesink sync=false
```
`CAPTURE_FPS` defaults to `10`. The `gvapython` element calls `frame_pipeline.process_frame()` per frame, which handles EasyOCR order slip detection and MinIO frame upload.
---
## VLM Integration
### Model Architecture
```text
┌─────────────────────────────────────────────────────────────────────────────────┐
│ OVMS VLM INTEGRATION │
│ │
│ Model: Qwen/Qwen2.5-VL-7B-Instruct │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ OVMS Model Server │ │
│ │ │ │
│ │ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ │
│ │ │ Vision │ │ Language │ │ Output │ │ │
│ │ │ Encoder │───▶│ Model │───▶│ Decoder │ │ │
│ │ │ (ViT-based) │ │ (Qwen2.5) │ │ (JSON) │ │ │
│ │ └────────────────┘ └────────────────┘ └────────────────┘ │ │
│ │ │ │
│ │ API: OpenAI-compatible /v3/chat/completions │ │
│ │ Port: 8001 (configurable) │ │
│ │ Precision: INT8 (optimized for inference) │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
```
### Request/Response Format
**Request:**
```json
{
"model": "Qwen/Qwen2.5-VL-7B-Instruct",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "Identify all food items in this image..." },
{
"type": "image_url",
"image_url": { "url": "data:image/jpeg;base64,..." }
}
]
}
],
"max_completion_tokens": 100,
"temperature": 0.2
}
```
**Response:**
```json
{
"choices": [
{
"message": {
"content": "{\"detected_items\": [{\"name\": \"burger\", \"quantity\": 2}]}"
}
}
],
"usage": {
"prompt_tokens": 1250,
"completion_tokens": 45,
"total_tokens": 1295
}
}
```
---
## Frame Selection Service
### YOLO-Based Frame Selection
**Frame selector service (`frame_selector.py`):**
1. Monitor frames bucket in MinIO
2. Run YOLO object detection on each frame
3. Score frames by:
- Object detection confidence
- Item visibility/occlusion
- Frame quality (blur, brightness)
4. Select TOP_K frames per order
5. Store selected frames in 'selected' bucket
6. Trigger VLM processing via HTTP callback
- Configuration:
- TOP_K: 3 (frames per order)
- POLL_INTERVAL: 1.5s
- MIN_FRAMES_PER_ORDER: 1
- YOLO_MODEL: yolo11n (INT8 OpenVINO)
The selection algorithm scores each frame using YOLO detection confidence and item count, then selects the top `TOP_K` frames per order. Implementation is in `frame-selector-service/`.
---
## Semantic Matching
### Matching Architecture
**Semantic matching system:**
- Validation agent
- Pass 1: Exact match
- "burger" == "burger" → MATCH
- Pass 2: Semantic match (if exact fails)
- "quarter pounder" ≈ "quarterpounder" → MATCH (semantic similarity)
- Pass 3: Flag mismatch
- No match found → Add to "missing" or "extra"
- Semantic service integration
- External microservice: `http://semantic-service:8080`
- Fallback: Local semantic matching if service unavailable
### Matching Strategies
| Strategy | Description | Use Case |
| ---------- | -------------------------- | ------------------- |
| `exact` | String equality comparison | Fast, deterministic |
| `semantic` | Embedding similarity | Handles variations |
| `hybrid` | Exact first, then semantic | Recommended default |
---
## Docker Services Topology
### Service Deployment
```text
┌─────────────────────────────────────────────────────────────────────────────────┐
│ DOCKER SERVICES TOPOLOGY │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ order-accuracy-net │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ minio │ │ ovms-vlm │ │order-accuracy│ │ │
│ │ │ :9000/9001 │ │ :8001 │ │ :8000 │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │frame-selector│ │ gradio-ui │ │semantic-svc │ │ │
│ │ │ (internal) │ │ :7860 │ │ :8080 │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ │ │
│ │ │rtsp-streamer │ (parallel profile) │ │
│ │ │ :8554 │ │ │
│ │ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
│ Volumes: │
│ • minio-data: S3-compatible object storage │
│ • videos: Input video files │
│ • results: Output results and metrics │
│ • models: OVMS model files │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
```
### Service Dependencies
```yaml
services:
order-accuracy:
depends_on:
- minio
- ovms-vlm
frame-selector:
depends_on:
- minio
- order-accuracy
gradio-ui:
depends_on:
- order-accuracy
```
---
## Production Patterns
### Circuit Breaker Pattern
```mermaid
flowchart LR
CLOSED["CLOSED (Normal)"]
OPEN["OPEN (Blocking)"]
HALFOPEN["HALF-OPEN (Testing)"]
CLOSED -- "5 failures" --> OPEN
OPEN -- "30s elapsed" --> HALFOPEN
HALFOPEN -- "success" --> CLOSED
```
**Configuration:**
- Failure threshold: 5 failures
- Time window: 120 seconds (2 minutes)
- Cooldown period: 10 seconds
### Exponential Backoff
Backoff sequence: 1s → 2s → 4s → 8s → 15s (max), with jitter. Implemented in `src/parallel/station_worker.py`.
### Health Monitoring
```python
@dataclass
class PipelineMetrics:
pipeline_restarts: int = 0
pipeline_failures: int = 0
rtsp_unavailable_events: int = 0
successful_frames_processed: int = 0
circuit_breaker_trips: int = 0
stall_detections: int = 0
last_frame_time: float = 0.0
pipeline_start_time: float = 0.0
total_uptime_sec: float = 0.0
```
---
## Scalability Considerations
### Horizontal Scaling
Scaling architecture:
- Fixed Scaling (SCALING_MODE=fixed):
- Static worker count defined at startup
- Configuration: WORKERS=N (set at startup)
- Auto Scaling (SCALING_MODE=auto):
- Dynamic worker adjustment based on metrics
- Scale up triggers:
- GPU utilization > 80%
- Average VLM latency > target
- Request queue depth > threshold
- Scale down triggers:
- GPU utilization < 30%
- Idle workers for > 5 minutes
- Bottleneck Analysis:
- VLM inference: Primary bottleneck (~2-3s per request)
- Mitigation: Request batching via VLM Scheduler
- Target throughput: 20-30 orders/minute per GPU
### Performance Optimization
| Optimization | Implementation |
| ------------------ | ------------------------------------------------ |
| VLM Batching | 50-100ms time windows via VLM Scheduler |
| Frame Selection | YOLO pre-filtering reduces unnecessary VLM calls |
| INT8 Quantization | OpenVINO INT8 model served via OVMS |
| Connection Pooling | HTTP session reuse to OVMS |
---
## Summary
Take-Away Order Accuracy is a production-ready system designed for high-throughput, reliable order validation in QSR environments. Key architectural highlights:
1. **Dual-Mode Operation**: Single worker for development, parallel workers for production
2. **Resilient Video Processing**: GStreamer with circuit breaker and auto-recovery
3. **Optimized VLM Inference**: Request batching and INT8 quantization
4. **Intelligent Frame Selection**: YOLO-based filtering reduces unnecessary VLM calls
5. **Hybrid Matching**: Exact + semantic matching for robust item comparison
6. **Production Patterns**: Circuit breaker, exponential backoff, health monitoring
For deployment and configuration details, see the companion guides in this documentation suite.
---
## System Requirements
See the [System Requirements](./get-started/system-requirements.md) for detailed hardware, software, and network prerequisites.
---
## Pre-Deployment Checklist
- [ ] Docker and Docker Compose installed and working
- [ ] Intel GPU drivers installed and GPU visible to Docker
- [ ] Required ports available (8000, 7860, 8001, 9000, 9001, 8080)
- [ ] At least 50 GB free disk space
- [ ] VLM model downloaded (`setup_models.sh` completed)
- [ ] `.env` file configured
- [ ] Camera RTSP URLs accessible from host (parallel mode)