# How It Works This document provides a comprehensive technical overview of the system architecture, component interactions, data flows, and design decisions. ## System Architecture ### High-Level Architecture ```text ┌─────────────────────────────────────────────────────────────────────────────┐ │ DINE-IN ORDER ACCURACY │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────┐ ┌──────────────────┐ ┌─────────────────────┐ │ │ │ │ │ │ │ │ │ │ │ Gradio UI │─────▶│ FastAPI API │─────▶│ Validation │ │ │ │ (Port 7861)│ │ (Port 8083) │ │ Service │ │ │ │ │ │ │ │ │ │ │ └─────────────┘ └────────┬─────────┘ └──────────┬──────────┘ │ │ │ │ │ │ │ │ │ │ ┌───────────┴───────────┐ │ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌────────────────┐ ┌─────────────────┐ ┌───────────────┐ │ │ │ │ │ │ │ │ │ │ │ VLM Client │ │ Semantic Client │ │ Metrics │ │ │ │ (Circuit │ │ (Circuit │ │ Collector │ │ │ │ Breaker) │ │ Breaker) │ │ │ │ │ │ │ │ │ │ │ │ │ └───────┬────────┘ └────────┬────────┘ └───────────────┘ │ │ │ │ │ └───────────────────┼───────────────────────┼───────────────────────────────┘ │ │ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ │ │ │ │ │ OVMS VLM │ │ Semantic │ │ (Qwen2.5-VL) │ │ Service │ │ Port 8000 │ │ Port 8080 │ │ │ │ │ └─────────────────┘ └─────────────────┘ ``` ### Request Flow ```text ┌──────────┐ ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌──────────┐ │ Staff │ │ Gradio │ │ FastAPI │ │ VLM │ │ Semantic │ │ Trigger │ │ UI │ │ API │ │ Client │ │ Client │ └────┬─────┘ └────┬────┘ └────┬─────┘ └────┬────┘ └────┬─────┘ │ │ │ │ │ │ Select Image │ │ │ │ │──────────────▶│ │ │ │ │ │ │ │ │ │ Click Validate│ │ │ │ │──────────────▶│ │ │ │ │ │ │ │ │ │ │ POST /validate │ │ │ │─────────────▶│ │ │ │ │ │ │ │ │ │ │ Preprocess │ │ │ │ │ Image │ │ │ │ │───────────────│ │ │ │ │ │ │ │ │ │ analyze_plate() │ │ │ │──────────────▶│ │ │ │ │ │ │ │ │ │ │ OVMS POST │ │ │ │ │─────────────▶│ │ │ │ │ │ │ │ │ │◀─────────────│ │ │ │ │ Detected Items │ │ │◀──────────────│ │ │ │ │ │ │ │ │ │ match_items() │ │ │ │─────────────────────────────▶│ │ │ │ │ │ │ │◀─────────────────────────────│ │ │ │ Similarity Scores │ │ │ │ │ │ │ │◀─────────────│ │ │ │ │ Validation Result │ │ │◀──────────────│ │ │ │ │ Display Results │ │ │ │ │ │ │ │ ``` ### Docker Services | Container | Image | Ports | Description | | ------------------------- | --------------------------------------- | ---------- | ----------------------------------- | | `dinein_app` | `intel/order-accuracy-dine-in:2026.0.0` | 7861, 8083 | Main application (Gradio + FastAPI) | | `dinein_ovms_vlm` | `openvino/model_server:latest-gpu` | 8002 | Vision-Language Model server | | `dinein_semantic_service` | `intel/semantic-search-agent:1.0.0` | 8081, 9091 | Semantic text matching | | `metrics-collector` | `intel/hl-ai-metrics-collector:1.0.0` | 8084 | System metrics aggregation | ### Network Topology ```text ┌─────────────────────────────────────────────────────────────────┐ │ Docker Network: dinein-net │ │ │ │ ┌─────────────────┐ ┌─────────────────┐ │ │ │ dinein_app │ │ dinein_ovms_vlm │ │ │ │ │ │ │ │ │ │ - Gradio:7861 │───▶│ - REST: 8000 │ (internal) │ │ │ - API:8083 │ │ - Host: 8002 │ (external) │ │ │ │ │ │ │ │ └────────┬────────┘ └─────────────────┘ │ │ │ │ │ │ ┌─────────────────┐ │ │ │ │ semantic_service│ │ │ └────────────▶│ │ │ │ │ - REST: 8080 │ (internal) │ │ │ - Host: 8081 │ (external) │ │ └─────────────────┘ │ │ │ │ ┌─────────────────┐ │ │ │metrics-collector│ │ │ │ - REST: 8084 │◀────── Prometheus-style metrics │ │ └─────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ │ ▼ Host Network ┌───────────────────────┐ │ localhost:7861 │ ← Gradio UI │ localhost:8083 │ ← REST API │ localhost:8083/docs │ ← Swagger Docs │ localhost:8002 │ ← OVMS VLM │ localhost:8084 │ ← Metrics API └───────────────────────┘ ``` --- ## Component Details ### 1. VLM Client (`vlm_client.py`) The VLM Client handles communication with OpenVINO Model Server for visual inference. **Features:** - **Image Preprocessing**: Smart resizing (672px max), JPEG compression (82% quality), contrast enhancement - **Circuit Breaker**: 5 failures → OPEN, 30s recovery → HALF_OPEN, 2 successes → CLOSED - **Connection Pooling**: Shared `httpx.AsyncClient` with HTTP/2, 50 max connections - **Inventory-Aware Prompts**: Includes known menu items for improved accuracy ```python # Circuit Breaker States class CircuitState(Enum): CLOSED = "closed" # Normal operation OPEN = "open" # Failing, reject requests HALF_OPEN = "half_open" # Testing recovery ``` ### 2. Semantic Client (`semantic_client.py`) Handles fuzzy string matching for item comparison. **Features:** - **Similarity Threshold**: Default 0.7 (70% match required) - **Fallback Matching**: Exact string match when service unavailable - **Circuit Breaker**: 15s recovery timeout (faster than VLM) - **Connection Pool**: Shared client with 20 max connections ### 3. Validation Service (`validation_service.py`) Orchestrates the validation workflow using Strategy pattern. **Validation Pipeline:** 1. VLM inference → detected items 2. Semantic matching → item correlations 3. Quantity analysis → mismatches 4. Accuracy calculation → final score ```python # Accuracy Calculation accuracy = matched_items / max(expected_items, detected_items) order_complete = (missing == 0) and (quantity_errors == 0) and (extra == 0) ``` ### 4. Configuration Manager (`config.py`) Thread-safe singleton for application configuration. **Features:** - Double-checked locking pattern - Environment variable driven - Runtime benchmark mode toggle ### 5. API Layer (`api.py`) FastAPI endpoints with bounded validation cache. **Features:** - **BoundedValidationCache**: LRU eviction, 10K max entries - **Thread-safe service init**: Lock-protected lazy initialization - **Async metrics collection**: Non-blocking system stats --- ## Data Flow ### Validation Request Processing 1. **Image Processing**: Raw Image → Auto-Orient → Resize (672px) → Enhance → Sharpen → JPEG Compress (82%) → Base64 Encode 2. **VLM Inference**: Prompt: "Analyze this food plate image..." + Inventory list for context → OVMS POST `/v3/chat/completions` → Parse JSON response for detected items 3. **Semantic Matching**: For each expected item: - Find best match in detected items (similarity > 0.7) - Track: matched, missing, extra, quantity mismatches 4. **Result Aggregation**: ```text { "order_complete": true/false, "accuracy_score": 0.0-1.0, "missing_items": [...], "extra_items": [...], "metrics": { "latency": [...], "tps": [...], "utilization": [...] } } ``` ### Metrics Collection ```text ┌─────────────────────────────────────────────────────────────────────┐ │ METRICS PIPELINE │ ├─────────────────────────────────────────────────────────────────────┤ │ │ │ VLM CLIENT METRICS COLLECTOR │ │ ┌────────────────┐ ┌────────────────┐ │ │ │ log_start_time │──────────▶│ Start Timestamp│ │ │ │ log_end_time │──────────▶│ End Timestamp │ │ │ │ log_custom_event │ TPS, Tokens │ │ │ │ - tps │──────────▶│ Preprocess Time│ │ │ │ - tokens │ │ Items Detected │ │ │ │ - latency │ └────────┬───────┘ │ │ └────────────────┘ │ │ │ ▼ │ │ ┌────────────────┐ │ │ │ JSON/CSV Export│ │ │ │ results/*.json │ │ │ │ results/*.csv │ │ │ └────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────┘ ``` --- ## Production Features ### Circuit Breaker Pattern Prevents cascading failures when external services are unhealthy. ```mermaid flowchart LR CLOSED["CLOSED"] OPEN["OPEN"] HALFOPEN["HALF-OPEN"] CLOSED -- "5 consecutive failures" --> OPEN OPEN -- "30s timeout" --> HALFOPEN HALFOPEN -- "2 successes" --> CLOSED HALFOPEN -- "1 failure" --> OPEN ``` ### Connection Pooling ```python # VLM Client Pool Configuration limits = httpx.Limits( max_keepalive_connections=20, max_connections=50, keepalive_expiry=30.0 ) timeout = httpx.Timeout( connect=10.0, read=300.0, # Extended for VLM inference write=10.0, pool=10.0 ) client = httpx.AsyncClient(limits=limits, timeout=timeout, http2=True) ``` ### Bounded Cache (LRU) ```python class BoundedValidationCache: """Thread-safe LRU cache with automatic eviction""" def __init__(self, maxsize: int = 10000): self._cache = OrderedDict() self._maxsize = maxsize self._lock = threading.Lock() def __setitem__(self, key, value): with self._lock: if key in self._cache: self._cache.move_to_end(key) self._cache[key] = value # Evict oldest when full while len(self._cache) > self._maxsize: self._cache.popitem(last=False) ``` --- ## Performance Characteristics ### Latency Breakdown | Stage | Typical Duration | | ------------------- | ---------------- | | Image Preprocessing | 50–100 ms | | VLM Inference | 8–12 s | | Semantic Matching | 20–50 ms | | **Total E2E** | **9–15 s** | Target: < 15 s end-to-end for operational efficiency. --- ## System Requirements See the [System Requirements](./get-started/system-requirements.md) for detailed hardware, software, and network prerequisites. --- ## Pre-Deployment Checklist - [ ] Docker and Docker Compose installed and working - [ ] Intel GPU drivers installed and GPU visible to Docker - [ ] Required ports available (7861, 8083, 8002, 8081, 8084) - [ ] At least 50 GB free disk space - [ ] VLM model downloaded (`setup_models.sh` completed) - [ ] `.env` file created (`make init-env`) - [ ] Plate images placed in `images/` and `configs/orders.json` updated