How It Works#
This document provides a comprehensive technical overview of the system architecture, component interactions, data flows, and design decisions.
System Architecture#
High-Level Architecture#
┌─────────────────────────────────────────────────────────────────────────────┐
│ DINE-IN ORDER ACCURACY │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌──────────────────┐ ┌─────────────────────┐ │
│ │ │ │ │ │ │ │
│ │ Gradio UI │─────▶│ FastAPI API │─────▶│ Validation │ │
│ │ (Port 7861)│ │ (Port 8083) │ │ Service │ │
│ │ │ │ │ │ │ │
│ └─────────────┘ └────────┬─────────┘ └──────────┬──────────┘ │
│ │ │ │
│ │ │ │
│ ┌───────────┴───────────┐ │ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────────┐ ┌─────────────────┐ ┌───────────────┐ │
│ │ │ │ │ │ │ │
│ │ VLM Client │ │ Semantic Client │ │ Metrics │ │
│ │ (Circuit │ │ (Circuit │ │ Collector │ │
│ │ Breaker) │ │ Breaker) │ │ │ │
│ │ │ │ │ │ │ │
│ └───────┬────────┘ └────────┬────────┘ └───────────────┘ │
│ │ │ │
└───────────────────┼───────────────────────┼───────────────────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ │ │ │
│ OVMS VLM │ │ Semantic │
│ (Qwen2.5-VL) │ │ Service │
│ Port 8000 │ │ Port 8080 │
│ │ │ │
└─────────────────┘ └─────────────────┘
Request Flow#
┌──────────┐ ┌─────────┐ ┌──────────┐ ┌─────────┐ ┌──────────┐
│ Staff │ │ Gradio │ │ FastAPI │ │ VLM │ │ Semantic │
│ Trigger │ │ UI │ │ API │ │ Client │ │ Client │
└────┬─────┘ └────┬────┘ └────┬─────┘ └────┬────┘ └────┬─────┘
│ │ │ │ │
│ Select Image │ │ │ │
│──────────────▶│ │ │ │
│ │ │ │ │
│ Click Validate│ │ │ │
│──────────────▶│ │ │ │
│ │ │ │ │
│ │ POST /validate │ │
│ │─────────────▶│ │ │
│ │ │ │ │
│ │ │ Preprocess │ │
│ │ │ Image │ │
│ │ │───────────────│ │
│ │ │ │ │
│ │ │ analyze_plate() │
│ │ │──────────────▶│ │
│ │ │ │ │
│ │ │ │ OVMS POST │
│ │ │ │─────────────▶│
│ │ │ │ │
│ │ │ │◀─────────────│
│ │ │ │ Detected Items
│ │ │◀──────────────│ │
│ │ │ │ │
│ │ │ match_items() │
│ │ │─────────────────────────────▶│
│ │ │ │
│ │ │◀─────────────────────────────│
│ │ │ Similarity Scores │
│ │ │ │ │
│ │◀─────────────│ │ │
│ │ Validation Result │ │
│◀──────────────│ │ │ │
│ Display Results │ │ │
│ │ │ │ │
Docker Services#
| Container | Image | Ports | Description |
|---|---|---|---|
| | | 7861, 8083 | Main application (Gradio + FastAPI) |
| | | 8002 | Vision-Language Model server |
| | | 8081, 9091 | Semantic text matching |
| | | 8084 | System metrics aggregation |
Network Topology#
┌─────────────────────────────────────────────────────────────────┐
│ Docker Network: dinein-net │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ dinein_app │ │ dinein_ovms_vlm │ │
│ │ │ │ │ │
│ │ - Gradio:7861 │───▶│ - REST: 8000 │ (internal) │
│ │ - API:8083 │ │ - Host: 8002 │ (external) │
│ │ │ │ │ │
│ └────────┬────────┘ └─────────────────┘ │
│ │ │
│ │ ┌─────────────────┐ │
│ │ │ semantic_service│ │
│ └────────────▶│ │ │
│ │ - REST: 8080 │ (internal) │
│ │ - Host: 8081 │ (external) │
│ └─────────────────┘ │
│ │
│ ┌─────────────────┐ │
│ │metrics-collector│ │
│ │ - REST: 8084 │◀────── Prometheus-style metrics │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
│
▼ Host Network
┌───────────────────────┐
│ localhost:7861 │ ← Gradio UI
│ localhost:8083 │ ← REST API
│ localhost:8083/docs │ ← Swagger Docs
│ localhost:8002 │ ← OVMS VLM
│ localhost:8084 │ ← Metrics API
└───────────────────────┘
Component Details#
1. VLM Client (vlm_client.py)#
The VLM Client handles communication with OpenVINO Model Server for visual inference.
Features:
Image Preprocessing: Smart resizing (672px max), JPEG compression (82% quality), contrast enhancement
Circuit Breaker: 5 failures → OPEN, 30s recovery → HALF_OPEN, 2 successes → CLOSED
Connection Pooling: Shared httpx.AsyncClient with HTTP/2, 50 max connections
Inventory-Aware Prompts: Includes known menu items for improved accuracy
# Circuit Breaker States
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing, reject requests
    HALF_OPEN = "half_open"  # Testing recovery
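The "smart resizing" feature caps images at 672 px. A minimal sketch of the dimension arithmetic follows, assuming the cap applies to the longer side, the aspect ratio is preserved, and smaller images are not upscaled. `fit_within` is a hypothetical helper for illustration, not a function taken from vlm_client.py.

```python
def fit_within(width: int, height: int, max_side: int = 672) -> tuple[int, int]:
    """Scale (width, height) so that the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # assumption: images are never upscaled
    scale = max_side / longest
    # Round to whole pixels and never return a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1344, 1008))  # a 4:3 image twice the cap is halved
print(fit_within(640, 480))    # already within the cap, returned unchanged
```

Keeping the size computation separate from the actual pixel work makes the cap easy to unit test without loading an image library.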
2. Semantic Client (semantic_client.py)#
Handles fuzzy string matching for item comparison.
Features:
Similarity Threshold: Default 0.7 (70% match required)
Fallback Matching: Exact string match when service unavailable
Circuit Breaker: 15s recovery timeout (faster than VLM)
Connection Pool: Shared client with 20 max connections
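The threshold and fallback behaviour listed above could look roughly like the sketch below. Everything here is illustrative: `best_match` and `score_fn` are hypothetical names, `difflib` only stands in for the real semantic service, the fallback is assumed to be case-insensitive, and whether a score of exactly 0.7 passes is also an assumption.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def best_match(expected: str, detected: list[str],
               score_fn: Callable[[str, str], float],
               threshold: float = 0.7) -> Optional[str]:
    """Return the detected item most similar to `expected`, or None."""
    try:
        scored = [(score_fn(expected, d), d) for d in detected]
    except ConnectionError:
        # Fallback: exact string match when the scoring service is unavailable.
        wanted = expected.strip().lower()
        return next((d for d in detected if d.strip().lower() == wanted), None)
    if not scored:
        return None
    score, item = max(scored)
    return item if score >= threshold else None


def local_ratio(a: str, b: str) -> float:
    """Stand-in scorer; the real service presumably uses a semantic model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def unavailable(a: str, b: str) -> float:
    """Simulates the semantic service being down."""
    raise ConnectionError("semantic service unreachable")


print(best_match("cheeseburger", ["cola", "cheese burger"], local_ratio))
print(best_match("Cola", ["cola", "cheese burger"], unavailable))
```

With the fallback active, near matches such as "cheeseburger" versus "cheese burger" are no longer found, so accuracy degrades gracefully rather than the request failing outright.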
3. Validation Service (validation_service.py)#
Orchestrates the validation workflow using the Strategy pattern.
Validation Pipeline:
VLM inference → detected items
Semantic matching → item correlations
Quantity analysis → mismatches
Accuracy calculation → final score
# Accuracy Calculation
accuracy = matched_items / max(expected_items, detected_items)
order_complete = (missing == 0) and (quantity_errors == 0) and (extra == 0)
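As a worked example of the formula above, the sketch below wraps it in a hypothetical `score_order` function. The guard for an empty order (returning 0.0 instead of dividing by zero) is an assumption, not something the source specifies.

```python
def score_order(matched: int, expected: int, detected: int,
                missing: int, quantity_errors: int, extra: int) -> tuple[float, bool]:
    """Apply the accuracy and completeness rules to item counts."""
    denominator = max(expected, detected)
    # Assumption: an empty order with nothing detected scores 0.0.
    accuracy = matched / denominator if denominator else 0.0
    order_complete = missing == 0 and quantity_errors == 0 and extra == 0
    return accuracy, order_complete


# 4 items ordered, 5 seen on the plate, 3 of them matched:
# accuracy = 3 / max(4, 5) = 0.6, and the order is not complete.
print(score_order(matched=3, expected=4, detected=5,
                  missing=1, quantity_errors=0, extra=2))
```

Dividing by the larger of the two counts means extra items lower the score just as missing items do.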
4. Configuration Manager (config.py)#
Thread-safe singleton for application configuration.
Features:
Double-checked locking pattern
Environment variable driven
Runtime benchmark mode toggle
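A minimal sketch of a double-checked-locking singleton with these features is shown below. The class layout and the environment variable names (`SIMILARITY_THRESHOLD`, `BENCHMARK_MODE`) are assumptions for illustration; config.py may differ.

```python
import os
import threading


class ConfigManager:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:              # first check, without the lock
            with cls._lock:
                if cls._instance is None:      # second check, under the lock
                    instance = super().__new__(cls)
                    instance._load()
                    cls._instance = instance   # publish only once fully loaded
        return cls._instance

    def _load(self) -> None:
        # Hypothetical variable names; values come from the environment.
        self.similarity_threshold = float(os.environ.get("SIMILARITY_THRESHOLD", "0.7"))
        self.benchmark_mode = os.environ.get("BENCHMARK_MODE", "false").lower() == "true"
        self._state_lock = threading.Lock()

    def set_benchmark_mode(self, enabled: bool) -> None:
        """Runtime toggle, guarded so concurrent writers do not interleave."""
        with self._state_lock:
            self.benchmark_mode = enabled


config = ConfigManager()
config.set_benchmark_mode(True)
print(ConfigManager() is config, ConfigManager().benchmark_mode)
```

The unlocked first check keeps the common path cheap once the instance exists; the second check under the lock prevents two threads from both constructing it.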
5. API Layer (api.py)#
FastAPI endpoints with bounded validation cache.
Features:
BoundedValidationCache: LRU eviction, 10K max entries
Thread-safe service init: Lock-protected lazy initialization
Async metrics collection: Non-blocking system stats
Data Flow#
Validation Request Processing#
Image Processing: Raw Image → Auto-Orient → Resize (672px) → Enhance → Sharpen → JPEG Compress (82%) → Base64 Encode
VLM Inference: Prompt: “Analyze this food plate image…” + Inventory list for context → OVMS POST /v3/chat/completions → Parse JSON response for detected items
Semantic Matching: For each expected item:
  Find best match in detected items (similarity > 0.7)
  Track: matched, missing, extra, quantity mismatches
Result Aggregation:
{
  "order_complete": true/false,
  "accuracy_score": 0.0-1.0,
  "missing_items": [...],
  "extra_items": [...],
  "metrics": {
    "latency": [...],
    "tps": [...],
    "utilization": [...]
  }
}
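The matching and aggregation steps above can be sketched as follows. This is an illustration under stated assumptions: `aggregate` and its `matcher` parameter are hypothetical, orders are modelled as name-to-quantity dictionaries, the default matcher is exact name equality, an item with the wrong quantity still counts as matched for the accuracy score, and the `quantity_mismatches` key is added for the example (it is not in the result schema shown above). The `metrics` block is omitted.

```python
from typing import Callable, Optional


def aggregate(expected: dict[str, int], detected: dict[str, int],
              matcher: Optional[Callable[[str, list[str]], Optional[str]]] = None) -> dict:
    """Compare an expected order with detected items and build a result dict."""
    if matcher is None:
        # Default: exact name match; a semantic matcher could be passed instead.
        matcher = lambda name, candidates: name if name in candidates else None
    remaining = dict(detected)  # detected items not yet claimed by a match
    matched, missing, quantity_mismatches = [], [], []
    for name, qty in expected.items():
        hit = matcher(name, list(remaining))
        if hit is None:
            missing.append(name)
            continue
        found_qty = remaining.pop(hit)
        matched.append(name)
        if found_qty != qty:
            quantity_mismatches.append(
                {"item": name, "expected": qty, "detected": found_qty})
    extra = list(remaining)  # whatever no expected item claimed
    denominator = max(len(expected), len(detected))
    return {
        "order_complete": not (missing or extra or quantity_mismatches),
        "accuracy_score": len(matched) / denominator if denominator else 0.0,
        "missing_items": missing,
        "extra_items": extra,
        "quantity_mismatches": quantity_mismatches,
    }


result = aggregate({"burger": 1, "fries": 2, "cola": 1},
                   {"burger": 1, "fries": 1, "salad": 1})
print(result)
```

Removing each matched item from `remaining` ensures one detected item cannot satisfy two expected items.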
Metrics Collection#
┌─────────────────────────────────────────────────────────────────────┐
│ METRICS PIPELINE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ VLM CLIENT METRICS COLLECTOR │
│ ┌────────────────┐ ┌────────────────┐ │
│ │ log_start_time │──────────▶│ Start Timestamp│ │
│ │ log_end_time │──────────▶│ End Timestamp │ │
│ │ log_custom_event │ TPS, Tokens │ │
│ │ - tps │──────────▶│ Preprocess Time│ │
│ │ - tokens │ │ Items Detected │ │
│ │ - latency │ └────────┬───────┘ │
│ └────────────────┘ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ JSON/CSV Export│ │
│ │ results/*.json │ │
│ │ results/*.csv │ │
│ └────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
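A rough sketch of the collector side of this pipeline is given below. The method names `log_start_time`, `log_end_time`, and `log_custom_event` come from the diagram, but their signatures, the `RunMetrics` class, the `export` method, and the sample values are all assumptions made for illustration.

```python
import csv
import json
import tempfile
import time
from pathlib import Path


class RunMetrics:
    """Collects one flat record per validation run (hypothetical helper)."""

    def __init__(self) -> None:
        self.record: dict = {}

    def log_start_time(self) -> None:
        self.record["start_ts"] = time.time()

    def log_end_time(self) -> None:
        self.record["end_ts"] = time.time()

    def log_custom_event(self, **fields) -> None:
        # For example: tps, tokens, preprocess time, items detected.
        self.record.update(fields)

    def export(self, directory: str, stem: str = "run") -> tuple[Path, Path]:
        """Write the record as <stem>.json and <stem>.csv in `directory`."""
        out = Path(directory)
        out.mkdir(parents=True, exist_ok=True)
        json_path = out / f"{stem}.json"
        csv_path = out / f"{stem}.csv"
        json_path.write_text(json.dumps(self.record, indent=2))
        with csv_path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(self.record))
            writer.writeheader()
            writer.writerow(self.record)
        return json_path, csv_path


metrics = RunMetrics()
metrics.log_start_time()
metrics.log_custom_event(tps=21.5, tokens=128, items_detected=3)  # made-up values
metrics.log_end_time()

with tempfile.TemporaryDirectory() as tmp:  # stands in for results/
    json_path, csv_path = metrics.export(tmp)
    exported = json.loads(json_path.read_text())
    csv_header = csv_path.read_text().splitlines()[0]
print(csv_header)
```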
Production Features#
Circuit Breaker Pattern#
Prevents cascading failures when external services are unhealthy.
flowchart LR
CLOSED["CLOSED"]
OPEN["OPEN"]
HALFOPEN["HALF-OPEN"]
CLOSED -- "5 consecutive failures" --> OPEN
OPEN -- "30s timeout" --> HALFOPEN
HALFOPEN -- "2 successes" --> CLOSED
HALFOPEN -- "1 failure" --> OPEN
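The transitions in the flowchart can be sketched as a small state machine. This is an illustrative version only: the `CircuitBreaker` class, its method names, and the injectable `clock` are assumptions, and it is not thread-safe as written. The thresholds default to the values quoted in this document (5 failures, 30 s, 2 successes).

```python
import time
from enum import Enum


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 success_threshold: int = 2, clock=time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.state = CircuitState.CLOSED
        self._clock = clock
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        """Reject while OPEN; move to HALF_OPEN once the timeout has elapsed."""
        if self.state is CircuitState.OPEN:
            if self._clock() - self._opened_at < self.recovery_timeout:
                return False
            self.state = CircuitState.HALF_OPEN
            self._successes = 0
        return True

    def record_success(self) -> None:
        if self.state is CircuitState.HALF_OPEN:
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = CircuitState.CLOSED
                self._failures = 0
        else:
            self._failures = 0  # only consecutive failures count

    def record_failure(self) -> None:
        if self.state is CircuitState.HALF_OPEN:
            self._trip()  # a single failure while probing reopens the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = CircuitState.OPEN
        self._opened_at = self._clock()
        self._failures = 0
```

A caller would check `allow_request()` before each outbound call and report the outcome with `record_success()` or `record_failure()`. Passing a fake `clock` makes the 30 s recovery window testable without sleeping.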
Connection Pooling#
# VLM Client Pool Configuration
import httpx

limits = httpx.Limits(
    max_keepalive_connections=20,
    max_connections=50,
    keepalive_expiry=30.0,
)
timeout = httpx.Timeout(
    connect=10.0,
    read=300.0,  # Extended for VLM inference
    write=10.0,
    pool=10.0,
)
# One shared client, reused across requests so connections are pooled
client = httpx.AsyncClient(limits=limits, timeout=timeout, http2=True)
Bounded Cache (LRU)#
import threading
from collections import OrderedDict

class BoundedValidationCache:
    """Thread-safe LRU cache with automatic eviction"""

    def __init__(self, maxsize: int = 10000):
        self._cache = OrderedDict()
        self._maxsize = maxsize
        self._lock = threading.Lock()

    def __setitem__(self, key, value):
        with self._lock:
            # Re-inserting an existing key marks it as most recently used
            if key in self._cache:
                self._cache.move_to_end(key)
            self._cache[key] = value
            # Evict oldest when full
            while len(self._cache) > self._maxsize:
                self._cache.popitem(last=False)
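The excerpt above shows only the write path. For eviction to be least-recently-used rather than oldest-inserted, reads also need to refresh an entry's position. The sketch below illustrates that with a hypothetical `BoundedCache` class and `put`/`get` methods; whether api.py handles reads this way is an assumption.

```python
import threading
from collections import OrderedDict


class BoundedCache:
    """Compact stand-in for the cache above, with a recency-refreshing read."""

    def __init__(self, maxsize: int = 10000) -> None:
        self._cache: OrderedDict = OrderedDict()
        self._maxsize = maxsize
        self._lock = threading.Lock()

    def put(self, key, value) -> None:
        with self._lock:
            if key in self._cache:
                self._cache.move_to_end(key)
            self._cache[key] = value
            while len(self._cache) > self._maxsize:
                self._cache.popitem(last=False)  # drop least recently used

    def get(self, key, default=None):
        with self._lock:
            if key not in self._cache:
                return default
            self._cache.move_to_end(key)  # a read counts as a use
            return self._cache[key]


cache = BoundedCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # exceeds maxsize, so "b" is evicted
print(cache.get("a"), cache.get("b"), cache.get("c"))
```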
Performance Characteristics#
Latency Breakdown#
| Stage | Typical Duration |
|---|---|
| Image Preprocessing | 50–100 ms |
| VLM Inference | 8–12 s |
| Semantic Matching | 20–50 ms |
| Total E2E | 9–15 s |
Target: < 15 s end-to-end for operational efficiency.
System Requirements#
See the System Requirements for detailed hardware, software, and network prerequisites.
Pre-Deployment Checklist#
[ ] Docker and Docker Compose installed and working
[ ] Intel GPU drivers installed and GPU visible to Docker
[ ] Required ports available (7861, 8083, 8002, 8081, 8084)
[ ] At least 50 GB free disk space
[ ] VLM model downloaded (setup_models.sh completed)
[ ] .env file created (make init-env)
[ ] Plate images placed in images/ and configs/orders.json updated