How It Works

This document provides a comprehensive technical overview of the system architecture, component interactions, data flows, and design decisions.

System Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           DINE-IN ORDER ACCURACY                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────┐      ┌──────────────────┐      ┌─────────────────────┐    │
│  │             │      │                  │      │                     │    │
│  │  Gradio UI  │─────▶│   FastAPI API    │─────▶│   Validation        │    │
│  │  (Port 7861)│      │   (Port 8083)    │      │   Service           │    │
│  │             │      │                  │      │                     │    │
│  └─────────────┘      └────────┬─────────┘      └──────────┬──────────┘    │
│                                │                           │               │
│                                │                           │               │
│                    ┌───────────┴───────────┐               │               │
│                    │                       │               │               │
│                    ▼                       ▼               ▼               │
│           ┌────────────────┐     ┌─────────────────┐ ┌───────────────┐    │
│           │                │     │                 │ │               │    │
│           │  VLM Client    │     │ Semantic Client │ │ Metrics       │    │
│           │  (Circuit      │     │ (Circuit        │ │ Collector     │    │
│           │   Breaker)     │     │  Breaker)       │ │               │    │
│           │                │     │                 │ │               │    │
│           └───────┬────────┘     └────────┬────────┘ └───────────────┘    │
│                   │                       │                               │
└───────────────────┼───────────────────────┼───────────────────────────────┘
                    │                       │
                    ▼                       ▼
          ┌─────────────────┐     ┌─────────────────┐
          │                 │     │                 │
          │   OVMS VLM      │     │   Semantic      │
          │   (Qwen2.5-VL)  │     │   Service       │
          │   Port 8000     │     │   Port 8080     │
          │                 │     │                 │
          └─────────────────┘     └─────────────────┘

Request Flow

┌──────────┐    ┌─────────┐    ┌──────────┐    ┌─────────┐    ┌──────────┐    ┌──────────┐
│  Staff   │    │ Gradio  │    │ FastAPI  │    │  VLM    │    │ Semantic │    │   OVMS   │
│  Trigger │    │   UI    │    │   API    │    │ Client  │    │  Client  │    │   VLM    │
└────┬─────┘    └────┬────┘    └────┬─────┘    └────┬────┘    └────┬─────┘    └────┬─────┘
     │               │              │               │              │               │
     │ Select Image  │              │               │              │               │
     │──────────────▶│              │               │              │               │
     │               │              │               │              │               │
     │ Click Validate│              │               │              │               │
     │──────────────▶│              │               │              │               │
     │               │              │               │              │               │
     │               │ POST /validate               │              │               │
     │               │─────────────▶│               │              │               │
     │               │              │               │              │               │
     │               │              │──┐ Preprocess │              │               │
     │               │              │  │ Image      │              │               │
     │               │              │◀─┘            │              │               │
     │               │              │               │              │               │
     │               │              │ analyze_plate()              │               │
     │               │              │──────────────▶│              │               │
     │               │              │               │              │               │
     │               │              │               │ POST /v3/chat/completions    │
     │               │              │               │─────────────────────────────▶│
     │               │              │               │              │               │
     │               │              │               │◀─────────────────────────────│
     │               │              │               │ Detected Items               │
     │               │              │◀──────────────│              │               │
     │               │              │               │              │               │
     │               │              │ match_items()                │               │
     │               │              │─────────────────────────────▶│               │
     │               │              │               │              │               │
     │               │              │◀─────────────────────────────│               │
     │               │              │            Similarity Scores │               │
     │               │              │               │              │               │
     │               │◀─────────────│               │              │               │
     │               │ Validation Result            │              │               │
     │◀──────────────│              │               │              │               │
     │ Display Results              │               │              │               │
     │               │              │               │              │               │

Docker Services

Container                Image                                  Host Ports  Description
-----------------------  -------------------------------------  ----------  -----------------------------------
dinein_app               intel/order-accuracy-dine-in:2026.0.0  7861, 8083  Main application (Gradio + FastAPI)
dinein_ovms_vlm          openvino/model_server:latest-gpu       8002        Vision-Language Model server
dinein_semantic_service  intel/semantic-search-agent:1.0.0      8081, 9091  Semantic text matching
metrics-collector        intel/hl-ai-metrics-collector:1.0.0    8084        System metrics aggregation

Network Topology

┌─────────────────────────────────────────────────────────────────┐
│                     Docker Network: dinein-net                  │
│                                                                 │
│  ┌─────────────────┐    ┌─────────────────┐                    │
│  │   dinein_app    │    │ dinein_ovms_vlm │                    │
│  │                 │    │                 │                    │
│  │  - Gradio:7861  │───▶│  - REST: 8000   │  (internal)        │
│  │  - API:8083     │    │  - Host: 8002   │  (external)        │
│  │                 │    │                 │                    │
│  └────────┬────────┘    └─────────────────┘                    │
│           │                                                     │
│           │             ┌─────────────────┐                    │
│           │             │ semantic_service│                    │
│           └────────────▶│                 │                    │
│                         │  - REST: 8080   │  (internal)        │
│                         │  - Host: 8081   │  (external)        │
│                         └─────────────────┘                    │
│                                                                 │
│  ┌─────────────────┐                                           │
│  │metrics-collector│                                           │
│  │  - REST: 8084   │◀────── Prometheus-style metrics           │
│  └─────────────────┘                                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                    │
                    ▼ Host Network
        ┌───────────────────────┐
        │   localhost:7861      │  ← Gradio UI
        │   localhost:8083      │  ← REST API
        │   localhost:8083/docs │  ← Swagger Docs
        │   localhost:8002      │  ← OVMS VLM
        │   localhost:8084      │  ← Metrics API
        └───────────────────────┘

Component Details

1. VLM Client (vlm_client.py)

The VLM Client handles communication with OpenVINO Model Server for visual inference.

Features:

  • Image Preprocessing: Smart resizing (672 px max), JPEG compression (quality 82), contrast enhancement

  • Circuit Breaker: 5 consecutive failures → OPEN, 30 s recovery timeout → HALF_OPEN, 2 successes → CLOSED

  • Connection Pooling: Shared httpx.AsyncClient with HTTP/2, 50 max connections

  • Inventory-Aware Prompts: Includes known menu items for improved accuracy
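
The "672px max" resize is read here as a cap on the longer side, with the aspect ratio preserved and no upscaling; that reading is an assumption, not something the source spells out. A minimal sketch of the size computation, using a hypothetical helper `fit_within`:

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 672) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side.

    Assumes the aspect ratio is preserved and small images are never upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough
    scale = max_side / longest
    # max(1, ...) keeps very thin images from collapsing to 0 px
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4032, 3024))  # (672, 504)
print(fit_within(640, 480))    # (640, 480)
```

In Pillow, `Image.thumbnail((672, 672))` applies the same kind of bound in place: it keeps the aspect ratio and never enlarges the image.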

from enum import Enum

# Circuit breaker states
class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation: requests pass through
    OPEN = "open"            # Failing: reject requests immediately
    HALF_OPEN = "half_open"  # Testing recovery: allow trial requests
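
OVMS serves an OpenAI-compatible chat-completions API at /v3/chat/completions, so the model's reply text arrives in choices[0].message.content. The JSON schema the prompt asks the model to return is not given here; the sketch below assumes a hypothetical {"items": [{"name": ..., "quantity": ...}]} shape and tolerates replies wrapped in a Markdown code fence. `parse_detected_items` is an illustrative name, not the client's actual function.

```python
import json
import re
from typing import Dict

# Models often wrap JSON in a fenced block (three backticks, optional "json" tag)
_FENCED_JSON = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_detected_items(response: dict) -> Dict[str, int]:
    """Return {item_name: quantity} from an OpenAI-style chat completion.

    Assumes a hypothetical reply schema:
    {"items": [{"name": "...", "quantity": 1}, ...]}
    """
    content = response["choices"][0]["message"]["content"]
    fenced = _FENCED_JSON.search(content)
    payload = json.loads(fenced.group(1) if fenced else content)
    items: Dict[str, int] = {}
    for entry in payload.get("items", []):
        name = str(entry["name"]).strip().lower()  # normalize for later matching
        # Duplicate names are summed; a missing quantity counts as 1
        items[name] = items.get(name, 0) + int(entry.get("quantity", 1))
    return items
```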

2. Semantic Client (semantic_client.py)

Handles fuzzy matching between detected item names and expected order items, using similarity scores from the semantic service.

Features:

  • Similarity Threshold: Default 0.7 (70% match required)

  • Fallback Matching: Exact string match when the service is unavailable

  • Circuit Breaker: 15 s recovery timeout (shorter than the VLM client's 30 s)

  • Connection Pool: Shared client with 20 max connections
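
Matching a single expected item can be sketched as below. The scorer is injected: in the service it would be the semantic similarity score, while `difflib.SequenceMatcher` appears here only as a stand-in that measures character overlap, not meaning. Passing `similarity=None` models the fallback path when the service is unavailable. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence


def best_match(
    expected: str,
    detected: Sequence[str],
    similarity: Optional[Callable[[str, str], float]],
    threshold: float = 0.7,
) -> Optional[str]:
    """Return the detected name that best matches `expected`, or None."""
    if similarity is None:
        # Fallback path (service unavailable): case-insensitive exact match only
        target = expected.strip().lower()
        return next((d for d in detected if d.strip().lower() == target), None)
    if not detected:
        return None
    score, name = max((similarity(expected, d), d) for d in detected)
    return name if score > threshold else None  # strict ">" as in the data-flow step


def char_similarity(a: str, b: str) -> float:
    """Stand-in scorer: character-overlap ratio in [0, 1], NOT semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


print(best_match("cheeseburger", ["cheese burger", "fries"], char_similarity))  # cheese burger
print(best_match("Fries", ["fries", "salad"], None))  # fries
```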

3. Validation Service (validation_service.py)

Orchestrates the validation workflow using the Strategy pattern.

Validation Pipeline:

  1. VLM inference → detected items

  2. Semantic matching → item correlations

  3. Quantity analysis → mismatches

  4. Accuracy calculation → final score

# Accuracy Calculation
accuracy = matched_items / max(expected_items, detected_items)
order_complete = (missing == 0) and (quantity_errors == 0) and (extra == 0)
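
A worked example of the formula: with 4 items ordered and 5 detected, of which 3 match, there is 1 missing item and 2 extra ones, and accuracy is 3 / max(4, 5) = 0.6. Using the larger count as the denominator means both missing and extra items lower the score. The sketch below adds a zero guard; returning 1.0 when nothing was ordered and nothing was detected is an assumption, since the source does not define that case.

```python
def accuracy_score(matched: int, expected: int, detected: int) -> float:
    """matched / max(expected, detected); 1.0 for an empty order (assumption)."""
    denominator = max(expected, detected)
    return matched / denominator if denominator else 1.0


def is_order_complete(missing: int, quantity_errors: int, extra: int) -> bool:
    """An order is complete only when nothing is missing, extra, or miscounted."""
    return missing == 0 and quantity_errors == 0 and extra == 0


print(accuracy_score(matched=3, expected=4, detected=5))  # 0.6
print(is_order_complete(missing=1, quantity_errors=0, extra=2))  # False
```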

4. Configuration Manager (config.py)

Thread-safe singleton for application configuration.

Features:

  • Double-checked locking pattern

  • Environment variable driven

  • Runtime benchmark mode toggle
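
The double-checked locking pattern listed above can be sketched as follows. The class name, the environment variable names (`VLM_URL`, `SIMILARITY_THRESHOLD`, `BENCHMARK_MODE`), and their defaults are hypothetical; the default URL merely combines the container name and internal port shown earlier.

```python
import os
import threading
from typing import Optional


class AppConfig:
    """Sketch of a thread-safe singleton; env var names are hypothetical."""

    _instance: Optional["AppConfig"] = None
    _lock = threading.Lock()

    def __init__(self) -> None:
        self.vlm_url = os.getenv("VLM_URL", "http://dinein_ovms_vlm:8000")
        self.similarity_threshold = float(os.getenv("SIMILARITY_THRESHOLD", "0.7"))
        self.benchmark_mode = os.getenv("BENCHMARK_MODE", "false").lower() == "true"

    @classmethod
    def instance(cls) -> "AppConfig":
        if cls._instance is None:          # 1st check: lock-free fast path
            with cls._lock:
                if cls._instance is None:  # 2nd check: another thread may have won the race
                    cls._instance = cls()
        return cls._instance

    def set_benchmark_mode(self, enabled: bool) -> None:
        """Runtime toggle (hypothetical method name)."""
        self.benchmark_mode = enabled
```

The first `is None` check keeps the common path lock-free; the second, inside the lock, stops two threads that both passed the first check from each building an instance.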

5. API Layer (api.py)

FastAPI endpoints with a bounded validation cache.

Features:

  • BoundedValidationCache: LRU eviction, 10K max entries

  • Thread-safe service init: Lock-protected lazy initialization

  • Async metrics collection: Non-blocking system stats
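
One common way to keep a blocking stats read off the event loop is `asyncio.to_thread` (Python 3.9+). The sketch assumes that approach; `read_system_stats` and the other names are hypothetical stand-ins for whatever the API actually samples.

```python
import asyncio
import os
import time


def read_system_stats() -> dict:
    """Hypothetical blocking read; a real collector might sample CPU/GPU counters."""
    return {"timestamp": time.time(), "cpu_count": os.cpu_count()}


async def collect_stats() -> dict:
    # Run the blocking call in a worker thread so the event loop stays free
    return await asyncio.to_thread(read_system_stats)


async def handle_request() -> dict:
    stats_task = asyncio.create_task(collect_stats())  # starts in the background
    # ... the handler can await the VLM and semantic calls here ...
    return await stats_task


print(sorted(asyncio.run(handle_request())))  # ['cpu_count', 'timestamp']
```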


Data Flow

Validation Request Processing

  1. Image Processing: Raw Image → Auto-Orient → Resize (672 px max) → Enhance → Sharpen → JPEG Compress (quality 82) → Base64 Encode

  2. VLM Inference: Prompt (“Analyze this food plate image…”) plus the inventory list for context → OVMS POST /v3/chat/completions → parse the JSON response for detected items

  3. Semantic Matching: For each expected item:

    • Find best match in detected items (similarity > 0.7)

    • Track: matched, missing, extra, quantity mismatches

  4. Result Aggregation:

    {
      "order_complete": true/false,
      "accuracy_score": 0.0-1.0,
      "missing_items": [...],
      "extra_items": [...],
      "metrics": { "latency": [...], "tps": [...], "utilization": [...] }
    }
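
Steps 3 and 4 can be sketched as a single function over orders expressed as {item_name: quantity}. Assumptions: `matched_items` counts distinct item names, each detected item can satisfy at most one expected item, and the `match` callable stands in for the semantic client (which would apply the 0.7 threshold). The `quantity_mismatches` key is hypothetical, and the `metrics` block is omitted.

```python
from typing import Callable, Dict, List, Optional

# Stand-in for the semantic client: returns the matching candidate name or None
Matcher = Callable[[str, List[str]], Optional[str]]


def aggregate(expected: Dict[str, int], detected: Dict[str, int], match: Matcher) -> dict:
    """Sketch of steps 3 and 4 for orders given as {item_name: quantity}."""
    remaining = list(detected)  # detected names not yet claimed by an expected item
    matched: List[str] = []
    missing: List[str] = []
    qty_errors: List[dict] = []
    for name, qty in expected.items():
        hit = match(name, remaining)
        if hit is None:
            missing.append(name)
            continue
        remaining.remove(hit)  # each detected item can satisfy only one expected item
        matched.append(name)
        if detected[hit] != qty:
            qty_errors.append({"item": name, "expected": qty, "detected": detected[hit]})
    denominator = max(len(expected), len(detected))
    return {
        "order_complete": not (missing or qty_errors or remaining),
        "accuracy_score": len(matched) / denominator if denominator else 1.0,
        "missing_items": missing,
        "extra_items": remaining,  # detected but never matched
        "quantity_mismatches": qty_errors,  # hypothetical key
    }


def exact(name: str, candidates: List[str]) -> Optional[str]:
    """Exact-name matcher used only for this demo."""
    return name if name in candidates else None


result = aggregate({"burger": 1, "fries": 2}, {"burger": 1, "fries": 1, "salad": 1}, exact)
print(result["missing_items"], result["extra_items"], result["order_complete"])  # [] ['salad'] False
```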
    

Metrics Collection

┌─────────────────────────────────────────────────────────────────────┐
│                        METRICS PIPELINE                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  VLM CLIENT                    METRICS COLLECTOR                   │
│  ┌────────────────┐           ┌────────────────┐                   │
│  │ log_start_time │──────────▶│ Start Timestamp│                   │
│  │ log_end_time   │──────────▶│ End Timestamp  │                   │
│  │ log_custom_event           │ TPS, Tokens    │                   │
│  │   - tps        │──────────▶│ Preprocess Time│                   │
│  │   - tokens     │           │ Items Detected │                   │
│  │   - latency    │           └────────┬───────┘                   │
│  └────────────────┘                    │                           │
│                                        ▼                           │
│                              ┌────────────────┐                    │
│                              │ JSON/CSV Export│                    │
│                              │ results/*.json │                    │
│                              │ results/*.csv  │                    │
│                              └────────────────┘                    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
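
The pipeline above can be sketched as a per-request record plus an exporter. The method names mirror the diagram, but their signatures and behavior here are assumptions, not the collector's actual API; `RequestMetrics` and `export` are hypothetical names.

```python
import csv
import json
import time
from pathlib import Path
from typing import Dict, List, Optional


class RequestMetrics:
    """Hypothetical per-request record; method names mirror the diagram."""

    def __init__(self) -> None:
        self.start: Optional[float] = None
        self.end: Optional[float] = None
        self.custom: Dict[str, float] = {}

    def log_start_time(self) -> None:
        self.start = time.time()

    def log_end_time(self) -> None:
        self.end = time.time()

    def log_custom_event(self, name: str, value: float) -> None:
        self.custom[name] = value  # e.g. "tps", "tokens", "preprocess_ms"

    def to_row(self) -> Dict[str, float]:
        if self.start is None or self.end is None:
            raise RuntimeError("call log_start_time() and log_end_time() first")
        return {"start": self.start, "end": self.end,
                "latency_s": self.end - self.start, **self.custom}


def export(rows: List[Dict[str, float]], out_dir: str, stem: str = "results") -> None:
    """Write the same rows to <out_dir>/<stem>.json and <out_dir>/<stem>.csv."""
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / f"{stem}.json").write_text(json.dumps(rows, indent=2))
    fieldnames = sorted({key for row in rows for key in row})  # union of all keys
    with open(path / f"{stem}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing keys become ""
        writer.writeheader()
        writer.writerows(rows)
```

A caller would bracket the VLM request with `log_start_time()` and `log_end_time()`, attach values such as `tokens` or `tps` via `log_custom_event`, and pass the collected rows to `export`, for example with `out_dir="results"`.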

Production Features

Circuit Breaker Pattern

Prevents cascading failures when external services are unhealthy.

flowchart LR
    CLOSED["CLOSED"]
    OPEN["OPEN"]
    HALFOPEN["HALF-OPEN"]

    CLOSED -- "5 consecutive failures" --> OPEN
    OPEN -- "30s timeout" --> HALFOPEN
    HALFOPEN -- "2 successes" --> CLOSED
    HALFOPEN -- "1 failure" --> OPEN
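
A compact, single-threaded sketch of the state machine above, with the VLM client's thresholds as defaults (the semantic client would pass `recovery_timeout=15.0`). The enum is repeated so the block runs on its own; the class and method names are hypothetical, and concurrent use from several threads would need a lock. The injectable `clock` lets the timeout be tested without sleeping.

```python
import time
from enum import Enum
from typing import Callable, Optional


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Single-threaded sketch; defaults follow the VLM client's thresholds."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 30.0,
        success_threshold: int = 2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self._clock = clock
        self.state = CircuitState.CLOSED
        self._failures = 0   # consecutive failures while CLOSED
        self._successes = 0  # successes while HALF_OPEN
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.state is CircuitState.OPEN:
            if self._clock() - self._opened_at < self.recovery_timeout:
                return False  # fail fast until the recovery timeout elapses
            self.state = CircuitState.HALF_OPEN  # let trial requests through
            self._successes = 0
        return True

    def record_success(self) -> None:
        if self.state is CircuitState.HALF_OPEN:
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = CircuitState.CLOSED
        self._failures = 0  # a success breaks any run of failures

    def record_failure(self) -> None:
        if self.state is CircuitState.HALF_OPEN:
            self._trip()  # a single failure while probing re-opens the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = CircuitState.OPEN
        self._opened_at = self._clock()
        self._failures = 0
```

A caller checks `allow_request()` before each outbound call and reports the outcome with `record_success()` or `record_failure()`.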
    

Connection Pooling

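import httpx  # http2=True needs the optional h2 package: pip install "httpx[http2]"

# VLM client pool configuration
limits = httpx.Limits(
    max_keepalive_connections=20,  # idle connections kept for reuse
    max_connections=50,            # cap on concurrent connections
    keepalive_expiry=30.0          # seconds an idle connection stays open
)
timeout = httpx.Timeout(
    connect=10.0,
    read=300.0,   # Extended for VLM inference
    write=10.0,
    pool=10.0     # max wait for a free pooled connection
)
client = httpx.AsyncClient(limits=limits, timeout=timeout, http2=True)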

Bounded Cache (LRU)

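import threading
from collections import OrderedDict


class BoundedValidationCache:
    """Thread-safe LRU cache with automatic eviction"""

    def __init__(self, maxsize: int = 10000):
        self._cache = OrderedDict()
        self._maxsize = maxsize
        self._lock = threading.Lock()

    def __getitem__(self, key):
        with self._lock:
            value = self._cache[key]      # KeyError if absent
            self._cache.move_to_end(key)  # a read marks the entry most recently used
            return value

    def __setitem__(self, key, value):
        with self._lock:
            if key in self._cache:
                self._cache.move_to_end(key)
            self._cache[key] = value
            # Evict least recently used entries once over capacity
            while len(self._cache) > self._maxsize:
                self._cache.popitem(last=False)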

Performance Characteristics

Latency Breakdown

Stage                 Typical Duration
--------------------  ----------------
Image Preprocessing   50–100 ms
VLM Inference         8–12 s
Semantic Matching     20–50 ms
Total E2E             9–15 s

Target: < 15 s end-to-end for operational efficiency.
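
Summing the table's stage ranges shows how the end-to-end time splits: VLM inference is nearly all of it, and the gap between the itemized stages and the measured total is time the table does not break out. The dictionary keys are illustrative.

```python
# Stage ranges from the latency table, in seconds: (low, high)
STAGES = {
    "image_preprocessing": (0.050, 0.100),
    "vlm_inference": (8.0, 12.0),
    "semantic_matching": (0.020, 0.050),
}
E2E = (9.0, 15.0)  # measured total from the table

low = sum(lo for lo, _ in STAGES.values())
high = sum(hi for _, hi in STAGES.values())
vlm_share = STAGES["vlm_inference"][1] / high  # share of the worst-case itemized time

print(f"itemized stages: {low:.2f}-{high:.2f} s")                 # itemized stages: 8.07-12.15 s
print(f"VLM share (worst case): {vlm_share:.1%}")                 # VLM share (worst case): 98.8%
print(f"not itemized: {E2E[0] - low:.2f}-{E2E[1] - high:.2f} s")  # not itemized: 0.93-2.85 s
```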


System Requirements

See the System Requirements for detailed hardware, software, and network prerequisites.


Pre-Deployment Checklist

  • [ ] Docker and Docker Compose installed and working

  • [ ] Intel GPU drivers installed and GPU visible to Docker

  • [ ] Required ports available (7861, 8083, 8002, 8081, 9091, 8084)

  • [ ] At least 50 GB free disk space

  • [ ] VLM model downloaded (setup_models.sh completed)

  • [ ] .env file created (make init-env)

  • [ ] Plate images placed in images/ and configs/orders.json updated
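
The port item can be checked with a short script that attempts a TCP bind on each host port from the Docker services table (including 9091, which the table lists for the semantic service). This is a point-in-time check, and binding to 0.0.0.0 mirrors Docker's default of publishing ports on all interfaces; the script and its names are hypothetical helpers, not part of the project.

```python
import socket
from typing import Dict, Iterable

# Host ports published by the containers in the Docker services table
REQUIRED_PORTS = (7861, 8083, 8002, 8081, 9091, 8084)


def ports_available(ports: Iterable[int], host: str = "0.0.0.0") -> Dict[int, bool]:
    """Map each port to True if a TCP bind on (host, port) succeeds right now."""
    status: Dict[int, bool] = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
                status[port] = True
            except OSError:  # usually EADDRINUSE; EACCES for privileged ports
                status[port] = False
    return status


if __name__ == "__main__":
    for port, free in ports_available(REQUIRED_PORTS).items():
        print(f"{port}: {'free' if free else 'IN USE'}")
```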