How to Use#
Guide to using the Dine-In Order Accuracy application features.
Note (TARGET_DEVICE): To change the inference device, set TARGET_DEVICE in .env to GPU, CPU, or NPU, then re-run setup:
cd ../ovms-service && ./setup_models.sh --app dine-in && cd ../dine-in
make down && make up
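The .env edit described in the note can also be scripted. The following is a minimal sketch, assuming .env holds plain KEY=VALUE lines without quoting or an `export` prefix; the helper name `set_target_device` is hypothetical and not part of the application.

```python
import re

# Device names accepted by the note above.
VALID_DEVICES = {"CPU", "GPU", "NPU"}


def set_target_device(env_text: str, device: str) -> str:
    """Return env_text with TARGET_DEVICE set to device (appended if absent)."""
    device = device.upper()
    if device not in VALID_DEVICES:
        raise ValueError(f"unsupported device: {device}")
    line = f"TARGET_DEVICE={device}"
    # Assumption: the key sits at the start of a line in KEY=VALUE form.
    pattern = re.compile(r"^TARGET_DEVICE=.*$", flags=re.MULTILINE)
    if pattern.search(env_text):
        return pattern.sub(line, env_text)
    # Key not present: append it, keeping exactly one trailing newline.
    sep = "" if env_text.endswith("\n") or not env_text else "\n"
    return f"{env_text}{sep}{line}\n"


# Possible usage (not executed here):
#   from pathlib import Path
#   env = Path(".env")
#   env.write_text(set_target_device(env.read_text(), "CPU"))
```

Changing the file alone is not enough: the setup script and the make down / make up cycle from the note still have to be run afterwards.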
Gradio UI#
Access the web interface at http://localhost:7861.
Interface Overview#
Note (negative test case): The default MCD-1001 scenario in the Gradio UI intentionally submits a mismatched order (Cheeseburger / French Fries) against a tray image that contains Filet-O-Fish and Cheesy Fries. This demonstrates the application’s ability to detect an incorrect order. The result will show order_complete: ✗. To see a successful validation, select another scenario or update the order to match the tray.
┌─────────────────────────────────────────────────────────────┐
│              Dine-In Order Accuracy Benchmark               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Scenario: [MCD-1001 – McDonald's Table T12 ▼]              │
│                                                             │
│  ┌─────────────────────┐  ┌─────────────────────────────┐   │
│  │                     │  │  Order Manifest             │   │
│  │   [Plate Image]     │  │  ─────────────────          │   │
│  │                     │  │  items_ordered:             │   │
│  │                     │  │    - Cheeseburger           │   │
│  │                     │  │    - French Fries           │   │
│  │                     │  │                             │   │
│  └─────────────────────┘  └─────────────────────────────┘   │
│                                                             │
│                      [Validate Plate]                       │
│                                                             │
│  ┌─────────────────────┐  ┌─────────────────────────────┐   │
│  │  Validation Result  │  │  Performance Metrics        │   │
│  │  ─────────────────  │  │  ───────────────────        │   │
│  │  order_complete: ✗  │  │  vlm_inference_ms: 9003     │   │
│  │  accuracy_score: 0.0│  │  cpu_utilization: 27%       │   │
│  │  missing_items: [..]│  │  gpu_utilization: 100%      │   │
│  │  extra_items: [...] │  │  memory_utilization: 80%    │   │
│  └─────────────────────┘  └─────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Usage Steps#
1. Select Scenario: Choose a test scenario from the dropdown.
2. Review Order: Verify the order manifest on the right.
3. Validate: Click the “Validate Plate” button.
4. Review Results: Check the validation outcome and the performance metrics.
REST API#
Validate Single Image#
The application supports two complementary test scenarios with the bundled MCD-1001.png
sample image (which depicts Filet-O-Fish and Cheesy Fries on the tray):
Negative test case — order does not match tray contents#
This example intentionally submits an order that does not match the items visible in
MCD-1001.png. It demonstrates how the application detects a mismatch and reports missing
and extra items. This is a valid production scenario: a customer receives the wrong items.
curl -X POST "http://localhost:8083/api/validate" \
-F "image=@images/MCD-1001.png" \
-F 'order={
"order_id": "MCD-1001",
"table_number": "T12",
"restaurant": "McDonald'\''s",
"items": [
{"name": "Cheeseburger", "quantity": 1},
{"name": "French Fries", "quantity": 1}
]
}'
Expected response (order_complete: false — tray has Filet-O-Fish/Cheesy Fries, not Cheeseburger/French Fries):
{
"validation_id": "939d830a-8335-4fea-a564-0b83b93b71ab",
"image_id": "MCD-1001",
"order_complete": false,
"accuracy_score": 0.0,
"missing_items": [
{ "name": "Cheeseburger", "quantity": 1 },
{ "name": "French Fries", "quantity": 1 }
],
"extra_items": [
{ "name": "Filet-O-Fish", "quantity": 1 },
{ "name": "Cheesy Fries", "quantity": 1 }
],
"quantity_mismatches": [],
"matched_items": [],
"timestamp": "2026-03-20T14:53:05.543315",
"metrics": {
"end_to_end_latency_ms": 21854,
"vlm_inference_ms": 21747,
"agent_reconciliation_ms": 32,
"cpu_utilization": 23.21,
"gpu_utilization": 0.0,
"memory_utilization": 78.07
}
}
Positive test case — order matches tray contents#
This example submits the correct items for MCD-1001.png and demonstrates a successful
order validation (order_complete: true).
curl -X POST "http://localhost:8083/api/validate" \
-F "image=@images/MCD-1001.png" \
-F 'order={
"order_id": "MCD-1001",
"table_number": "T12",
"restaurant": "McDonald'\''s",
"items": [
{"name": "Filet-O-Fish", "quantity": 1},
{"name": "Cheesy Fries", "quantity": 1}
]
}'
Expected response (order_complete: true — tray contents match the submitted order):
{
"validation_id": "c459c9e5-3b48-462a-8c09-6360d4fd76fa",
"image_id": "MCD-1001",
"order_complete": true,
"accuracy_score": 1.0,
"missing_items": [],
"extra_items": [],
"quantity_mismatches": [],
"matched_items": [
{
"expected_name": "Filet-O-Fish",
"detected_name": "Filet-O-Fish",
"similarity": 1.0,
"quantity": 1
},
{
"expected_name": "Cheesy Fries",
"detected_name": "Cheesy Fries",
"similarity": 1.0,
"quantity": 1
}
],
"timestamp": "2026-03-20T14:55:31.025273",
"metrics": {
"end_to_end_latency_ms": 14475,
"vlm_inference_ms": 14438,
"agent_reconciliation_ms": 6,
"cpu_utilization": 16.02,
"gpu_utilization": 0.0,
"memory_utilization": 78.27
}
}
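When scripting against the API, it can help to condense a response into one line. The sketch below assumes only the field names visible in the two example responses above; the helper name `summarize_validation` is hypothetical and not part of the application.

```python
def summarize_validation(result: dict) -> str:
    """Condense a /api/validate response into a one-line summary."""
    status = "PASS" if result["order_complete"] else "FAIL"
    parts = [
        f"{result['image_id']}: {status} "
        f"(accuracy {result['accuracy_score']:.2f})"
    ]
    # List missing and extra items only when the response reports any.
    for key, label in (("missing_items", "missing"), ("extra_items", "extra")):
        items = result.get(key, [])
        if items:
            names = ", ".join(f"{i['quantity']}x {i['name']}" for i in items)
            parts.append(f"{label}: {names}")
    return "; ".join(parts)
```

For the negative test case this yields a FAIL line that names both missing and both extra items; for the positive test case it yields a single PASS entry.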
Get Validation by ID#
curl "http://localhost:8083/api/validate/26eba3f8-276b-44ac-b553-74419f84c1ad"
List All Validations#
curl "http://localhost:8083/api/validate"
Health Check#
curl "http://localhost:8083/health"
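Because model loading can take a while after make up, a script may want to poll the health endpoint before sending validation requests. This is a minimal sketch using only the standard library. It assumes that a healthy service answers with HTTP 200 (the response body is not documented here, so it is ignored); the function names and retry defaults are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(url, probe=http_ok, attempts=30, delay_s=2.0,
                       sleep=time.sleep) -> bool:
    """Poll url until probe succeeds or the attempts are used up."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # pause between attempts, not after the last one
    return False


# Possible usage (not executed here):
#   wait_until_healthy("http://localhost:8083/health")
```

The `probe` and `sleep` parameters are injectable so the retry logic can be exercised without a running service.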
Benchmarking#
Prerequisites#
Before running benchmarks, initialize the performance-tools submodule:
make update-submodules
Optionally build the benchmark Docker image:
make build-benchmark
Or fetch from registry (if REGISTRY=true):
make fetch-benchmark
Quick Single Image Test#
For a quick validation test with curl:
Prerequisite: Services must be running. Start them first with make up.
# IMAGE_ID must match an entry in configs/orders.json
# Available IDs: MCD-1001, MCD-1002, MCD-1003, MCD-1004
make benchmark-single IMAGE_ID=MCD-1001
Output:
=== Benchmark Results ===
{
"validation_id": "...",
"accuracy_score": 0.5,
"metrics": {
"vlm_inference_ms": 9003,
"gpu_utilization": 100.0
}
}
Full Benchmark#
Run the Order Accuracy benchmark using benchmark_order_accuracy.py:
make benchmark
Configuration options:
| Variable | Default | Description |
|---|---|---|
|  | 1 | Number of concurrent workers |
|  | 180 | Benchmark duration (seconds) |
|  | 25000 | Target latency threshold (ms) |
|  | avg | Metric: |
|  | 1 | Concurrent images per iteration |
|  | 60 | Warmup time (seconds) |
|  | 3 | Min requests before measuring |
|  | 300 | Request timeout (seconds) |
|  | GPU | Target device: CPU, GPU, NPU |
|  | results | Output directory |
|  | false | Use registry images (true/false) |
Example:
make benchmark BENCHMARK_WORKERS=2 BENCHMARK_DURATION=600 TARGET_DEVICE=GPU
Stream Density Test#
Finds the maximum number of concurrent validations that stay within the latency target.
make benchmark-stream-density
Output:
Target Latency: 15000ms
Max Density: 2 concurrent images
Iteration 1: 1 image → 11726ms ✓ PASSED
Iteration 2: 2 images → 14808ms ✓ PASSED
Iteration 3: 3 images → 19509ms ✗ FAILED
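The sample output suggests a simple ramp: concurrency grows by one image per iteration until the measured latency exceeds the target, and the last passing level is reported as the maximum density. The sketch below illustrates that idea only. The actual benchmark may use a different step size, latency metric, or stopping rule, and `find_max_density` and its parameters are hypothetical names.

```python
from typing import Callable


def find_max_density(measure_latency_ms: Callable[[int], float],
                     target_ms: float, max_images: int = 16) -> int:
    """Return the highest concurrency whose latency stays within target_ms."""
    best = 0
    for n in range(1, max_images + 1):
        latency = measure_latency_ms(n)
        # Assumption: a latency equal to the target still counts as a pass.
        if latency > target_ms:
            break  # first failing iteration ends the ramp
        best = n
    return best


# Latencies taken from the sample output above.
sample = {1: 11726, 2: 14808, 3: 19509}
max_density = find_max_density(sample.__getitem__, target_ms=15000)
```

With the three sample iterations and a 15000 ms target, the ramp stops at three images and reports two, matching the "Max Density" line in the output.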
Metrics Processing#
After running benchmarks, consolidate and visualize metrics:
# Consolidate metrics from multiple runs to CSV
make consolidate-metrics
# Generate plots from benchmark metrics
make plot-metrics
Understanding Results#
Validation Status#
| Field | Description |
|---|---|
| order_complete |  |
| accuracy_score | 0.0-1.0 ratio of matched to expected items |
| missing_items | Items in order but not detected on plate |
| extra_items | Items detected but not in order |
| quantity_mismatches | Items with wrong quantities |
| matched_items | Successfully matched items with similarity scores |
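To make the relationship between these fields concrete, the sketch below reconciles an order against a list of detected items. It is an illustration under stated assumptions, not the application's algorithm: names are compared exactly (case-insensitive), whereas the similarity scores in the API responses suggest the application matches names more loosely; the score counts order lines rather than quantities; order_complete is taken to mean nothing is missing, extra, or mismatched; and the shape of the quantity_mismatches entries is hypothetical.

```python
def reconcile(expected: list, detected: list) -> dict:
    """Compare ordered items with detected items by exact, case-insensitive name."""
    exp = {item["name"].lower(): item for item in expected}
    det = {item["name"].lower(): item for item in detected}

    matched, mismatched = [], []
    for key, item in exp.items():
        found = det.get(key)
        if found is None:
            continue  # reported under missing_items below
        if found["quantity"] == item["quantity"]:
            matched.append({
                "expected_name": item["name"],
                "detected_name": found["name"],
                "similarity": 1.0,  # exact match in this sketch
                "quantity": item["quantity"],
            })
        else:
            # Hypothetical entry shape; the examples only show an empty list.
            mismatched.append({
                "name": item["name"],
                "expected": item["quantity"],
                "detected": found["quantity"],
            })

    missing = [item for key, item in exp.items() if key not in det]
    extra = [item for key, item in det.items() if key not in exp]
    # Ratio of matched to expected items; an empty order scores 1.0 here.
    score = len(matched) / len(exp) if exp else 1.0
    return {
        "order_complete": not (missing or extra or mismatched),
        "accuracy_score": score,
        "missing_items": missing,
        "extra_items": extra,
        "quantity_mismatches": mismatched,
        "matched_items": matched,
    }
```

Feeding it the Cheeseburger / French Fries order against a Filet-O-Fish / Cheesy Fries tray gives a score of 0.0 with two missing and two extra items, which mirrors the negative test case.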
Metrics Interpretation#
| Metric | Good Value | Warning |
|---|---|---|
|  | < 10,000 | > 15,000 |
|  | 80-100% | < 50% (not using GPU) |
|  | 20-40% | > 80% |
|  | < 80% | > 90% |
Adding Custom Test Scenarios#
1. Add Image#
Place the image in the images/ directory:
cp my_plate.jpg images/
2. Update Orders Config#
Edit configs/orders.json:
{
"orders": [
{
"image_id": "my_plate",
"restaurant": "My Restaurant",
"table_number": "5",
"items_ordered": [
{ "item": "Burger", "quantity": 1 },
{ "item": "Fries", "quantity": 1 }
]
}
]
}
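If many scenarios need to be added, the entry can be generated rather than typed by hand. This is a minimal sketch that follows the structure of the example entry above; the helper name `add_order` and the duplicate check are assumptions, not application behavior.

```python
import copy


def add_order(config: dict, image_id: str, restaurant: str,
              table_number: str, items: dict) -> dict:
    """Return a copy of config with one more entry in its "orders" list."""
    if any(o["image_id"] == image_id for o in config.get("orders", [])):
        raise ValueError(f"image_id already present: {image_id}")
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated.setdefault("orders", []).append({
        "image_id": image_id,
        "restaurant": restaurant,
        "table_number": table_number,
        # items maps item name -> quantity
        "items_ordered": [
            {"item": name, "quantity": qty} for name, qty in items.items()
        ],
    })
    return updated


# Possible usage (not executed here):
#   import json
#   with open("configs/orders.json") as f:
#       cfg = json.load(f)
#   cfg = add_order(cfg, "my_plate", "My Restaurant", "5",
#                   {"Burger": 1, "Fries": 1})
#   with open("configs/orders.json", "w") as f:
#       json.dump(cfg, f, indent=2)
```

In the example, the image_id "my_plate" corresponds to the file name my_plate.jpg without its extension.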
3. Restart Application#
make down && make up
The new scenario appears in the Gradio dropdown.