Performance Testing & Benchmarking#

Test the performance of your Order Accuracy pipeline across different hardware configurations. This guide covers everything from quick performance checks to comprehensive system capacity testing.

Quick Start (5 minutes)#

Goal: Run a basic performance test to verify that your system works correctly.

1. Initialize Performance Tools#

make update-submodules

2. Run Quick Benchmark#

Dine-In

cd dine-in
make benchmark

Take-Away

cd take-away
make benchmark

What this does:

Tests GPU/CPU performance for order validation
Measures end-to-end latency
Generates performance metrics
Outputs results to results/ directory

Understanding Benchmark Types#

Dine-In Benchmarks

Single Request Benchmark

make benchmark

Measures the latency of a single image validation, broken down into:

Image preprocessing time
VLM inference time
Semantic matching time
Total end-to-end latency

Stream Density Benchmark

make benchmark-density

Finds the maximum number of concurrent requests the system can handle while staying within the target latency:

Uses a configurable target latency threshold
Increases load progressively
Identifies the performance ceiling
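
The progressive search described above can be sketched as a simple loop. This is only an illustration of the idea, not the code behind make benchmark-density. The function measure_latency_ms is a hypothetical stub that models latency rising with concurrency, and the cap of 50 steps is an assumption. TARGET_LATENCY_MS and DENSITY_INCREMENT reuse the variable names from the Dine-In configuration.

```shell
#!/bin/sh
# Sketch of a progressive load search; not the project's actual implementation.

# Hypothetical stand-in for a real measurement: pretends each concurrent
# request adds 4000 ms on top of a 3000 ms base latency.
measure_latency_ms() {
  echo $(( 3000 + 4000 * $1 ))
}

# Prints the highest concurrency whose measured latency stays at or below
# the target, stepping up by DENSITY_INCREMENT each iteration.
# Defined with ( ) so its variables stay local to a subshell.
find_max_density() (
  target="${TARGET_LATENCY_MS:-15000}"
  step="${DENSITY_INCREMENT:-1}"
  max_steps=50                 # assumed safety cap so the loop always ends
  density="$step"
  best=0
  i=0
  while [ "$i" -lt "$max_steps" ]; do
    latency="$(measure_latency_ms "$density")"
    if [ "$latency" -gt "$target" ]; then
      break                    # ceiling reached: keep the last passing level
    fi
    best="$density"
    density=$(( density + step ))
    i=$(( i + 1 ))
  done
  echo "$best"
)

find_max_density   # prints 3 with the stub above and the default target
```

A result of 0 means even the first load level exceeded the target, which usually points at the threshold or the hardware rather than at concurrency.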

Take-Away Benchmarks

Single Video Benchmark

make benchmark

Measures end-to-end latency for a single order validation, broken down into:

Video upload time
Frame extraction time
VLM inference latency
Validation time
Total processing time

Fixed Workers Benchmark

make benchmark-oa BENCHMARK_WORKERS=4 BENCHMARK_DURATION=300

Tests the system with a fixed number of concurrent workers and reports:

Throughput (orders/minute)
Latency percentiles (P50, P95, P99)
GPU utilization
Memory usage
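
For readers unfamiliar with latency percentiles, the sketch below computes P50, P95, and P99 with the nearest-rank method. It assumes one latency value in milliseconds per line on standard input, which is a hypothetical format chosen for illustration. The source does not say how the benchmark itself derives its percentiles, and the sample values are made up rather than measured.

```shell
#!/bin/sh
# Nearest-rank percentile over latencies (one value in ms per line on stdin).
# Usage: percentile P < latencies.txt   (P is an integer from 1 to 100)
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Illustrative samples only; one slow outlier dominates the tail.
samples="8200 9100 8700 14900 9400 8800 9000 10200 8600 9300"
for p in 50 95 99; do
  printf 'P%s: %s ms\n' "$p" "$(printf '%s\n' $samples | percentile "$p")"
done
```

With these ten samples the median is 9000 ms while P95 and P99 both land on the 14900 ms outlier, which is why tail percentiles are reported alongside the median.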

Stream Density Benchmark

make benchmark-stream-density

Finds the maximum sustainable worker count within the target latency and reports:

Maximum concurrent workers
Latency at each worker count
Point of degradation
Resource utilization at capacity

Environment Variables Reference#

Dine-In Configuration

Variable	Default	Description
`TARGET_LATENCY_MS`	15000	Target latency threshold (ms)
`LATENCY_METRIC`	avg	Latency metric to use: `avg`, `p95`, or `max`
`DENSITY_INCREMENT`	1	Concurrent images per iteration
`INIT_DURATION`	60	Warmup time (seconds)
`MIN_REQUESTS`	3	Min requests before measuring
`REQUEST_TIMEOUT`	300	Individual request timeout (seconds)
`API_ENDPOINT`	`http://localhost:8083`	API endpoint URL
`RESULTS_DIR`	`./results`	Results output directory

Take-Away Configuration

Variable	Default	Description
`TARGET_LATENCY_MS`	25000	Target latency threshold (ms)
`LATENCY_METRIC`	avg	Latency metric to use: `avg`, `p95`, or `max`
`WORKER_INCREMENT`	1	Workers added per iteration
`INIT_DURATION`	10	Warmup time (seconds)
`MIN_TRANSACTIONS`	3	Min transactions before measuring
`MAX_ITERATIONS`	50	Max scaling iterations
`MAX_WAIT_SEC`	600	Max wait per iteration (seconds)
`BENCHMARK_WORKERS`	1	Number of workers (fixed mode)
`BENCHMARK_DURATION`	60	Test duration (seconds)

Hardware Testing Commands#

GPU Performance Testing#

Dine-In

# Ensure GPU device is configured in .env
# OPENVINO_DEVICE=GPU
make benchmark

Take-Away

# Configure GPU in .env
# OPENVINO_DEVICE=GPU
make benchmark-oa BENCHMARK_WORKERS=4

Multi-Worker Stress Testing (Take-Away)#

# Test with 2 parallel workers
make up-parallel WORKERS=2
make benchmark-oa BENCHMARK_WORKERS=2

# High stress test with 8 workers
make up-parallel WORKERS=8
make benchmark-oa BENCHMARK_WORKERS=8
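
To repeat the two-step pattern above for several worker counts, a small wrapper along these lines may help. It only uses the make targets already shown. The RUNNER variable is a hypothetical convenience added here: it defaults to echo so the function prints the commands instead of running them. Whether the stack must be stopped between runs is not stated in this guide, so check that before running it for real.

```shell
#!/bin/sh
# Sweep several worker counts using the two make targets shown above.
# RUNNER defaults to "echo" (dry run); export RUNNER= (empty) to execute.
sweep_workers() (
  runner="${RUNNER-echo}"
  for workers in "$@"; do
    # Start the stack with this many workers, then benchmark against it.
    $runner make up-parallel WORKERS="$workers" || exit 1
    $runner make benchmark-oa BENCHMARK_WORKERS="$workers" || exit 1
  done
)

# Dry run: prints the six commands that would be executed.
sweep_workers 2 4 8
```

The function stops at the first failing command, so a broken configuration does not waste time on the remaining worker counts.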

Progressive Load Testing#

# Automatically find maximum sustainable workers
make benchmark-stream-density \
  BENCHMARK_TARGET_LATENCY_MS=25000 \
  BENCHMARK_WORKER_INCREMENT=1 \
  BENCHMARK_MAX_ITERATIONS=20

Viewing Results#

Dine-In Results

# View density benchmark results
make benchmark-density-results

# View raw results
cat results/benchmark_results.json
ls -la results/

Take-Away Results

# View benchmark results
make benchmark-oa-results

# View density results
cat results/stream_density_results.json
ls -la results/

Consolidate Metrics#

make consolidate-metrics
cat results/metrics_summary.csv

Expected Performance#

Typical Latency Ranges#

Operation	Dine-In	Take-Away
Image Preprocessing	100-500ms	N/A
Frame Selection	N/A	200-500ms
VLM Inference	5-10s	5-10s
Semantic Matching	50-200ms	50-200ms
Total End-to-End	8-15s	8-15s per order

Hardware Impact#

Configuration	Typical Performance
CPU Only	15-25s per validation
Intel iGPU	8-15s per validation
Intel Arc dGPU	5-10s per validation
NVIDIA RTX	4-8s per validation

Throughput Expectations#

Mode	Expected Throughput
Dine-In Single	4-6 orders/minute
Take-Away Single	4-6 orders/minute
Take-Away Parallel (4 workers)	16-24 orders/minute
Take-Away Parallel (8 workers)	30-40 orders/minute
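
These throughput figures follow roughly from the latency ranges. As a back-of-the-envelope check, assuming each worker handles one order at a time and workers scale linearly, the ideal rate is workers × 60 / latency in seconds. The helper below is an illustrative sketch, not part of the benchmark tooling.

```shell
#!/bin/sh
# Ideal orders per minute = workers * 60 / per-order latency in seconds.
# Usage: orders_per_minute WORKERS LATENCY_SECONDS
orders_per_minute() {
  awk -v w="$1" -v s="$2" 'BEGIN { printf "%.1f\n", w * 60 / s }'
}

orders_per_minute 1 15   # slow end of the 8-15s range: 4.0
orders_per_minute 1 8    # fast end of the range: 7.5
orders_per_minute 4 10   # 4 workers at 10s each, linear scaling: 24.0
```

Linear scaling is an upper bound. The 8-worker row above (30-40 rather than 32-48 orders/minute) suggests that per-order latency rises as workers share the same hardware.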

Optimization Tips#

GPU Utilization#

Monitor GPU usage with nvidia-smi -l 1 (NVIDIA) or intel_gpu_top (Intel)
Target 70-90% GPU utilization for optimal throughput
If GPU is underutilized, increase worker count
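
The 70-90% guideline can be turned into a quick check. The function below is a minimal sketch that classifies a single utilization reading, given as an integer percentage. The messages are illustrative, and how to react to readings above 90% is left open because this guide does not cover it.

```shell
#!/bin/sh
# Classify a GPU utilization percentage against the 70-90% target band.
# Usage: classify_gpu_util PERCENT   (integer, without the % sign)
classify_gpu_util() {
  if [ "$1" -lt 70 ]; then
    echo "underutilized: consider adding workers"
  elif [ "$1" -le 90 ]; then
    echo "in target band"
  else
    echo "above target band"
  fi
}

# On NVIDIA hardware, a single integer reading could come from:
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
classify_gpu_util 55
```

A single reading can be misleading during warmup, so it is worth sampling several times while the benchmark is in its measurement phase.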

Memory Management#

Monitor container memory with docker stats
VLMs typically require 8-16 GB of GPU memory
Reduce batch size if out-of-memory errors occur

Network Optimization (Take-Away)#

Use wired connections for RTSP streams
Ensure 1Gbps+ network bandwidth per camera
Consider local video storage for testing

Latency Reduction#

Use INT8 model quantization
Enable HTTP/2 for API connections
Pre-warm VLM model before benchmarking

Troubleshooting Performance Issues#

Low FPS / High Latency#

Check GPU driver installation
Verify OPENVINO_DEVICE setting in .env
Reduce image resolution or batch size
Check for thermal throttling

VLM Timeout Errors#

Increase API_TIMEOUT in .env
Check GPU memory availability
Consider using a lower-precision model (for example, INT8 quantization)

Memory Exhaustion#

Reduce number of parallel workers
Lower batch size settings
Monitor with docker stats

Inconsistent Results#

Increase warmup duration (INIT_DURATION)
Increase the minimum sample count (MIN_REQUESTS for Dine-In, MIN_TRANSACTIONS for Take-Away)
Run multiple benchmark iterations
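
When comparing multiple benchmark iterations, it helps to quantify how much the runs disagree. The sketch below computes the mean and the coefficient of variation (standard deviation divided by mean) from one latency value per line. The input format and sample numbers are assumptions for illustration, and the level of variation that counts as acceptable is a judgment call for your own setup.

```shell
#!/bin/sh
# Mean and coefficient of variation (stddev / mean) of per-run latencies.
# Reads one value in ms per line on stdin; prints "mean=<ms> cv=<percent>%".
run_variation() {
  awk '
    { sum += $1; sumsq += $1 * $1 }
    END {
      if (NR == 0) exit 1
      mean = sum / NR
      var = sumsq / NR - mean * mean    # population variance
      if (var < 0) var = 0              # guard against rounding error
      printf "mean=%.0f cv=%.1f%%\n", mean, 100 * sqrt(var) / mean
    }'
}

# Three illustrative runs; prints: mean=10000 cv=8.2%
printf '%s\n' 9000 10000 11000 | run_variation
```

If the variation stays high after a longer warmup and a larger sample count, background load or thermal throttling are worth ruling out.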