Performance Testing & Benchmarking#

Measure the performance of your Order Accuracy pipeline on different hardware configurations. This guide covers everything from quick performance checks to full system capacity testing.

Quick Start (5 minutes)#

Goal: Run a basic performance test to verify that your system works correctly.

1. Initialize Performance Tools#

make update-submodules

2. Run Quick Benchmark#

# Dine-In
cd dine-in
make benchmark

# Take-Away (start again from the directory that contains dine-in/ and take-away/)
cd take-away
make benchmark

What this does:

  • Tests GPU/CPU performance for order validation

  • Measures end-to-end latency

  • Generates performance metrics

  • Outputs results to results/ directory

Understanding Benchmark Types#

Dine-In benchmarks

make benchmark

Measures single-image validation latency:

  • Image preprocessing time

  • VLM inference time

  • Semantic matching time

  • Total end-to-end latency

make benchmark-density

Finds the maximum number of concurrent requests the system can handle within the target latency:

  • Target latency threshold (configurable)

  • Progressive load increase

  • Identifies performance ceiling

Take-Away benchmarks

make benchmark

Measures end-to-end latency for validating a single order:

  • Video upload time

  • Frame extraction time

  • VLM inference latency

  • Validation time

  • Total processing time

make benchmark-oa BENCHMARK_WORKERS=4 BENCHMARK_DURATION=300

Tests the system with a fixed number of concurrent workers:

  • Throughput (orders/minute)

  • Latency percentiles (P50, P95, P99)

  • GPU utilization

  • Memory usage
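
To make the throughput and percentile metrics above concrete, here is a small sketch of how they could be computed from a list of per-order latencies. It uses the nearest-rank percentile definition, which is an assumption: the benchmark tool may use a different interpolation method. The function names and the sample latencies are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def throughput_per_minute(completed_orders: int, duration_s: float) -> float:
    """Orders completed per minute over the measured window."""
    return completed_orders * 60 / duration_s


# Made-up latencies in seconds, for illustration only.
latencies_s = [8.2, 9.1, 9.4, 10.0, 10.3, 11.5, 12.0, 12.8, 13.9, 14.7]

summary = {
    "p50": percentile(latencies_s, 50),
    "p95": percentile(latencies_s, 95),
    "p99": percentile(latencies_s, 99),
    "orders_per_min": throughput_per_minute(len(latencies_s), duration_s=120),
}
print(summary)
```

With only ten samples, P95 and P99 both fall on the slowest request, which is one reason longer runs (a larger BENCHMARK_DURATION) give more meaningful tail percentiles.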

make benchmark-stream-density

Finds the maximum sustainable worker count under latency constraints:

  • Maximum concurrent workers

  • Latency at each worker count

  • Point of degradation

  • Resource utilization at capacity

Environment Variables Reference#

Dine-In (make benchmark-density):

| Variable | Default | Description |
| --- | --- | --- |
| TARGET_LATENCY_MS | 15000 | Target latency threshold (ms) |
| LATENCY_METRIC | avg | Latency metric: avg, p95, or max |
| DENSITY_INCREMENT | 1 | Concurrent images added per iteration |
| INIT_DURATION | 60 | Warmup time (seconds) |
| MIN_REQUESTS | 3 | Minimum requests before measuring |
| REQUEST_TIMEOUT | 300 | Individual request timeout (seconds) |
| API_ENDPOINT | http://localhost:8083 | API endpoint URL |
| RESULTS_DIR | ./results | Results output directory |

Take-Away (make benchmark-oa and make benchmark-stream-density):

| Variable | Default | Description |
| --- | --- | --- |
| TARGET_LATENCY_MS | 25000 | Target latency threshold (ms) |
| LATENCY_METRIC | avg | Latency metric: avg, p95, or max |
| WORKER_INCREMENT | 1 | Workers added per iteration |
| INIT_DURATION | 10 | Warmup time (seconds) |
| MIN_TRANSACTIONS | 3 | Minimum transactions before measuring |
| MAX_ITERATIONS | 50 | Maximum scaling iterations |
| MAX_WAIT_SEC | 600 | Maximum wait per iteration (seconds) |
| BENCHMARK_WORKERS | 1 | Number of workers (fixed mode) |
| BENCHMARK_DURATION | 60 | Test duration (seconds) |
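
As an illustration of how a script might consume these settings, the sketch below reads a subset of the variables from the second table and falls back to the documented defaults. The `StreamDensityConfig` class and `load_config` function are hypothetical helpers written for this example; they are not part of the benchmark tooling.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_METRICS = {"avg", "p95", "max"}


@dataclass(frozen=True)
class StreamDensityConfig:
    target_latency_ms: int
    latency_metric: str
    worker_increment: int
    init_duration_s: int
    min_transactions: int
    max_iterations: int
    max_wait_s: int


def load_config(env: Optional[Mapping[str, str]] = None) -> StreamDensityConfig:
    """Build a config from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    metric = env.get("LATENCY_METRIC", "avg")
    if metric not in VALID_METRICS:
        raise ValueError(f"LATENCY_METRIC must be avg, p95, or max, got {metric!r}")
    return StreamDensityConfig(
        target_latency_ms=int(env.get("TARGET_LATENCY_MS", "25000")),
        latency_metric=metric,
        worker_increment=int(env.get("WORKER_INCREMENT", "1")),
        init_duration_s=int(env.get("INIT_DURATION", "10")),
        min_transactions=int(env.get("MIN_TRANSACTIONS", "3")),
        max_iterations=int(env.get("MAX_ITERATIONS", "50")),
        max_wait_s=int(env.get("MAX_WAIT_SEC", "600")),
    )


# Override two values and keep the defaults for the rest.
config = load_config({"TARGET_LATENCY_MS": "20000", "LATENCY_METRIC": "p95"})
print(config)
```

Passing a plain dictionary, as in the usage line, keeps the example independent of the real process environment; calling `load_config()` with no argument reads `os.environ` instead.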

Hardware Testing Commands#

GPU Performance Testing#

# Dine-In: ensure the GPU device is configured in .env
# OPENVINO_DEVICE=GPU
make benchmark

# Take-Away: configure the GPU device in .env
# OPENVINO_DEVICE=GPU
make benchmark-oa BENCHMARK_WORKERS=4

Multi-Worker Stress Testing (Take-Away)#

# Test with 2 parallel workers
make up-parallel WORKERS=2
make benchmark-oa BENCHMARK_WORKERS=2

# High stress test with 8 workers
make up-parallel WORKERS=8
make benchmark-oa BENCHMARK_WORKERS=8

Progressive Load Testing#

# Automatically find maximum sustainable workers
make benchmark-stream-density \
  BENCHMARK_TARGET_LATENCY_MS=25000 \
  BENCHMARK_WORKER_INCREMENT=1 \
  BENCHMARK_MAX_ITERATIONS=20

Viewing Results#

# Dine-In: view density benchmark results
make benchmark-density-results

# Dine-In: view raw results
cat results/benchmark_results.json
ls -la results/

# Take-Away: view benchmark results
make benchmark-oa-results

# Take-Away: view stream density results
cat results/stream_density_results.json
ls -la results/

Consolidate Metrics#

make consolidate-metrics
cat results/metrics_summary.csv

Expected Performance#

Typical Latency Ranges#

| Operation | Dine-In | Take-Away |
| --- | --- | --- |
| Image Preprocessing | 100-500 ms | N/A |
| Frame Selection | N/A | 200-500 ms |
| VLM Inference | 5-10 s | 5-10 s |
| Semantic Matching | 50-200 ms | 50-200 ms |
| Total End-to-End | 8-15 s | 8-15 s per order |

Hardware Impact#

| Configuration | Typical Performance |
| --- | --- |
| CPU Only | 15-25 s per validation |
| Intel iGPU | 8-15 s per validation |
| Intel Arc dGPU | 5-10 s per validation |
| NVIDIA RTX | 4-8 s per validation |

Throughput Expectations#

| Mode | Expected Throughput |
| --- | --- |
| Dine-In Single | 4-6 orders/minute |
| Take-Away Single | 4-6 orders/minute |
| Take-Away Parallel (4 workers) | 16-24 orders/minute |
| Take-Away Parallel (8 workers) | 30-40 orders/minute |
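
These throughput figures can be related to the latency ranges with simple arithmetic. The sketch below computes an idealized upper bound, assuming each worker processes one order at a time and that adding workers does not slow any of them down. The function name is hypothetical and the model is a simplification, not how the benchmark derives its numbers.

```python
def ideal_orders_per_minute(latency_s: float, workers: int = 1) -> float:
    """Upper bound on throughput with no contention between workers."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return workers * 60 / latency_s


for workers in (1, 4, 8):
    low = ideal_orders_per_minute(15, workers)   # slowest end-to-end latency (15 s)
    high = ideal_orders_per_minute(8, workers)   # fastest end-to-end latency (8 s)
    print(f"{workers} worker(s): {low:.1f}-{high:.1f} orders/minute (ideal)")
```

The expected figures for 8 workers sit below a linear scale-up of the single-worker range, which would be consistent with workers sharing the same GPU and memory, so treat the ideal bound as a ceiling and not as a prediction.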

Optimization Tips#

GPU Utilization#

  • Monitor GPU usage with nvidia-smi -l 1 or intel_gpu_top

  • Target 70-90% GPU utilization for optimal throughput

  • If GPU is underutilized, increase worker count

Memory Management#

  • Monitor container memory with docker stats

  • VLMs require 8-16 GB of GPU memory

  • Reduce batch size if out-of-memory errors occur

Network Optimization (Take-Away)#

  • Use wired connections for RTSP streams

  • Ensure 1Gbps+ network bandwidth per camera

  • Consider local video storage for testing

Latency Reduction#

  • Use INT8 model quantization

  • Enable HTTP/2 for API connections

  • Pre-warm VLM model before benchmarking

Troubleshooting Performance Issues#

Low FPS / High Latency#

  • Check GPU driver installation

  • Verify OPENVINO_DEVICE setting in .env

  • Reduce image resolution or batch size

  • Check for thermal throttling

VLM Timeout Errors#

  • Increase API_TIMEOUT in .env

  • Check GPU memory availability

  • Consider using a lower model precision (for example, INT8)

Memory Exhaustion#

  • Reduce number of parallel workers

  • Lower batch size settings

  • Monitor with docker stats

Inconsistent Results#

  • Increase warmup duration (INIT_DURATION)

  • Increase the minimum sample count before measuring (MIN_TRANSACTIONS or MIN_REQUESTS, depending on the benchmark)

  • Run multiple benchmark iterations
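
One way to judge whether repeated runs agree is to compare their spread to their mean. The sketch below computes the coefficient of variation across several runs; the 10% threshold, the function names, and the sample values are assumptions chosen for illustration, so pick a threshold that suits your own tolerance.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (needs at least 2 values)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return statistics.stdev(values) / mean


def is_consistent(run_latencies_ms: Sequence[float], max_cv: float = 0.10) -> bool:
    """True if run-to-run variation stays within the assumed threshold."""
    return coefficient_of_variation(run_latencies_ms) <= max_cv


# Made-up average latencies (ms) from five benchmark runs.
runs_ms = [11800, 12100, 12400, 11950, 12250]
cv = coefficient_of_variation(runs_ms)
print(f"CV = {cv:.1%}, consistent = {is_consistent(runs_ms)}")
```

If the variation is high, the tips above (longer warmup, more samples per measurement) are the first things to try before comparing hardware configurations.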