Benchmarking Guide: Take-Away Order Accuracy

This guide covers performance testing, stream density benchmarking, and metrics collection for the Take-Away Order Accuracy system.

Note (Inference Device): The default device is GPU. To switch to CPU, complete both steps below; otherwise the model will be exported for the wrong device.
1. Set both variables in your .env file (the values shown are the GPU defaults; use CPU for both to switch):
TARGET_DEVICE=GPU      # used by setup_models.sh and docker-compose
OPENVINO_DEVICE=GPU    # used by the Makefile benchmark targets
2. Re-export the model for the new device:
cd ../ovms-service && ./setup_models.sh --app take-away
setup_models.sh reads TARGET_DEVICE to export the model in the correct format, and the Makefile passes OPENVINO_DEVICE to the benchmark script. The two values must match.
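
Because a mismatch between the two variables is easy to miss, a small pre-flight check can help. The following is a minimal sketch, not part of the project: the helper names `parse_env` and `devices_match` are hypothetical, and it assumes the .env file contains only simple KEY=VALUE lines with optional `#` comments.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def devices_match(env_text: str) -> bool:
    """True only when both variables are set and name the same device."""
    env = parse_env(env_text)
    target = env.get("TARGET_DEVICE")
    openvino = env.get("OPENVINO_DEVICE")
    return target is not None and target == openvino


if __name__ == "__main__":
    from pathlib import Path

    env_file = Path(".env")  # assumed location: current working directory
    if env_file.is_file():
        ok = devices_match(env_file.read_text())
        print("devices match" if ok else "TARGET_DEVICE and OPENVINO_DEVICE differ or are unset")
```

Running it from the directory that holds .env before re-exporting the model reports whether the two settings agree. It does not verify that the model was actually re-exported.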

Important: Before running benchmarks, ensure a test video file is present at storage/videos/test.mp4. You can download a sample video using:
make download-sample-video
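
If benchmarks are launched from a script, the presence of the test video can be checked first. This is a minimal sketch under the assumption that the path storage/videos/test.mp4 is relative to the directory the script is given; `has_test_video` is a hypothetical helper, not a project function.

```python
from pathlib import Path


def has_test_video(project_root: str = ".") -> bool:
    """Return True if storage/videos/test.mp4 exists and is non-empty."""
    video = Path(project_root) / "storage" / "videos" / "test.mp4"
    return video.is_file() and video.stat().st_size > 0


if __name__ == "__main__":
    if not has_test_video():
        print("Missing storage/videos/test.mp4; run: make download-sample-video")
```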

Quick Reference

# First-time setup
make update-submodules        # Initialize performance-tools submodule
make up                       # Start all services

# Benchmarks
make benchmark                           # Fixed-workers benchmark (default config)
make benchmark-oa BENCHMARK_WORKERS=4   # Fixed-workers with custom worker count
make benchmark-stream-density            # Stream density benchmark

# View results
make benchmark-oa-metrics     # View VLM metrics
make benchmark-oa-results     # View all result files
make consolidate-metrics      # Consolidate metrics to CSV
make plot-metrics             # Generate plots

# Cleanup
make clean-results            # Remove results files
make clean                    # Stop containers and remove volumes

# Help
make benchmark-oa-help
make help

Prerequisites

# 1. Initialize git submodules (first time only)
make update-submodules

# 2. Start services
make up

Benchmark Commands

Fixed Workers Benchmark

Runs benchmark_order_accuracy.py with a fixed number of concurrent workers.

# Default run
make benchmark

# Custom run
make benchmark \
  BENCHMARK_WORKERS=4 \
  BENCHMARK_DURATION=300 \
  BENCHMARK_INIT_DURATION=30

Variables:

| Variable | Default | Description |
|---|---|---|
| `BENCHMARK_WORKERS` | `1` | Number of concurrent workers |
| `BENCHMARK_DURATION` | `200` | Test duration (seconds) |
| `BENCHMARK_INIT_DURATION` | `10` | Warmup time (seconds) |
| `OPENVINO_DEVICE` | `GPU` | Inference device (`GPU` or `CPU`). When changing it, also set `TARGET_DEVICE` in `.env` and re-run `setup_models.sh`; see the note above. |
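
When runs are scripted, the variables in the table above can be assembled into a `make benchmark` command line programmatically. The sketch below is an illustration only: `build_benchmark_command` and the `DEFAULTS` mapping are hypothetical names, and the defaults are copied from the table rather than read from the Makefile.

```python
import shlex

# Defaults copied from the variables table above (assumed to mirror the Makefile).
DEFAULTS = {
    "BENCHMARK_WORKERS": "1",
    "BENCHMARK_DURATION": "200",
    "BENCHMARK_INIT_DURATION": "10",
    "OPENVINO_DEVICE": "GPU",
}


def build_benchmark_command(**overrides) -> list[str]:
    """Build the argv for `make benchmark`, passing only non-default values."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown variable(s): {sorted(unknown)}")
    cmd = ["make", "benchmark"]
    for name in DEFAULTS:  # keep the table's order for readable output
        if name in overrides and str(overrides[name]) != DEFAULTS[name]:
            cmd.append(f"{name}={overrides[name]}")
    return cmd


if __name__ == "__main__":
    cmd = build_benchmark_command(
        BENCHMARK_WORKERS=4,
        BENCHMARK_DURATION=300,
        BENCHMARK_INIT_DURATION=30,
    )
    print(shlex.join(cmd))
    # To execute instead of printing: subprocess.run(cmd, check=True)
```

Rejecting unknown names catches typos such as a misspelled variable, which `make` would otherwise accept silently and ignore.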

Stream Density Benchmark

Finds the maximum number of concurrent workers the system can sustain under a target latency threshold. Runs stream_density_latency_oa.py.
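
The general idea of such a search can be sketched as a loop that adds workers until the latency target is missed. This is a conceptual illustration only and is not taken from stream_density_latency_oa.py: the `measure` callback, the starting count of one worker, the `max_workers` cap, and the use of the average as the latency metric are all assumptions.

```python
from statistics import mean
from typing import Callable, Sequence


def find_max_workers(
    measure: Callable[[int], Sequence[float]],
    target_latency_ms: float,
    min_transactions: int = 3,
    worker_increment: int = 1,
    max_workers: int = 64,  # hypothetical safety cap
) -> int:
    """Return the largest worker count whose average latency met the target (0 if none)."""
    best = 0
    workers = 1  # assumed starting point
    while workers <= max_workers:
        # Hypothetical callback: latencies (ms) of transactions completed at this load.
        latencies = measure(workers)
        if len(latencies) < min_transactions:
            break  # too few completed transactions to judge this level
        if mean(latencies) > target_latency_ms:
            break  # target exceeded; the previous level is the answer
        best = workers
        workers += worker_increment
    return best


if __name__ == "__main__":
    # Synthetic load model for demonstration: latency grows linearly with workers.
    def fake_measure(workers: int) -> list[float]:
        return [6000.0 * workers] * 3

    print(find_max_workers(fake_measure, target_latency_ms=25000))
```

With the synthetic model above, four workers average 24000 ms and five average 30000 ms, so the search stops at four for a 25000 ms target.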

# Default run
make benchmark-stream-density

# Custom run
make benchmark-stream-density \
  BENCHMARK_TARGET_LATENCY_MS=25000 \
  BENCHMARK_LATENCY_METRIC=avg \
  BENCHMARK_INIT_DURATION=30 \
  BENCHMARK_MIN_TRANSACTIONS=3 \
  BENCHMARK_WORKER_INCREMENT=1