Benchmarking Guide — Take-Away Order Accuracy#

This guide covers performance testing, stream density benchmarking, and metrics collection for the Take-Away Order Accuracy system.

Note — Inference Device: The default device is GPU. To switch to a different device (CPU or NPU), you must do both steps below, otherwise the model will be exported for the wrong device:

  1. Set both variables in your .env file:

    TARGET_DEVICE=GPU      # used by setup_models.sh and docker-compose
    OPENVINO_DEVICE=GPU    # used by the Makefile benchmark targets
    
  2. Re-export the model for the new device:

    cd ../ovms-service && ./setup_models.sh --app take-away
    

TARGET_DEVICE is what setup_models.sh reads to export the model in the correct format. OPENVINO_DEVICE is what the Makefile passes to the benchmark script. Both must match.

Important: Before running benchmarks, ensure a test video file is present at storage/videos/test.mp4. You can download a sample video using:

make download-sample-video

Quick Reference#

# First-time setup
make update-submodules        # Initialize performance-tools submodule
make up                       # Start all services

# Benchmarks
make benchmark                           # Fixed-workers benchmark (default config)
make benchmark-oa BENCHMARK_WORKERS=4   # Fixed-workers with custom worker count
make benchmark-stream-density            # Stream density benchmark

# View results
make benchmark-oa-metrics     # View VLM metrics
make benchmark-oa-results     # View all result files
make consolidate-metrics      # Consolidate metrics to CSV
make plot-metrics             # Generate plots

# Cleanup
make clean-results            # Remove results files
make clean                    # Stop containers and remove volumes

# Help
make benchmark-oa-help
make help

Prerequisites#

# 1. Initialize git submodules (first time only)
make update-submodules

# 2. Start services
make up

Benchmark Commands#

Fixed Workers Benchmark#

Runs benchmark_order_accuracy.py with a fixed number of concurrent workers.

# Default run
make benchmark

# Custom run
make benchmark \
  BENCHMARK_WORKERS=4 \
  BENCHMARK_DURATION=300 \
  BENCHMARK_INIT_DURATION=30

Variables:

Variable

Default

Description

BENCHMARK_WORKERS

1

Number of concurrent workers

BENCHMARK_DURATION

200

Test duration (seconds)

BENCHMARK_INIT_DURATION

10

Warmup time (seconds)

OPENVINO_DEVICE

GPU

Inference device (GPU, CPU). Must also set TARGET_DEVICE in .env and re-run setup_models.sh — see note above.


Stream Density Benchmark#

Finds the maximum number of concurrent workers the system can sustain under a target latency threshold. Runs stream_density_latency_oa.py.

# Default run
make benchmark-stream-density

# Custom run
make benchmark-stream-density \
  BENCHMARK_TARGET_LATENCY_MS=25000 \
  BENCHMARK_LATENCY_METRIC=avg \
  BENCHMARK_INIT_DURATION=30 \
  BENCHMARK_MIN_TRANSACTIONS=3 \
  BENCHMARK_WORKER_INCREMENT=1

Variables:

Variable

Default

Description

BENCHMARK_TARGET_LATENCY_MS

25000

Target latency threshold (ms)

BENCHMARK_LATENCY_METRIC

avg

Metric to evaluate: avg or p95

BENCHMARK_WORKER_INCREMENT

1

Workers added per iteration

BENCHMARK_INIT_DURATION

10

Warmup time per iteration (seconds)

BENCHMARK_MIN_TRANSACTIONS

1

Min transactions before measuring latency

OOM_PROTECTION

1

Set to 0 to disable OOM protection (not recommended)


Results & Metrics#

Results are saved to the results/ directory:

results/
├── vlm_application_metrics_*.txt    # VLM application metrics
├── vlm_performance_metrics_*.txt    # VLM performance metrics
└── consolidated_metrics.csv         # Generated by make consolidate-metrics
# View VLM metrics
make benchmark-oa-metrics

# View all result files
make benchmark-oa-results

# Consolidate metrics from multiple runs into a single CSV
make consolidate-metrics

# Generate plots from consolidated metrics
make plot-metrics