# Benchmarking Guide — Take-Away Order Accuracy

This guide covers performance testing, stream density benchmarking, and metrics collection for the Take-Away Order Accuracy system.

> **Note — Inference Device**: The default device is `GPU`. To switch to a different device (`CPU` or `NPU`), you must do **both** steps below, otherwise the model will be exported for the wrong device:
>
> 1. Set **both** variables in your `.env` file:
>
>    ```bash
>    TARGET_DEVICE=GPU      # used by setup_models.sh and docker-compose
>    OPENVINO_DEVICE=GPU    # used by the Makefile benchmark targets
>    ```
>
> 2. Re-export the model for the new device:
>
>    ```bash
>    cd ../ovms-service && ./setup_models.sh --app take-away
>    ```
>
> `TARGET_DEVICE` is what `setup_models.sh` reads to export the model in the correct format. `OPENVINO_DEVICE` is what the Makefile passes to the benchmark script. Both must match.

> **Important**: Before running benchmarks, ensure a test video file is present at `storage/videos/test.mp4`. You can download a sample video using:
>
> ```bash
> make download-sample-video
> ```

---

## Quick Reference

```bash
# First-time setup
make update-submodules        # Initialize performance-tools submodule
make up                       # Start all services

# Benchmarks
make benchmark                           # Fixed-workers benchmark (default config)
make benchmark-oa BENCHMARK_WORKERS=4   # Fixed-workers with custom worker count
make benchmark-stream-density            # Stream density benchmark

# View results
make benchmark-oa-metrics     # View VLM metrics
make benchmark-oa-results     # View all result files
make consolidate-metrics      # Consolidate metrics to CSV
make plot-metrics             # Generate plots

# Cleanup
make clean-results            # Remove results files
make clean                    # Stop containers and remove volumes

# Help
make benchmark-oa-help
make help
```

---

## Prerequisites

```bash
# 1. Initialize git submodules (first time only)
make update-submodules

# 2. Start services
make up
```

---

## Benchmark Commands

### Fixed Workers Benchmark

Runs `benchmark_order_accuracy.py` with a fixed number of concurrent workers.

```bash
# Default run
make benchmark

# Custom run
make benchmark \
  BENCHMARK_WORKERS=4 \
  BENCHMARK_DURATION=300 \
  BENCHMARK_INIT_DURATION=30
```

**Variables:**

| Variable                  | Default | Description                                                                                                             |
| ------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------- |
| `BENCHMARK_WORKERS`       | `1`     | Number of concurrent workers                                                                                            |
| `BENCHMARK_DURATION`      | `200`   | Test duration (seconds)                                                                                                 |
| `BENCHMARK_INIT_DURATION` | `10`    | Warmup time (seconds)                                                                                                   |
| `OPENVINO_DEVICE`         | `GPU`   | Inference device (`GPU`, `CPU`). Must also set `TARGET_DEVICE` in `.env` and re-run `setup_models.sh` — see note above. |

---

### Stream Density Benchmark

Finds the maximum number of concurrent workers the system can sustain under a target latency threshold. Runs `stream_density_latency_oa.py`.

```bash
# Default run
make benchmark-stream-density

# Custom run
make benchmark-stream-density \
  BENCHMARK_TARGET_LATENCY_MS=25000 \
  BENCHMARK_LATENCY_METRIC=avg \
  BENCHMARK_INIT_DURATION=30 \
  BENCHMARK_MIN_TRANSACTIONS=3 \
  BENCHMARK_WORKER_INCREMENT=1
```

**Variables:**

| Variable                      | Default | Description                                            |
| ----------------------------- | ------- | ------------------------------------------------------ |
| `BENCHMARK_TARGET_LATENCY_MS` | `25000` | Target latency threshold (ms)                          |
| `BENCHMARK_LATENCY_METRIC`    | `avg`   | Metric to evaluate: `avg` or `p95`                     |
| `BENCHMARK_WORKER_INCREMENT`  | `1`     | Workers added per iteration                            |
| `BENCHMARK_INIT_DURATION`     | `10`    | Warmup time per iteration (seconds)                    |
| `BENCHMARK_MIN_TRANSACTIONS`  | `1`     | Min transactions before measuring latency              |
| `OOM_PROTECTION`              | `1`     | Set to `0` to disable OOM protection (not recommended) |

---

## Results & Metrics

Results are saved to the `results/` directory:

```text
results/
├── vlm_application_metrics_*.txt    # VLM application metrics
├── vlm_performance_metrics_*.txt    # VLM performance metrics
└── consolidated_metrics.csv         # Generated by make consolidate-metrics
```

```bash
# View VLM metrics
make benchmark-oa-metrics

# View all result files
make benchmark-oa-results

# Consolidate metrics from multiple runs into a single CSV
make consolidate-metrics

# Generate plots from consolidated metrics
make plot-metrics
```