# Get Started with Intermediate Fusion - Training ## Overview This repo is the **training + ONNX-conversion** side for the deploy pipelines in [Intermediate Fusion/Deploy](https://github.com/open-edge-platform/edge-ai-suites/tree/main/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/deploy). Two parallel deploy pipelines are supported; pick the one matching your deploy target: | Pipeline | Deploy binary | Lidar encoder | ONNX artifacts | Custom ops (and where they live) | Deploy directories | |---|---|---|---|---|---| | **A — PointPillars-based** (§4) | `./bevfusion` | PillarFeatureNet + PointPillarsScatter | **4 independent ONNX**: `camera.backbone` / `lidar_pfe` / `fuser` / `head` | `bevpoolv2` + `pillarscatter` + voxelizer + post-processing: hand-written **SYCL** kernels (single-use ops) | `deploy/src/pointpillars/`, `deploy/src/pointpillars/voxelizer.cpp` | | **B — Second-based** (§5) | `./bevfusion_unified` | SparseEncoder (SparseConv3d + SubMConv3d) | **Single unified ONNX**: `bevfusion_unified.onnx` | `SparseConvolution` + `SubMConv3d` are called many times and fused with BN/ReLU, so they (and `BevPoolV2` / `SparseToDense`) live inside the **OpenVINO GPU plugin** | `deploy/src/bevfusion_unified/`, `deploy/src/bevfusion_unified/voxelizer_sycl.cpp` | **Dataset coverage** | Pipeline | V2X-I (DAIR-V2X-I) | KITTI | |---|---|---| | A — PointPillars (`./bevfusion`) | ✅ (§4.3) | ✅ (§4.4) | | B — Second (`./bevfusion_unified`) | ✅ (§5.3) | ✅ (§5.4) | Both pipelines use the same `--preset v2x|kitti` switch in deploy. Reference user commands (V2X-I is the default preset): ```bash # Pipeline B — unified (default INT8 XML; pass --fp16 for the FP16 ONNX) ./bevfusion_unified /training/data/dair-v2x-i-kitti/training --num-samples 1000 ./bevfusion_unified /training/data/kitti-v2x/training --num-samples 1000 --preset kitti # Pipeline A — split 4-ONNX (default FP32; pass --int8 for the 4-XML INT8 path) ./bevfusion /training/data/dair-v2x-i-kitti/training --num-samples 1000 --int8 ./bevfusion /training/data/kitti-v2x/training --num-samples 1000 --int8 --preset kitti ``` > **Already have NVIDIA's CUDA-V2XFusion checkpoint?** If you only want to deploy > NVIDIA's reference `dense_epoch_100_.pth` (or your own CUDA-V2XFusion-trained > ckpt) on Intel GPU — no retraining, no in-repo training-side work — skip this > guide and follow [Running NVIDIA’s V2X-I PointPillars Dense FP16 Model on Intel GPU](nvidia_ckpt_to_intel_gpu.md). > It's a focused weight-conversion path: NVIDIA `.pth` → 4 ONNX + INT8 IR → drop > into `deploy/data/v2xfusion/pointpillars/`. Pipeline A + V2X-I only. Sections: 1. Environment setup (shared) 2. Dataset preparation (V2X-I, KITTI) 3. Shared reference — custom ONNX ops, config inheritance 4. Pipeline A: PointPillars-based BEVFusion (`./bevfusion`) 5. Pipeline B: Second-based BEVFusion (`./bevfusion_unified`) --- ## 1. Environment Setup Throughout this document the following placeholders are used; substitute them with paths appropriate to your machine: | Placeholder | Meaning | |---|---| | `` | Absolute path to this repository's root (the parent of `training/` and `deploy/`) | | `` | Python virtual env for training + ONNX export + Pipeline A INT8 quantization (see §1.1) | | `` | Python virtual env for Pipeline B INT8 quantization + standalone OV inference (see §1.2) | | `` | Custom-built OpenVINO root containing `bin/intel64/Release/` (see §1.2) | | `` | Launcher prefix for `tools/train.py` / `tools/test.py`. See §1.1 for two options. | Two Python environments, kept strictly separate: ### 1.1 `bevEnv` — training + ONNX export + Pipeline A INT8 quantization ```bash PYTHON=/bin/python PIP=/bin/pip # Build CUDA extensions (including bev_pool_v2) cd /training $PIP install -e . # Verify bev_pool_v2 extension $PYTHON -c "from mmdet3d.ops.bev_pool_v2.bev_pool import bev_pool_v2, OVBEVPoolv2; print('OK')" ``` Pipeline A (PointPillars split 4-ONNX, §4) INT8 quantization via `export/pointpillars/quantize_all.py` runs in `bevEnv`. **Launching `tools/train.py` / `tools/test.py`** — both scripts use `torchpack.distributed`. There are two launch modes: ```bash # Option A — single-process, no torch.distributed (recommended for single GPU). # Pass --no-dist to train.py / test.py so they skip dist.init() and the # mmcv MMDistributedDataParallel wrapper (incompatible with newer PyTorch). TORCHPACK="/bin/python" # usage: # $TORCHPACK tools/train.py --no-dist --run-dir ... # $TORCHPACK tools/test.py --no-dist --eval bbox # Option B — multi-GPU distributed via torchpack dist-run (needs OpenMPI: # `apt install openmpi-bin`). torchpack's default mpirun flags are OpenMPI # syntax and will fail under Intel MPI. Drop --no-dist when using this. TORCHPACK="/bin/torchpack dist-run -np /bin/python" ``` All `$TORCHPACK tools/train.py --no-dist ...` commands shown below default to Option A (single-GPU). For multi-GPU, switch to Option B's `TORCHPACK` and remove `--no-dist`. ### 1.2 `spconvEnv` — standalone OV inference + Pipeline B INT8 quantization ```bash OV_DIR=/bin/intel64/Release export PYTHONPATH="$OV_DIR/python:${PYTHONPATH:-}" export LD_LIBRARY_PATH="$OV_DIR:${LD_LIBRARY_PATH:-}" OV_PYTHON=/bin/python ``` The OpenVINO build above must include the opset15 registration patch for `BevPoolV2 / SparseConvolution / SparseToDense`. Without this patch Pipeline B's saved INT8 IR cannot be read back. `spconvEnv` is used for: - Pipeline B unified quantization (`export/quantize_unified.py`, NNCF 3.1 + custom OV, §5.2.6). - Standalone OpenVINO-side inference tests (`tools/bevfusion_standalone_ov_inference.py`, §5.3). It is **not** used for Pipeline A (`export/pointpillars/quantize_all.py`) — that stays in `bevEnv`. Intel Arc **B580** is the reference GPU (exposed as `GPU.1`). --- ## 2. Dataset Preparation Both pipelines read from `/training/data/`. Two datasets are supported: | Dataset | Source | Final on-disk layout under `data/` | |---|---|---| | DAIR-V2X-I | [DAIR-V2X-I Google Drive bundle](https://drive.google.com/drive/folders/1FlBbtJfuoEOc0ey9wkkU1lGr5JtS4-gc) (`single-infrastructure-side-image/`, `single-infrastructure-side-velodyne/`, `single-infrastructure-side-label/`, `data_info.json`) | `data/dair-v2x-i/` (native) **+** `data/dair-v2x-i-kitti/` (KITTI-format mirror used by the evaluator) | | KITTI | [KITTI 3D Object](https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d) raw download (`training/{image_2,velodyne,calib,label_2}`, `testing/{image_2,velodyne,calib}`, `ImageSets/{train,val}.txt`) | `data/kitti-v2x/` (V2X-format mirror produced by `tools/convert_kitti_to_v2x_format.py`) | Configs reference these paths through `dataset_root` / `dataset_kitti_root` (`configs/V2X-I/default.yaml:2-3`, `configs/Kitti/default.yaml:2-3`); if you put the data anywhere else, override those two keys instead of changing the configs. ### 2.1 DAIR-V2X-I Download the DAIR-V2X-I bundle from [this Google Drive folder](https://drive.google.com/drive/folders/1FlBbtJfuoEOc0ey9wkkU1lGr5JtS4-gc), then follow BEVHeight's data preparation document end-to-end for the KITTI-format conversion and pkl generation steps: [ADLab-AutoDrive/BEVHeight — docs/prepare_dataset.md](https://github.com/ADLab-AutoDrive/BEVHeight/blob/main/docs/prepare_dataset.md). That guide produces the `dair_12hz_infos_{train,val}.pkl` annotation files this repo loads. **Final layout** (this is what `data/dair-v2x-i/` and `data/dair-v2x-i-kitti/` should look like after BEVHeight's prep is done): ``` data/ ├── dair-v2x-i/ # native DAIR-V2X-I │ ├── velodyne/ # *.pcd / *.bin point clouds │ ├── image/ # *.jpg camera frames │ ├── calib/ # virtuallidar↔camera + camera intrinsics JSONs │ ├── label/ # native JSON labels │ ├── data_info.json # DAIR-V2X-I sample manifest │ ├── dair_12hz_infos_train.pkl # produced by BEVHeight prep │ └── dair_12hz_infos_val.pkl # produced by BEVHeight prep └── dair-v2x-i-kitti/ # KITTI-format mirror (evaluator uses this) ├── training/ │ ├── calib/ # KITTI-style calib *.txt │ ├── image_2/ # KITTI-style images │ ├── label_2/ # KITTI-style labels (referenced as dataset_kitti_root) │ └── velodyne/ # KITTI-style point clouds ├── testing/ # placeholder (empty — DAIR-V2X-I single-side has no public test split) └── ImageSets/ ├── train.txt ├── val.txt ├── trainval.txt └── test.txt # placeholder, paired with empty testing/ ``` Sanity check after prep: ```bash ls data/dair-v2x-i/dair_12hz_infos_{train,val}.pkl # both must exist ls data/dair-v2x-i-kitti/training/label_2 | head -3 # KITTI-style label txts ``` ### 2.2 KITTI KITTI prep is a two-stage flow: standard MMDetection3D infos generation, then `tools/convert_kitti_to_v2x_format.py` to produce a V2XDataset-compatible mirror. #### 2.2.1 Stage 1 — generate `kitti_infos_*.pkl` with MMDetection3D v1.x This repo's `tools/create_data.py` only handles nuScenes; the `kitti_infos_*.pkl` files come from **upstream MMDetection3D v1.x** (`tools/create_data.py kitti`). After downloading the KITTI 3D Object archive into `/` with the standard layout: ``` / ├── ImageSets/{train,val,trainval,test}.txt ├── training/{calib,image_2,label_2,velodyne}/ └── testing/{calib,image_2,velodyne}/ ``` run upstream MMDetection3D v1.x's KITTI converter (in any clone of [open-mmlab/mmdetection3d](https://github.com/open-mmlab/mmdetection3d) at a v1.x tag — the converter is independent of this repo): ```bash # in an mmdet3d v1.x checkout python tools/create_data.py kitti \ --root-path ./data/kitti \ --out-dir ./data/kitti \ --extra-tag kitti \ --with-plane ``` `--with-plane` consumes the `planes/` road-plane txts that ship with KITTI 3D Object and bakes per-frame ground-plane info into `kitti_infos_*.pkl` — needed if your downstream training uses ground-plane augmentation. Drop the flag if your `/training/` does not contain `planes/`. Substitute `./data/kitti` with `` (an absolute path or a path relative to the mmdet3d checkout). You should end up with the layout the user-side mmdet3d run produces (matches what's already on the reference machine): ``` / ├── gt_database/ ├── kitti_dbinfos_train.pkl ├── kitti_gt_database/ ├── kitti_infos_test.pkl ├── kitti_infos_train.pkl ├── kitti_infos_trainval.pkl ├── kitti_infos_val.pkl ├── testing/ └── training/ ``` The `kitti_infos_{train,val,trainval,test}.pkl` files are the ones consumed by Stage 2. `gt_database/` and `kitti_dbinfos_train.pkl` are not needed for training in this repo. #### 2.2.2 Stage 2 — convert to V2XDataset format Use this repo's converter to build the V2X-format mirror under `data/kitti-v2x/`: ```bash cd /training $PYTHON tools/convert_kitti_to_v2x_format.py \ --src-root \ --dst-root data/kitti-v2x ``` The script (see [tools/convert_kitti_to_v2x_format.py](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/training/tools/convert_kitti_to_v2x_format.py)): 1. Symlinks `training/{image_2,velodyne,label_2,calib}` and top-level `image/`, `velodyne/` from `` into `data/kitti-v2x/`. 2. Generates `data/kitti-v2x/calib/virtuallidar_to_camera/.json` from each KITTI `calib/.txt` (computes `lidar2cam = R0_rect @ Tr_velo_to_cam`). 3. Rewrites each `kitti_infos_.pkl` from MMDetection3D v1.x's `{metainfo, data_list}` schema into the flat list-of-dict V2XDataset schema (KITTI cam-rect bbox → lidar-frame center + nuScenes Box convention `(w, l, h)` + yaw_lidar; KITTI category → nuScenes category). **Final layout:** ``` data/ └── kitti-v2x/ ├── image/ → /training/image_2 (symlink) ├── velodyne/ → /training/velodyne (symlink) ├── calib/ │ └── virtuallidar_to_camera/ │ ├── 000000.json │ ├── 000001.json │ └── ... ├── training/ │ ├── image_2/ → /training/image_2 (symlink) │ ├── velodyne/ → /training/velodyne (symlink) │ ├── label_2/ → /training/label_2 (symlink) │ └── calib/ → /training/calib (symlink) ├── testing/ │ ├── image_2/ → /testing/image_2 (symlink) │ ├── velodyne/ → /testing/velodyne (symlink) │ └── calib/ → /testing/calib (symlink) ├── kitti_infos_train.pkl # converted from /kitti_infos_train.pkl ├── kitti_infos_val.pkl ├── kitti_infos_trainval.pkl # only if Stage 1 produced it └── kitti_infos_test.pkl # only if Stage 1 produced it ``` Sanity check: ```bash ls data/kitti-v2x/kitti_infos_{train,val}.pkl ls data/kitti-v2x/calib/virtuallidar_to_camera | wc -l # should match #frames ls data/kitti-v2x/training/label_2 | head -3 ``` `configs/Kitti/default.yaml` already points `dataset_root: data/kitti-v2x` and `dataset_kitti_root: data/kitti-v2x/training/label_2`, so no config edits are required if you placed the output here. --- ## 3. Shared Reference ### 3.1 Custom ONNX Operators All custom ops are registered under domain `org.openvinotoolkit`: | Operator | Used by | Where implemented | Description | |---|---|---|---| | `BevPoolV2` | Pipeline B ONNX (camera branch) | OpenVINO GPU plugin | Camera-to-BEV view transform using precomputed geometry | | `SparseConvolution` | Pipeline B ONNX (lidar encoder) | OpenVINO GPU plugin | 3D sparse convolution with fused BN + optional ReLU | | `SparseToDense` | Pipeline B ONNX (lidar encoder) | OpenVINO GPU plugin | Sparse feature map → dense BEV tensor | Pipeline A has **no custom ops inside ONNX** — `bevpoolv2` / `pillarscatter` / voxelization / post-processing are all SYCL kernels outside the ONNX graph, so standard OpenVINO can load all 4 PP ONNXs unmodified. ### 3.2 Config Inheritance All configs follow a fixed recursive-override chain. Two encoder families live under distinct top-level neck directories (`lssfpn` for PointPillars, `secfpn` for Second): ``` configs/default.yaml └─ configs//default.yaml # dataset, image_size, object_classes └─ configs//det/default.yaml # detection model type (BEVFusion) └─ configs//det/centerhead/default.yaml ├─ .../lssfpn/default.yaml # ← Pipeline A (PointPillars) │ └─ .../camera+pointpillar/default.yaml │ └─ .../resnet34/default.yaml └─ .../secfpn/default.yaml # ← Pipeline B (Second) └─ .../camera+lidar/default.yaml └─ .../resnet34/default.yaml # (+ optional bevpoolv2.yaml) ``` `` ∈ `V2X-I`, `Kitti`, `nuscenes`. Backbone variants under each leaf directory: `resnet34`, `resnet50`, `fasternet`. --- ## 4. Pipeline A — PointPillars-based BEVFusion (4 ONNX) ### 4.1 Design Four independent ONNX files, each an independently quantizable / replaceable deploy stage: | Stage | ONNX file | Content | I/O | |---|---|---|---| | camera | `camera.backbone.onnx` | ResNet34 → GeneralizedLSSFPN → DepthNet | `img [1,1,3,864,1536]` → `camera_feature`, `camera_depth_weights` | | lidar PFE | `lidar_pfe.onnx` / `lidar_pfe_v7000.onnx` | PillarFeatureNet (f_cluster / f_center / mask fused into the graph) | `features [V,100,4]`, `num_voxels [V]`, `coors [V,4]` → `pillar_features [V,64]` | | fuser | `fuser.onnx` | ConvFuser + decoder.backbone + decoder.neck | `cam_bev [1,80,128,128]`, `lidar_bev [1,64,128,128]` → `middle [1,256,128,128]` | | head | `head.onnx` | CenterHead (shared_conv + task_heads, no decoder) | `middle [1,256,128,128]` → 12 task tensors | **Stays outside ONNX** (SYCL kernels in deploy): - Voxelization — [deploy/src/pointpillars/voxelizer.cpp](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/deploy/src/pointpillars/voxelizer.cpp) - `bevpoolv2` (camera BEV pooling) — deploy SYCL kernel - `PointPillarsScatter` (PFE output → dense BEV canvas) — deploy SYCL kernel - CenterHead post-processing (heatmap top-k, box decode, rotate-NMS) — deploy SYCL kernel The voxelizer's `coors` layout is `(batch_idx, x_idx, y_idx, z_idx)` — **opposite** to Pipeline B's voxelizer layout `(batch, z, y, x)`. The deploy-side SYCL voxelizers must follow each pipeline's own layout; the two cannot be shared. ### 4.2 Generic Workflow Throughout §4.2 we use placeholder variables; §4.3 / §4.4 fill them in per dataset: ```bash PP_CONFIG= PP_CKPT= ``` #### 4.2.1 Training ```bash $TORCHPACK tools/train.py --no-dist $PP_CONFIG --mode dense --run-dir ./work_dirs//pp/ ``` The BEVFusion base model in [mmdet3d/models/fusion_models/bevfusion.py](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/training/mmdet3d/models/fusion_models/bevfusion.py) auto-selects hard-voxelize (PointPillars) vs DynamicScatter (Second) based on `max_num_points > 0` — no training-code changes are needed to switch encoders. #### 4.2.2 Inference & Visualization ```bash $PYTHON tools/inference_vis.py $PP_CONFIG $PP_CKPT \ --split train --mode pred --bbox-score 0.3 --out-dir viz_pp ``` | Argument | Description | Default | |---|---|---| | `--mode` | `gt` or `pred` | `gt` | | `--split` | `train` or `val` | `val` | | `--bbox-score` | score threshold | `None` | | `--out-dir` | output directory | `viz` | Outputs go to `/camera/*.png` and `/lidar/*.png`. The script is encoder-agnostic — any model that emits `boxes_3d / scores_3d / labels_3d` works. #### 4.2.3 ONNX Export — All 4 Files in One Call ```bash $PYTHON export/pointpillars/export_all.py \ --config $PP_CONFIG --ckpt $PP_CKPT \ --out-dir export/onnx/pointpillars ``` Reference output sizes (V2X-I): | File | Size | Nodes | Notes | |---|---|---|---| | `camera.backbone.onnx` | 88 MB | 113 | ResNet34 + LSSFPN + DepthNet | | `lidar_pfe.onnx` | 17 KB | 198 | dynamic V (up to 12000) | | `fuser.onnx` | 405 KB | 51 | ConvFuser + decoder.backbone + decoder.neck → `middle[1,256,128,128]` | | `head.onnx` | 52 MB | 38 | CenterHead only, consumes `middle` | Individual exports (if you only want one stage): ```bash $PYTHON export/pointpillars/export-camera.py --config $PP_CONFIG --ckpt $PP_CKPT -o export/onnx/pointpillars/camera.backbone.onnx $PYTHON export/pointpillars/export-lidar.py --config $PP_CONFIG --ckpt $PP_CKPT -o export/onnx/pointpillars/lidar_pfe.onnx $PYTHON export/pointpillars/export-fuser.py --config $PP_CONFIG --ckpt $PP_CKPT -o export/onnx/pointpillars/fuser.onnx $PYTHON export/pointpillars/export-head.py --config $PP_CONFIG --ckpt $PP_CKPT -o export/onnx/pointpillars/head.onnx ``` #### 4.2.4 Static-V PFE Export (recommended for deploy) Dynamic V (number of non-empty pillars) forces OpenVINO to re-specialize the PFE kernel every frame. A fixed V bakes a single shape and drops PFE latency. ```bash # V=7000 — recommended static-V setting for deploy --int8 $PYTHON export/pointpillars/export-lidar.py \ --config $PP_CONFIG --ckpt $PP_CKPT \ --fixed-v 7000 --split val \ -o export/onnx/pointpillars/lidar_pfe_v7000.onnx ``` The exporter pads trace inputs to `V=N`, and deploy auto-detects static shape. Choose V based on your dataset statistics with enough margin. Recommended default is `V=7000`. #### 4.2.5 INT8 Quantization — All 4 ONNXs in One Call Produces `quantized_camera.xml` / `quantized_lidar_pfe.xml` / `quantized_fuser.xml` / `quantized_head.xml` via NNCF PTQ on 300 calibration frames. ```bash $PYTHON export/pointpillars/quantize_all.py \ --config $PP_CONFIG --ckpt $PP_CKPT \ --onnx-dir export/onnx/pointpillars \ --out-dir export/onnx/pointpillars \ --num-samples 300 ``` `quantize_all.py` prefers `lidar_pfe_v7000.onnx` when both static and dynamic PFE ONNX are present. Individual stages (all share the same CLI shape): ```bash $PYTHON export/pointpillars/quantize_camera_backbone.py --config $PP_CONFIG --ckpt $PP_CKPT \ --onnx export/onnx/pointpillars/camera.backbone.onnx \ --out export/onnx/pointpillars/quantized_camera.xml --num-samples 300 $PYTHON export/pointpillars/quantize_lidar_pfe.py --config $PP_CONFIG --ckpt $PP_CKPT \ --onnx export/onnx/pointpillars/lidar_pfe_v7000.onnx \ --out export/onnx/pointpillars/quantized_lidar_pfe.xml --num-samples 300 $PYTHON export/pointpillars/quantize_fuser.py --config $PP_CONFIG --ckpt $PP_CKPT \ --onnx export/onnx/pointpillars/fuser.onnx \ --out export/onnx/pointpillars/quantized_fuser.xml --num-samples 300 $PYTHON export/pointpillars/quantize_head.py --config $PP_CONFIG --ckpt $PP_CKPT \ --onnx export/onnx/pointpillars/head.onnx \ --out export/onnx/pointpillars/quantized_head.xml --num-samples 300 ``` **fuser/head split** — `fuser.onnx` contains decoder backbone+neck, and `head.onnx` contains CenterHead shared conv + task heads. #### 4.2.6 End-to-End Command Sequence ```bash cd /training # 1) Train $TORCHPACK tools/train.py --no-dist $PP_CONFIG --mode dense --run-dir ./work_dirs//pp/ # 2) Inference sanity check $PYTHON tools/inference_vis.py $PP_CONFIG $PP_CKPT --split train --mode pred # 3) Export 4 ONNX files $PYTHON export/pointpillars/export_all.py \ --config $PP_CONFIG --ckpt $PP_CKPT --out-dir export/onnx/pointpillars # 4) Static-V PFE for deploy INT8 (V=7000 matches deploy's max_voxels) $PYTHON export/pointpillars/export-lidar.py \ --config $PP_CONFIG --ckpt $PP_CKPT --fixed-v 7000 --split val \ -o export/onnx/pointpillars/lidar_pfe_v7000.onnx # 5) INT8 quantization (all 4 stages) — auto picks lidar_pfe_v7000.onnx $PYTHON export/pointpillars/quantize_all.py \ --config $PP_CONFIG --ckpt $PP_CKPT \ --onnx-dir export/onnx/pointpillars --out-dir export/onnx/pointpillars \ --num-samples 300 ``` #### 4.2.7 Deploy Repo Runtime (`./bevfusion`) The deploy-side split-pipeline runner is `./bevfusion` in the sibling repo ``. It loads the 4 split models from `deploy/data//pointpillars/`: ``` camera.backbone.onnx / quantized_camera.xml lidar_pfe.onnx / lidar_pfe_v7000.onnx / quantized_lidar_pfe.xml fuser.onnx / quantized_fuser.xml head.onnx / quantized_head.xml ``` Copy the artifacts from `export/onnx/pointpillars/` to `deploy/data//pointpillars/` before running — the deploy repo does not read from the training tree. `` is `v2xfusion` for DAIR-V2X-I and `kitti` for KITTI (see [deploy/src/pipeline/dataset_preset.cpp](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/deploy/src/pipeline/dataset_preset.cpp) for the full preset geometry table). The `--preset` flag passed to the binary is `v2x` or `kitti`. ```bash cd /deploy/build # FP32 (loads the 4 .onnx files; PFE prefers v7000 when present, else dynamic). # Preset defaults to v2x when --preset is omitted. ./bevfusion --num-samples 30 --vis # V2X-I FP32 ./bevfusion --preset kitti --num-samples 30 --vis # KITTI FP32 # INT8 (loads the 4 quantized_*.xml files; PFE pinned to V=7000) ./bevfusion --num-samples 30 --vis --int8 # V2X-I INT8 ./bevfusion --preset kitti --num-samples 30 --vis --int8 # KITTI INT8 # Per-stage INT8 toggles ./bevfusion ... --int8-camera --int8-pfe --int8-fuser --int8-head ``` Flags: - `--preset v2x` / `--preset kitti` selects both geometry (image size, BEV grid, pc_range, out_size_factor) and the model dir. - `--int8` turns on INT8 for all 4 stages; individual toggles let you mix. - `--dump-pred --pred-dir DIR` writes KITTI-format box txts for offline metric eval. - `--vis` writes `bevfusion.mp4` into the build dir. ### 4.3 Dataset: V2X-I (DAIR-V2X-I) ```bash PP_CONFIG=configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml PP_CKPT=work_dirs/V2X-I/pp/latest.pth PP_ONNX_DIR=export/onnx/pointpillars DEPLOY_DIR=/deploy/data/v2xfusion/pointpillars ``` For deploy consistency, use static-V with `--fixed-v 7000` in the PFE export step. **End-to-end commands:** ```bash cd /training # 1) Train $TORCHPACK tools/train.py --no-dist $PP_CONFIG --mode dense --run-dir ./work_dirs/V2X-I/pp # 2) Inference sanity check $PYTHON tools/inference_vis.py $PP_CONFIG $PP_CKPT --split val --mode pred # 3) Export 4 ONNX $PYTHON export/pointpillars/export_all.py \ --config $PP_CONFIG --ckpt $PP_CKPT --out-dir $PP_ONNX_DIR # 4) Static-V PFE for INT8 deploy $PYTHON export/pointpillars/export-lidar.py \ --config $PP_CONFIG --ckpt $PP_CKPT --fixed-v 7000 --split val \ -o $PP_ONNX_DIR/lidar_pfe_v7000.onnx # 5) INT8 quantization (auto picks v7000) $PYTHON export/pointpillars/quantize_all.py \ --config $PP_CONFIG --ckpt $PP_CKPT \ --onnx-dir $PP_ONNX_DIR --out-dir $PP_ONNX_DIR --num-samples 300 # 6) Publish to deploy cp $PP_ONNX_DIR/{camera.backbone,fuser,head,lidar_pfe,lidar_pfe_v7000}.onnx "$DEPLOY_DIR/" cp $PP_ONNX_DIR/quantized_{camera,lidar_pfe,fuser,head}.{xml,bin} "$DEPLOY_DIR/" # 7) Deploy runtime (V2X-I — preset defaults to v2x; FP32 omits --int8) cd /deploy/build ./bevfusion /training/data/dair-v2x-i-kitti/training --num-samples 1000 # FP32 ./bevfusion /training/data/dair-v2x-i-kitti/training --num-samples 1000 --int8 # INT8 ``` ### 4.4 Dataset: KITTI ```bash PP_CONFIG=configs/Kitti/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml PP_CKPT=work_dirs/Kitti/pp/latest.pth PP_ONNX_DIR=export/onnx/pointpillars/kitti DEPLOY_DIR=/deploy/data/kitti/pointpillars ``` Backbone variants under `configs/Kitti/det/centerhead/lssfpn/camera+pointpillar/`: `{default,resnet34,resnet50,fasternet}/`. Dataset-scoped ONNX output dir (`export/onnx/pointpillars/kitti/`) keeps KITTI artifacts from colliding with V2X-I's. Geometry differences vs V2X-I (from `deploy/src/pipeline/dataset_preset.cpp`): | | KITTI | V2X-I | |---|---|---| | image (W×H) | 1280×384 | 1536×864 | | camera feat (W×H) | 80×24 | 96×54 | | BEV grid | 100×100 | 128×128 | | pc_range | [0,-40,-5]→[80,40,3] | [0,-51.2,-5]→[102.4,51.2,3] | | post_center_range | [0,-45,-5]→[85,45,3] | same as pc_range | | split_post_voxel_size | 0.1 | 0.2 | | out_size_factor | 8 | 4 | **KITTI-specific note:** - Always pass `--config` and `--ckpt` explicitly to quantization/export scripts. **End-to-end commands:** ```bash cd /training # 1) Train $TORCHPACK tools/train.py --no-dist $PP_CONFIG --mode dense --run-dir ./work_dirs/Kitti/pp # 2) Inference sanity check $PYTHON tools/inference_vis.py $PP_CONFIG $PP_CKPT --split val --mode pred # 3) Export 4 ONNX $PYTHON export/pointpillars/export_all.py \ --config $PP_CONFIG --ckpt $PP_CKPT --out-dir $PP_ONNX_DIR # 4) Static-V PFE for INT8 deploy (V=7000) $PYTHON export/pointpillars/export-lidar.py \ --config $PP_CONFIG --ckpt $PP_CKPT --fixed-v 7000 --split val \ -o $PP_ONNX_DIR/lidar_pfe_v7000.onnx # 5) INT8 quantization (all 4 stages; quantize_all auto picks v7000) $PYTHON export/pointpillars/quantize_all.py \ --config $PP_CONFIG --ckpt $PP_CKPT \ --onnx-dir $PP_ONNX_DIR --out-dir $PP_ONNX_DIR --num-samples 300 # 6) Publish to deploy cp $PP_ONNX_DIR/{camera.backbone,fuser,head,lidar_pfe,lidar_pfe_v7000}.onnx "$DEPLOY_DIR/" cp $PP_ONNX_DIR/quantized_{camera,lidar_pfe,fuser,head}.{xml,bin} "$DEPLOY_DIR/" # 7) Deploy runtime (KITTI — pass --preset kitti; FP32 omits --int8) cd /deploy/build ./bevfusion /training/data/kitti-v2x/training \ --num-samples 1000 --preset kitti # FP32 ./bevfusion /training/data/kitti-v2x/training \ --num-samples 1000 --int8 --preset kitti # INT8 ``` Validation target: KITTI INT8 frame-by-frame box counts match FP32 exactly on the smoke set. --- ## 5. Pipeline B — Second-based BEVFusion (unified ONNX) ### 5.1 Design A **single** `bevfusion_unified.onnx` merges camera + lidar + fuser + head. Unlike Pipeline A, the lidar sparse encoder contains ~21 `SparseConv3d` / `SubMConv3d` layers that: - are called many times, with BN and ReLU fused into each sparse-conv op; - require custom forward implementations that don't map cleanly to ONNX standard ops. These ops (plus `BevPoolV2` for camera and `SparseToDense` at the encoder boundary) therefore live as **OpenVINO GPU plugin custom ops** (§2.1) rather than as SYCL kernels outside the graph. Deploy runs one OpenVINO infer call per frame and the plugin dispatches sparse kernels internally. Deploy directories: - [deploy/src/bevfusion_unified/](https://github.com/open-edge-platform/edge-ai-suites/tree/main/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/deploy/src/bevfusion_unified/) — pipeline driver + SYCL voxelizer (`voxelizer_sycl.cpp` lives here, `coors` = `(batch, z, y, x)`) - [deploy/test/bevfusion_unified.cpp](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/deploy/test/bevfusion_unified.cpp) — `./bevfusion_unified` entry point ### 5.2 Generic Workflow Placeholder variables used throughout §5.2; §5.3–§5.4 fill them per dataset: ```bash CP_CONFIG= CP_CKPT=/bevpoolv2/latest.pth> ``` #### 5.2.1 Training — BEVPool V1 vs V2 **BEVPool V1 (default):** ```bash CUDA_VISIBLE_DEVICES=0 $TORCHPACK tools/train.py --no-dist \ configs//det/centerhead/secfpn/camera+lidar/resnet34/default.yaml \ --run-dir ./work_dirs//bevpoolv1 ``` **BEVPool V2 (recommended for deploy):** add `use_bevpool: bevpoolv2` under `vtransform`. Either create a sibling `bevpoolv2.yaml`: ```yaml model: encoders: camera: vtransform: use_bevpool: bevpoolv2 depth_threshold: 0 ``` or inline the override in the existing `resnet34/default.yaml`. Then: ```bash CUDA_VISIBLE_DEVICES=0 $TORCHPACK tools/train.py --no-dist \ configs//det/centerhead/secfpn/camera+lidar/resnet34/bevpoolv2.yaml \ --run-dir ./work_dirs//bevpoolv2 ``` **work_dirs convention**: KITTI uses dataset-scoped subdirs (`work_dirs/Kitti/bevpoolv2/`). For V2X-I the same convention is `work_dirs/V2X-I/bevpoolv2/`. #### 5.2.2 Inference & Visualization ```bash $PYTHON tools/inference_vis.py $CP_CONFIG $CP_CKPT \ --mode pred --out-dir viz --bbox-score 0.3 ``` | Argument | Description | Default | |---|---|---| | `--mode` | `gt` or `pred` | `gt` | | `--split` | `train` or `val` | `val` | | `--bbox-score` | min confidence | `None` | | `--bbox-classes` | filter by class indices | `None` | | `--out-dir` | output directory | `viz` | Outputs: `/{camera,lidar,map}/`. #### 5.2.3 Precompute BEVPool V2 Geometry **Run this before any ONNX export** — generates `indices.bin` + `intervals.bin` consumed by both the unified model and the camera sub-model: ```bash # From a saved tensor data sample $PYTHON export/precompute_geometry.py $CP_CONFIG $CP_CKPT \ --data-path tools/dump/00000/example-data.pth \ -o export/geometry # Or from the dataset directly $PYTHON export/precompute_geometry.py $CP_CONFIG $CP_CKPT \ --from-dataset --split val --sample-idx 0 \ -o export/geometry ``` Output (bev_latest compatible): | File | Format | Description | |---|---|---| | `indices.bin` | `[uint32 count][count * uint32]` | 466560 sorted point indices (full grid, sentinel included) | | `intervals.bin` | `[uint32 count][count * int3(start, end, bev_rank)]` | intervals with absolute offsets; sentinel has rank=-1 | | `geometry.pth` | PyTorch tensor dict | same data, PyTorch-native | #### 5.2.4 Export Unified ONNX Exports the full BEVFusion pipeline as a single ONNX (internally merges 3 sub-models): ```bash $PYTHON export/export_unified_onnx.py $CP_CONFIG $CP_CKPT \ --geometry-dir export/geometry \ -o export/onnx/bevfusion_unified.onnx ``` Unified model I/O: | Input | Shape | Description | |---|---|---| | `img` | `[1, 3, 864, 1536]` | camera image (NCHW) | | `indices` | `[num_points]` | BEVPoolV2 sorted indices (from geometry) | | `intervals` | `[num_intervals, 3]` | BEVPoolV2 intervals (start, end, bev_rank) | | `voxel_features` | `[N_vox, 4]` | voxelized lidar features | | `voxel_indices` | `[N_vox, 4]` | voxel coordinates (batch, z, y, x) | > Older unified exports may still use 5-D `img` (`[1, 1, 3, H, W]`) — both are supported by the standalone inference script. | Output | Shape | Description | |---|---|---| | `task{i}_heatmap` | `[1, 5, 128, 128]` | per-task class heatmap | | `task{i}_reg` | `[1, 2, 128, 128]` | regression offset | | `task{i}_height` | `[1, 1, 128, 128]` | height | | `task{i}_dim` | `[1, 3, 128, 128]` | box dimensions (l, w, h) | | `task{i}_rot` | `[1, 2, 128, 128]` | rotation (sin, cos) | | `task{i}_vel` | `[1, 2, 128, 128]` | velocity (vx, vy) | #### 5.2.5 OpenVINO Inference — Unified Model Runs the unified ONNX with CenterHead post-processing and visualization. Requires Intel Arc GPU for `SparseConvolution` ops. **From dataset** ([tools/bevfusion_standalone_ov_inference.py](https://github.com/open-edge-platform/edge-ai-suites/blob/main/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/training/tools/bevfusion_standalone_ov_inference.py)) — no mmdet3d runtime dependency: ```bash $OV_PYTHON tools/bevfusion_standalone_ov_inference.py \ --data-root --ann-file <...infos_val.pkl> \ --onnx-path \ --geometry-dir \ --device GPU.1 --out-dir viz_standalone \ --bbox-score 0.3 --max-samples 10 ``` Per-dataset argument values live in §5.3–§5.4. - Auto-adapts `img` rank for both 4-D NCHW (`[1,3,H,W]`) and legacy 5-D (`[1,1,3,H,W]`) unified ONNX formats. - **Geometry auto-detection**: `init_dataset_geometry_from_onnx()` reads `pc_range` / `voxel_size` / `sparse_shape` from ONNX attributes at startup, so the same script works across datasets without manual source edits. | Argument | Description | Default | |---|---|---| | `--onnx-path` | unified ONNX | (required) | | `--geometry-dir` | `indices.bin` + `intervals.bin` | (required) | | `--device` | OpenVINO device (`GPU.1` for Arc B580) | `CPU` | | `--bbox-score` | score threshold | `0.1` | | `--max-samples` | frames to process | `None` | #### 5.2.6 INT8 Quantization — Unified Model Produces `bevfusion_unified_int8.xml`/`.bin` from the FP32 unified ONNX via NNCF PTQ. **Prerequisites:** - Custom OpenVINO build with the opset15 patch (§1.2) — without it the saved IR can't be read back. - Two envs, no mixing: `bevEnv` for the offline voxelizer dump; `spconvEnv` (py3.12 + custom OV + NNCF 3.1) for the actual quantization. - `$CP_CKPT` + matching `bevpoolv2.yaml` — the bevpoolv1 and bevpoolv2 checkpoints are **not** interchangeable. - `export/geometry/indices.bin` + `intervals.bin` (§5.2.3). - `export/onnx/bevfusion_unified.onnx` (§5.2.4). - `export/dump_voxels.py` now reads directly from `cfg.data.` (no dependency on `tools/dump/*/example-data.pth`). One-time NNCF install: ```bash /bin/pip install "nncf==3.1.0" ``` **Three-stage pipeline:** ```bash cd /training # Stage 1 — voxelizer dump (bevEnv, needs mmdet3d/spconv) $PYTHON export/dump_voxels.py $CP_CONFIG $CP_CKPT \ -o export/calib_voxels --num-frames 400 --split val # Stage 2 — pseudo-GT from FP32 self-distillation (spconvEnv) # Quantization validation is self-distilled: FP32 decoded boxes are used as pseudo-GT. rm -f export/calib_voxels/_pseudo_gt.npz $OV_PYTHON -u export/quantize_unified.py \ --stage pseudo-gt --n-calib 300 --n-val 100 \ --model-fp32 export/onnx/bevfusion_unified.onnx \ --geo-dir export/geometry \ --calib-dir export/calib_voxels \ --pseudo-gt-cache export/calib_voxels/_pseudo_gt.npz # Stage 3 — NNCF PTQ (spconvEnv) # Default output: /deploy/data/v2xfusion/onnx/bevfusion_unified_int8.xml/.bin $OV_PYTHON -u export/quantize_unified.py \ --stage quantize --n-calib 300 --n-val 100 --plain-ptq \ --preset mixed --activation-range histogram \ --model-fp32 export/onnx/bevfusion_unified.onnx \ --geo-dir export/geometry \ --calib-dir export/calib_voxels \ --pseudo-gt-cache export/calib_voxels/_pseudo_gt.npz ``` **Recommended flags:** | Flag | Value | Reason | |---|---|---| | `--preset` | `mixed` | Recommended default preset. | | `--activation-range` | `histogram` | Recommended default activation range. | | `--plain-ptq` | — | Use plain PTQ mode. | | `--n-calib` / `--n-val` | 300 / 100 | Default. | FP-only custom ops are handled automatically by `quantize_unified.py`. **Outputs:** | File | Notes | |---|---| | `bevfusion_unified.onnx` | FP32 source | | `bevfusion_unified_int8.xml` | INT8 IR topology | | `bevfusion_unified_int8.bin` | INT8 weights | Use the default `--activation-range histogram` unless you have validated alternatives on your target deployment. #### 5.2.7 End-to-End Command Sequence For release workflows, use the dataset-specific end-to-end command blocks in §5.3 (V2X-I) and §5.4 (KITTI). ### 5.3 Dataset: V2X-I (DAIR-V2X-I) ```bash CP_CONFIG=configs/V2X-I/det/centerhead/secfpn/camera+lidar/resnet34/bevpoolv2.yaml CP_CKPT=work_dirs/V2X-I/bevpoolv2/latest.pth GEO_DIR=export/geometry CALIB_DIR=export/calib_voxels UNIFIED_ONNX=export/onnx/bevfusion_unified.onnx DEPLOY_V2X_DIR=/deploy/data/v2xfusion/second ``` Deploy artifacts are stored in `deploy/data/v2xfusion/second/`: `bevfusion_unified_fp16.onnx` and `bevfusion_unified_int8.xml`/`.bin`. **End-to-end commands:** ```bash cd /training # 1) Train (BEVPoolV2 recommended, see §5.2.1) $TORCHPACK tools/train.py --no-dist $CP_CONFIG --run-dir ./work_dirs/V2X-I/bevpoolv2 # 2) Validate $PYTHON tools/inference_vis.py $CP_CONFIG $CP_CKPT --split val --mode pred # 3) Precompute BEVPoolV2 geometry $PYTHON export/precompute_geometry.py $CP_CONFIG $CP_CKPT \ --from-dataset --split val --sample-idx 0 -o $GEO_DIR # 4) Export unified ONNX $PYTHON export/export_unified_onnx.py $CP_CONFIG $CP_CKPT \ --geometry-dir $GEO_DIR -o $UNIFIED_ONNX # 5) Calibration voxel dump (dataset-driven) $PYTHON export/dump_voxels.py $CP_CONFIG $CP_CKPT \ -o $CALIB_DIR --num-frames 400 --split val # 6) INT8 quantize (3-stage, see §5.2.6) rm -f $CALIB_DIR/_pseudo_gt.npz $OV_PYTHON -u export/quantize_unified.py --stage pseudo-gt \ --n-calib 300 --n-val 100 \ --model-fp32 $UNIFIED_ONNX --geo-dir $GEO_DIR \ --calib-dir $CALIB_DIR --pseudo-gt-cache $CALIB_DIR/_pseudo_gt.npz $OV_PYTHON -u export/quantize_unified.py --stage quantize \ --n-calib 300 --n-val 100 --plain-ptq \ --preset mixed --activation-range histogram \ --model-fp32 $UNIFIED_ONNX --geo-dir $GEO_DIR \ --calib-dir $CALIB_DIR --pseudo-gt-cache $CALIB_DIR/_pseudo_gt.npz \ --output $DEPLOY_V2X_DIR/bevfusion_unified_int8.xml # 7) Publish deploy artifacts (FP16 ONNX is exported with --fp16 from # export/export_unified_onnx.py; INT8 .xml/.bin are produced by step 6) cp export/onnx/bevfusion_unified_fp16.onnx $DEPLOY_V2X_DIR/ # (the INT8 IR was already written into $DEPLOY_V2X_DIR by --output above) # 8) Deploy runtime — preset defaults to v2x; INT8 is the default model cd /deploy/build LD_LIBRARY_PATH=/bin/intel64/Release:${LD_LIBRARY_PATH:-} \ ./bevfusion_unified /training/data/dair-v2x-i-kitti/training \ --num-samples 1000 # INT8 (default) LD_LIBRARY_PATH=/bin/intel64/Release:${LD_LIBRARY_PATH:-} \ ./bevfusion_unified /training/data/dair-v2x-i-kitti/training \ --num-samples 1000 --fp16 # FP16 ``` Standalone OV inference (`tools/bevfusion_standalone_ov_inference.py`) is useful for Python-side debugging without rebuilding the deploy binary; it defaults `--ann-file` to `/dair_12hz_infos_val.pkl`: ```bash $OV_PYTHON tools/bevfusion_standalone_ov_inference.py \ --data-root data/dair-v2x-i \ --onnx-path $UNIFIED_ONNX --geometry-dir $GEO_DIR \ --device GPU.1 --out-dir viz_standalone \ --bbox-score 0.3 --max-samples 10 ``` ### 5.4 Dataset: KITTI ```bash CP_CONFIG=configs/Kitti/det/centerhead/secfpn/camera+lidar/resnet34/bevpoolv2.yaml CP_CKPT=work_dirs/Kitti/bevpoolv2/latest.pth GEO_DIR=export/geometry_kitti CALIB_DIR=export/calib_voxels_kitti UNIFIED_ONNX=export/bevfusion_unified_kitti.onnx DEPLOY_KITTI_DIR=/deploy/data/kitti/second ``` Per-dataset paths (`export/geometry_kitti`, `export/calib_voxels_kitti`, etc.) keep KITTI artifacts from overwriting V2X-I's. The deploy KITTI second-based dir holds the same two model variants as V2X-I: `bevfusion_unified_fp16.onnx` + `bevfusion_unified_int8.xml`/`.bin`. The unified pipeline auto-detects `pc_range` / `voxel_size` from the ONNX; the only deploy-side switch needed is `--preset kitti`. For KITTI quantization, always pass `--config $CP_CONFIG` to `quantize_unified.py`. **End-to-end commands:** ```bash cd /training # 1) Train $TORCHPACK tools/train.py --no-dist $CP_CONFIG --run-dir ./work_dirs/Kitti/bevpoolv2 # 2) Validate $PYTHON tools/inference_vis.py $CP_CONFIG $CP_CKPT --split val --mode pred # 3) Precompute BEVPoolV2 geometry (KITTI-specific dir) $PYTHON export/precompute_geometry.py $CP_CONFIG $CP_CKPT \ --from-dataset --split val --sample-idx 0 -o $GEO_DIR # 4) Export unified ONNX (KITTI-specific name) $PYTHON export/export_unified_onnx.py $CP_CONFIG $CP_CKPT \ --geometry-dir $GEO_DIR -o $UNIFIED_ONNX # 5) Calibration voxel dump $PYTHON export/dump_voxels.py $CP_CONFIG $CP_CKPT \ -o $CALIB_DIR --num-frames 400 --split val # 6) INT8 quantize (KITTI MUST pass --config; see note above) rm -f $CALIB_DIR/_pseudo_gt.npz $OV_PYTHON -u export/quantize_unified.py --stage pseudo-gt \ --config $CP_CONFIG \ --n-calib 300 --n-val 100 \ --model-fp32 $UNIFIED_ONNX --geo-dir $GEO_DIR \ --calib-dir $CALIB_DIR --pseudo-gt-cache $CALIB_DIR/_pseudo_gt.npz $OV_PYTHON -u export/quantize_unified.py --stage quantize \ --config $CP_CONFIG \ --n-calib 300 --n-val 100 --plain-ptq \ --preset mixed --activation-range histogram \ --model-fp32 $UNIFIED_ONNX --geo-dir $GEO_DIR \ --calib-dir $CALIB_DIR --pseudo-gt-cache $CALIB_DIR/_pseudo_gt.npz \ --output $DEPLOY_KITTI_DIR/bevfusion_unified_int8.xml # 7) Publish FP16 source to deploy cp export/onnx/bevfusion_unified_kitti_fp16.onnx \ $DEPLOY_KITTI_DIR/bevfusion_unified_fp16.onnx # 8) Deploy runtime — pass --preset kitti; INT8 is the default model cd /deploy/build LD_LIBRARY_PATH=/bin/intel64/Release:${LD_LIBRARY_PATH:-} \ ./bevfusion_unified /training/data/kitti-v2x/training \ --num-samples 1000 --preset kitti # INT8 (default) LD_LIBRARY_PATH=/bin/intel64/Release:${LD_LIBRARY_PATH:-} \ ./bevfusion_unified /training/data/kitti-v2x/training \ --num-samples 1000 --preset kitti --fp16 # FP16 ``` The `.bin` sibling file is auto-produced beside the `.xml`. Standalone Python-side inference for offline debugging: ```bash OV_ROOT=/bin/intel64/Release \ PYTHONPATH=$OV_ROOT/python:$PYTHONPATH \ LD_LIBRARY_PATH=$OV_ROOT:$LD_LIBRARY_PATH \ /bin/python tools/bevfusion_standalone_ov_inference.py \ --data-root data/kitti-v2x \ --ann-file data/kitti-v2x/kitti_infos_val.pkl \ --onnx-path $UNIFIED_ONNX --geometry-dir $GEO_DIR \ --device GPU.1 --out-dir viz_standalone \ --bbox-score 0.5 --max-samples 100 ``` > For the **split** (PP) pipeline on KITTI — which is what `./bevfusion --preset kitti --int8` runs — see §4.4. The two paths are independent: `./bevfusion` ≠ `./bevfusion_unified`. :::{toctree} :hidden: Installation Convert NVIDIA Checkpoint :::