Get Started with Intermediate Fusion - Training#

Overview#

This repo is the training + ONNX-conversion side for the deploy pipelines in Intermediate Fusion/Deploy. Two parallel deploy pipelines are supported; pick the one matching your deploy target:

Pipeline

Deploy binary

Lidar encoder

ONNX artifacts

Custom ops (and where they live)

Deploy directories

A — PointPillars-based (§4)

./bevfusion

PillarFeatureNet + PointPillarsScatter

4 independent ONNX: camera.backbone / lidar_pfe / fuser / head

bevpoolv2 + pillarscatter + voxelizer + post-processing: hand-written SYCL kernels (single-use ops)

deploy/src/pointpillars/, deploy/src/pointpillars/voxelizer.cpp

B — Second-based (§5)

./bevfusion_unified

SparseEncoder (SparseConv3d + SubMConv3d)

Single unified ONNX: bevfusion_unified.onnx

SparseConvolution + SubMConv3d are called many times and fused with BN/ReLU, so they (and BevPoolV2 / SparseToDense) live inside the OpenVINO GPU plugin

deploy/src/bevfusion_unified/, deploy/src/bevfusion_unified/voxelizer_sycl.cpp

Dataset coverage

Pipeline

V2X-I (DAIR-V2X-I)

KITTI

A — PointPillars (./bevfusion)

✅ (§4.3)

✅ (§4.4)

B — Second (./bevfusion_unified)

✅ (§5.3)

✅ (§5.4)

Both pipelines use the same --preset v2x|kitti switch in deploy. Reference user commands (V2X-I is the default preset):

# Pipeline B — unified (default INT8 XML; pass --fp16 for the FP16 ONNX)
./bevfusion_unified <REPO>/training/data/dair-v2x-i-kitti/training --num-samples 1000
./bevfusion_unified <REPO>/training/data/kitti-v2x/training --num-samples 1000 --preset kitti

# Pipeline A — split 4-ONNX (default FP32; pass --int8 for the 4-XML INT8 path)
./bevfusion <REPO>/training/data/dair-v2x-i-kitti/training --num-samples 1000 --int8
./bevfusion <REPO>/training/data/kitti-v2x/training --num-samples 1000 --int8 --preset kitti

Already have NVIDIA’s CUDA-V2XFusion checkpoint? If you only want to deploy NVIDIA’s reference dense_epoch_100_.pth (or your own CUDA-V2XFusion-trained ckpt) on Intel GPU — no retraining, no in-repo training-side work — skip this guide and follow Running NVIDIA’s V2X-I PointPillars Dense FP16 Model on Intel GPU. It’s a focused weight-conversion path: NVIDIA .pth → 4 ONNX + INT8 IR → drop into deploy/data/v2xfusion/pointpillars/. Pipeline A + V2X-I only.

Sections:

  1. Environment setup (shared)

  2. Dataset preparation (V2X-I, KITTI)

  3. Shared reference — custom ONNX ops, config inheritance

  4. Pipeline A: PointPillars-based BEVFusion (./bevfusion)

  5. Pipeline B: Second-based BEVFusion (./bevfusion_unified)


1. Environment Setup#

Throughout this document the following placeholders are used; substitute them with paths appropriate to your machine:

Placeholder

Meaning

<REPO>

Absolute path to this repository’s root (the parent of training/ and deploy/)

<BEV_ENV>

Python virtual env for training + ONNX export + Pipeline A INT8 quantization (see §1.1)

<SPCONV_ENV>

Python virtual env for Pipeline B INT8 quantization + standalone OV inference (see §1.2)

<OPENVINO_ROOT>

Custom-built OpenVINO root containing bin/intel64/Release/ (see §1.2)

<TORCHPACK>

Launcher prefix for tools/train.py / tools/test.py. See §1.1 for two options.

Two Python environments, kept strictly separate:

1.1 bevEnv — training + ONNX export + Pipeline A INT8 quantization#

PYTHON=<BEV_ENV>/bin/python
PIP=<BEV_ENV>/bin/pip

# Build CUDA extensions (including bev_pool_v2)
cd <REPO>/training
$PIP install -e .

# Verify bev_pool_v2 extension
$PYTHON -c "from mmdet3d.ops.bev_pool_v2.bev_pool import bev_pool_v2, OVBEVPoolv2; print('OK')"

Pipeline A (PointPillars split 4-ONNX, §4) INT8 quantization via export/pointpillars/quantize_all.py runs in bevEnv.

Launching tools/train.py / tools/test.py — both scripts use torchpack.distributed. There are two launch modes:

# Option A — single-process, no torch.distributed (recommended for single GPU).
#   Pass --no-dist to train.py / test.py so they skip dist.init() and the
#   mmcv MMDistributedDataParallel wrapper (incompatible with newer PyTorch).
TORCHPACK="<BEV_ENV>/bin/python"
# usage:
#   $TORCHPACK tools/train.py --no-dist <config> --run-dir ...
#   $TORCHPACK tools/test.py  --no-dist <config> <ckpt> --eval bbox

# Option B — multi-GPU distributed via torchpack dist-run (needs OpenMPI:
#   `apt install openmpi-bin`). torchpack's default mpirun flags are OpenMPI
#   syntax and will fail under Intel MPI. Drop --no-dist when using this.
TORCHPACK="<BEV_ENV>/bin/torchpack dist-run -np <NGPU> <BEV_ENV>/bin/python"

All $TORCHPACK tools/train.py --no-dist ... commands shown below default to Option A (single-GPU). For multi-GPU, switch to Option B’s TORCHPACK and remove --no-dist.

1.2 spconvEnv — standalone OV inference + Pipeline B INT8 quantization#

OV_DIR=<OPENVINO_ROOT>/bin/intel64/Release
export PYTHONPATH="$OV_DIR/python:${PYTHONPATH:-}"
export LD_LIBRARY_PATH="$OV_DIR:${LD_LIBRARY_PATH:-}"
OV_PYTHON=<SPCONV_ENV>/bin/python

The OpenVINO build above must include the opset15 registration patch for BevPoolV2 / SparseConvolution / SparseToDense. Without this patch Pipeline B’s saved INT8 IR cannot be read back.

spconvEnv is used for:

  • Pipeline B unified quantization (export/quantize_unified.py, NNCF 3.1 + custom OV, §5.2.6).

  • Standalone OpenVINO-side inference tests (tools/bevfusion_standalone_ov_inference.py, §5.3).

It is not used for Pipeline A (export/pointpillars/quantize_all.py) — that stays in bevEnv.

Intel Arc B580 is the reference GPU (exposed as GPU.1).


2. Dataset Preparation#

Both pipelines read from <REPO>/training/data/. Two datasets are supported:

Dataset

Source

Final on-disk layout under data/

DAIR-V2X-I

DAIR-V2X-I Google Drive bundle (single-infrastructure-side-image/, single-infrastructure-side-velodyne/, single-infrastructure-side-label/, data_info.json)

data/dair-v2x-i/ (native) + data/dair-v2x-i-kitti/ (KITTI-format mirror used by the evaluator)

KITTI

KITTI 3D Object raw download (training/{image_2,velodyne,calib,label_2}, testing/{image_2,velodyne,calib}, ImageSets/{train,val}.txt)

data/kitti-v2x/ (V2X-format mirror produced by tools/convert_kitti_to_v2x_format.py)

Configs reference these paths through dataset_root / dataset_kitti_root (configs/V2X-I/default.yaml:2-3, configs/Kitti/default.yaml:2-3); if you put the data anywhere else, override those two keys instead of changing the configs.

2.1 DAIR-V2X-I#

Download the DAIR-V2X-I bundle from this Google Drive folder, then follow BEVHeight’s data preparation document end-to-end for the KITTI-format conversion and pkl generation steps: ADLab-AutoDrive/BEVHeight — docs/prepare_dataset.md. That guide produces the dair_12hz_infos_{train,val}.pkl annotation files this repo loads.

Final layout (this is what data/dair-v2x-i/ and data/dair-v2x-i-kitti/ should look like after BEVHeight’s prep is done):

data/
├── dair-v2x-i/                         # native DAIR-V2X-I
│   ├── velodyne/                       # *.pcd / *.bin point clouds
│   ├── image/                          # *.jpg camera frames
│   ├── calib/                          # virtuallidar↔camera + camera intrinsics JSONs
│   ├── label/                          # native JSON labels
│   ├── data_info.json                  # DAIR-V2X-I sample manifest
│   ├── dair_12hz_infos_train.pkl       # produced by BEVHeight prep
│   └── dair_12hz_infos_val.pkl         # produced by BEVHeight prep
└── dair-v2x-i-kitti/                   # KITTI-format mirror (evaluator uses this)
    ├── training/
    │   ├── calib/                      # KITTI-style calib *.txt
    │   ├── image_2/                    # KITTI-style images
    │   ├── label_2/                    # KITTI-style labels (referenced as dataset_kitti_root)
    │   └── velodyne/                   # KITTI-style point clouds
    ├── testing/                        # placeholder (empty — DAIR-V2X-I single-side has no public test split)
    └── ImageSets/
        ├── train.txt
        ├── val.txt
        ├── trainval.txt
        └── test.txt                    # placeholder, paired with empty testing/

Sanity check after prep:

ls data/dair-v2x-i/dair_12hz_infos_{train,val}.pkl   # both must exist
ls data/dair-v2x-i-kitti/training/label_2 | head -3  # KITTI-style label txts

2.2 KITTI#

KITTI prep is a two-stage flow: standard MMDetection3D infos generation, then tools/convert_kitti_to_v2x_format.py to produce a V2XDataset-compatible mirror.

2.2.1 Stage 1 — generate kitti_infos_*.pkl with MMDetection3D v1.x#

This repo’s tools/create_data.py only handles nuScenes; the kitti_infos_*.pkl files come from upstream MMDetection3D v1.x (tools/create_data.py kitti). After downloading the KITTI 3D Object archive into <KITTI_RAW>/ with the standard layout:

<KITTI_RAW>/
├── ImageSets/{train,val,trainval,test}.txt
├── training/{calib,image_2,label_2,velodyne}/
└── testing/{calib,image_2,velodyne}/

run upstream MMDetection3D v1.x’s KITTI converter (in any clone of open-mmlab/mmdetection3d at a v1.x tag — the converter is independent of this repo):

# in an mmdet3d v1.x checkout
python tools/create_data.py kitti \
    --root-path ./data/kitti \
    --out-dir ./data/kitti \
    --extra-tag kitti \
    --with-plane

--with-plane consumes the planes/ road-plane txts that ship with KITTI 3D Object and bakes per-frame ground-plane info into kitti_infos_*.pkl — needed if your downstream training uses ground-plane augmentation. Drop the flag if your <KITTI_RAW>/training/ does not contain planes/.

Substitute ./data/kitti with <KITTI_RAW> (an absolute path or a path relative to the mmdet3d checkout).

You should end up with the layout the user-side mmdet3d run produces (matches what’s already on the reference machine):

<KITTI_RAW>/
├── gt_database/
├── kitti_dbinfos_train.pkl
├── kitti_gt_database/
├── kitti_infos_test.pkl
├── kitti_infos_train.pkl
├── kitti_infos_trainval.pkl
├── kitti_infos_val.pkl
├── testing/
└── training/

The kitti_infos_{train,val,trainval,test}.pkl files are the ones consumed by Stage 2. gt_database/ and kitti_dbinfos_train.pkl are not needed for training in this repo.

2.2.2 Stage 2 — convert to V2XDataset format#

Use this repo’s converter to build the V2X-format mirror under data/kitti-v2x/:

cd <REPO>/training
$PYTHON tools/convert_kitti_to_v2x_format.py \
    --src-root <KITTI_RAW> \
    --dst-root data/kitti-v2x

The script (see tools/convert_kitti_to_v2x_format.py):

  1. Symlinks training/{image_2,velodyne,label_2,calib} and top-level image/, velodyne/ from <KITTI_RAW> into data/kitti-v2x/.

  2. Generates data/kitti-v2x/calib/virtuallidar_to_camera/<frame>.json from each KITTI calib/<frame>.txt (computes lidar2cam = R0_rect @ Tr_velo_to_cam).

  3. Rewrites each kitti_infos_<split>.pkl from MMDetection3D v1.x’s {metainfo, data_list} schema into the flat list-of-dict V2XDataset schema (KITTI cam-rect bbox → lidar-frame center + nuScenes Box convention (w, l, h) + yaw_lidar; KITTI category → nuScenes category).

Final layout:

data/
└── kitti-v2x/
    ├── image/                          → <KITTI_RAW>/training/image_2 (symlink)
    ├── velodyne/                       → <KITTI_RAW>/training/velodyne (symlink)
    ├── calib/
    │   └── virtuallidar_to_camera/
    │       ├── 000000.json
    │       ├── 000001.json
    │       └── ...
    ├── training/
    │   ├── image_2/                    → <KITTI_RAW>/training/image_2 (symlink)
    │   ├── velodyne/                   → <KITTI_RAW>/training/velodyne (symlink)
    │   ├── label_2/                    → <KITTI_RAW>/training/label_2 (symlink)
    │   └── calib/                      → <KITTI_RAW>/training/calib (symlink)
    ├── testing/
    │   ├── image_2/                    → <KITTI_RAW>/testing/image_2 (symlink)
    │   ├── velodyne/                   → <KITTI_RAW>/testing/velodyne (symlink)
    │   └── calib/                      → <KITTI_RAW>/testing/calib (symlink)
    ├── kitti_infos_train.pkl           # converted from <KITTI_RAW>/kitti_infos_train.pkl
    ├── kitti_infos_val.pkl
    ├── kitti_infos_trainval.pkl        # only if Stage 1 produced it
    └── kitti_infos_test.pkl            # only if Stage 1 produced it

Sanity check:

ls data/kitti-v2x/kitti_infos_{train,val}.pkl
ls data/kitti-v2x/calib/virtuallidar_to_camera | wc -l    # should match #frames
ls data/kitti-v2x/training/label_2 | head -3

configs/Kitti/default.yaml already points dataset_root: data/kitti-v2x and dataset_kitti_root: data/kitti-v2x/training/label_2, so no config edits are required if you placed the output here.


3. Shared Reference#

3.1 Custom ONNX Operators#

All custom ops are registered under domain org.openvinotoolkit:

Operator

Used by

Where implemented

Description

BevPoolV2

Pipeline B ONNX (camera branch)

OpenVINO GPU plugin

Camera-to-BEV view transform using precomputed geometry

SparseConvolution

Pipeline B ONNX (lidar encoder)

OpenVINO GPU plugin

3D sparse convolution with fused BN + optional ReLU

SparseToDense

Pipeline B ONNX (lidar encoder)

OpenVINO GPU plugin

Sparse feature map → dense BEV tensor

Pipeline A has no custom ops inside ONNXbevpoolv2 / pillarscatter / voxelization / post-processing are all SYCL kernels outside the ONNX graph, so standard OpenVINO can load all 4 PP ONNXs unmodified.

3.2 Config Inheritance#

All configs follow a fixed recursive-override chain. Two encoder families live under distinct top-level neck directories (lssfpn for PointPillars, secfpn for Second):

configs/default.yaml
  └─ configs/<DATASET>/default.yaml                      # dataset, image_size, object_classes
       └─ configs/<DATASET>/det/default.yaml             # detection model type (BEVFusion)
            └─ configs/<DATASET>/det/centerhead/default.yaml
                 ├─ .../lssfpn/default.yaml              # ← Pipeline A (PointPillars)
                 │    └─ .../camera+pointpillar/default.yaml
                 │         └─ .../resnet34/default.yaml
                 └─ .../secfpn/default.yaml              # ← Pipeline B (Second)
                      └─ .../camera+lidar/default.yaml
                           └─ .../resnet34/default.yaml   # (+ optional bevpoolv2.yaml)

<DATASET>V2X-I, Kitti, nuscenes. Backbone variants under each leaf directory: resnet34, resnet50, fasternet.


4. Pipeline A — PointPillars-based BEVFusion (4 ONNX)#

4.1 Design#

Four independent ONNX files, each an independently quantizable / replaceable deploy stage:

Stage

ONNX file

Content

I/O

camera

camera.backbone.onnx

ResNet34 → GeneralizedLSSFPN → DepthNet

img [1,1,3,864,1536]camera_feature, camera_depth_weights

lidar PFE

lidar_pfe.onnx / lidar_pfe_v7000.onnx

PillarFeatureNet (f_cluster / f_center / mask fused into the graph)

features [V,100,4], num_voxels [V], coors [V,4]pillar_features [V,64]

fuser

fuser.onnx

ConvFuser + decoder.backbone + decoder.neck

cam_bev [1,80,128,128], lidar_bev [1,64,128,128]middle [1,256,128,128]

head

head.onnx

CenterHead (shared_conv + task_heads, no decoder)

middle [1,256,128,128] → 12 task tensors

Stays outside ONNX (SYCL kernels in deploy):

  • Voxelization — deploy/src/pointpillars/voxelizer.cpp

  • bevpoolv2 (camera BEV pooling) — deploy SYCL kernel

  • PointPillarsScatter (PFE output → dense BEV canvas) — deploy SYCL kernel

  • CenterHead post-processing (heatmap top-k, box decode, rotate-NMS) — deploy SYCL kernel

The voxelizer’s coors layout is (batch_idx, x_idx, y_idx, z_idx)opposite to Pipeline B’s voxelizer layout (batch, z, y, x). The deploy-side SYCL voxelizers must follow each pipeline’s own layout; the two cannot be shared.

4.2 Generic Workflow#

Throughout §4.2 we use placeholder variables; §4.3 / §4.4 fill them in per dataset:

PP_CONFIG=<path to a camera+pointpillar/resnet34/default.yaml>
PP_CKPT=<path to the trained pth>

4.2.1 Training#

$TORCHPACK tools/train.py --no-dist $PP_CONFIG --mode dense --run-dir ./work_dirs/<dataset>/pp/

The BEVFusion base model in mmdet3d/models/fusion_models/bevfusion.py auto-selects hard-voxelize (PointPillars) vs DynamicScatter (Second) based on max_num_points > 0 — no training-code changes are needed to switch encoders.

4.2.2 Inference & Visualization#

$PYTHON tools/inference_vis.py $PP_CONFIG $PP_CKPT \
    --split train --mode pred --bbox-score 0.3 --out-dir viz_pp

Argument

Description

Default

--mode

gt or pred

gt

--split

train or val

val

--bbox-score

score threshold

None

--out-dir

output directory

viz

Outputs go to <out-dir>/camera/*.png and <out-dir>/lidar/*.png. The script is encoder-agnostic — any model that emits boxes_3d / scores_3d / labels_3d works.

4.2.3 ONNX Export — All 4 Files in One Call#

$PYTHON export/pointpillars/export_all.py \
    --config $PP_CONFIG --ckpt $PP_CKPT \
    --out-dir export/onnx/pointpillars

Reference output sizes (V2X-I):

File

Size

Nodes

Notes

camera.backbone.onnx

88 MB

113

ResNet34 + LSSFPN + DepthNet

lidar_pfe.onnx

17 KB

198

dynamic V (up to 12000)

fuser.onnx

405 KB

51

ConvFuser + decoder.backbone + decoder.neck → middle[1,256,128,128]

head.onnx

52 MB

38

CenterHead only, consumes middle

Individual exports (if you only want one stage):

$PYTHON export/pointpillars/export-camera.py --config $PP_CONFIG --ckpt $PP_CKPT -o export/onnx/pointpillars/camera.backbone.onnx
$PYTHON export/pointpillars/export-lidar.py  --config $PP_CONFIG --ckpt $PP_CKPT -o export/onnx/pointpillars/lidar_pfe.onnx
$PYTHON export/pointpillars/export-fuser.py  --config $PP_CONFIG --ckpt $PP_CKPT -o export/onnx/pointpillars/fuser.onnx
$PYTHON export/pointpillars/export-head.py   --config $PP_CONFIG --ckpt $PP_CKPT -o export/onnx/pointpillars/head.onnx

4.2.5 INT8 Quantization — All 4 ONNXs in One Call#

Produces quantized_camera.xml / quantized_lidar_pfe.xml / quantized_fuser.xml / quantized_head.xml via NNCF PTQ on 300 calibration frames.

$PYTHON export/pointpillars/quantize_all.py \
    --config $PP_CONFIG --ckpt $PP_CKPT \
    --onnx-dir export/onnx/pointpillars \
    --out-dir  export/onnx/pointpillars \
    --num-samples 300

quantize_all.py prefers lidar_pfe_v7000.onnx when both static and dynamic PFE ONNX are present.

Individual stages (all share the same CLI shape):

$PYTHON export/pointpillars/quantize_camera_backbone.py --config $PP_CONFIG --ckpt $PP_CKPT \
    --onnx export/onnx/pointpillars/camera.backbone.onnx \
    --out  export/onnx/pointpillars/quantized_camera.xml --num-samples 300
$PYTHON export/pointpillars/quantize_lidar_pfe.py --config $PP_CONFIG --ckpt $PP_CKPT \
    --onnx export/onnx/pointpillars/lidar_pfe_v7000.onnx \
    --out  export/onnx/pointpillars/quantized_lidar_pfe.xml --num-samples 300
$PYTHON export/pointpillars/quantize_fuser.py --config $PP_CONFIG --ckpt $PP_CKPT \
    --onnx export/onnx/pointpillars/fuser.onnx \
    --out  export/onnx/pointpillars/quantized_fuser.xml --num-samples 300
$PYTHON export/pointpillars/quantize_head.py --config $PP_CONFIG --ckpt $PP_CKPT \
    --onnx export/onnx/pointpillars/head.onnx \
    --out  export/onnx/pointpillars/quantized_head.xml --num-samples 300

fuser/head splitfuser.onnx contains decoder backbone+neck, and head.onnx contains CenterHead shared conv + task heads.

4.2.6 End-to-End Command Sequence#

cd <REPO>/training

# 1) Train
$TORCHPACK tools/train.py --no-dist $PP_CONFIG --mode dense --run-dir ./work_dirs/<dataset>/pp/

# 2) Inference sanity check
$PYTHON tools/inference_vis.py $PP_CONFIG $PP_CKPT --split train --mode pred

# 3) Export 4 ONNX files
$PYTHON export/pointpillars/export_all.py \
    --config $PP_CONFIG --ckpt $PP_CKPT --out-dir export/onnx/pointpillars

# 4) Static-V PFE for deploy INT8 (V=7000 matches deploy's max_voxels)
$PYTHON export/pointpillars/export-lidar.py \
    --config $PP_CONFIG --ckpt $PP_CKPT --fixed-v 7000 --split val \
    -o export/onnx/pointpillars/lidar_pfe_v7000.onnx

# 5) INT8 quantization (all 4 stages) — auto picks lidar_pfe_v7000.onnx
$PYTHON export/pointpillars/quantize_all.py \
    --config $PP_CONFIG --ckpt $PP_CKPT \
    --onnx-dir export/onnx/pointpillars --out-dir export/onnx/pointpillars \
    --num-samples 300

4.2.7 Deploy Repo Runtime (./bevfusion)#

The deploy-side split-pipeline runner is ./bevfusion in the sibling repo <REPO>. It loads the 4 split models from deploy/data/<preset_dir>/pointpillars/:

camera.backbone.onnx  / quantized_camera.xml
lidar_pfe.onnx        / lidar_pfe_v7000.onnx / quantized_lidar_pfe.xml
fuser.onnx            / quantized_fuser.xml
head.onnx             / quantized_head.xml

Copy the artifacts from export/onnx/pointpillars/ to deploy/data/<preset_dir>/pointpillars/ before running — the deploy repo does not read from the training tree. <preset_dir> is v2xfusion for DAIR-V2X-I and kitti for KITTI (see deploy/src/pipeline/dataset_preset.cpp for the full preset geometry table). The --preset flag passed to the binary is v2x or kitti.

cd <REPO>/deploy/build

# FP32 (loads the 4 .onnx files; PFE prefers v7000 when present, else dynamic).
# Preset defaults to v2x when --preset is omitted.
./bevfusion <DATASET_PATH> --num-samples 30 --vis                                # V2X-I FP32
./bevfusion <DATASET_PATH> --preset kitti --num-samples 30 --vis                 # KITTI  FP32

# INT8 (loads the 4 quantized_*.xml files; PFE pinned to V=7000)
./bevfusion <DATASET_PATH> --num-samples 30 --vis --int8                         # V2X-I INT8
./bevfusion <DATASET_PATH> --preset kitti --num-samples 30 --vis --int8          # KITTI  INT8

# Per-stage INT8 toggles
./bevfusion ... --int8-camera --int8-pfe --int8-fuser --int8-head

Flags:

  • --preset v2x / --preset kitti selects both geometry (image size, BEV grid, pc_range, out_size_factor) and the model dir.

  • --int8 turns on INT8 for all 4 stages; individual toggles let you mix.

  • --dump-pred --pred-dir DIR writes KITTI-format box txts for offline metric eval.

  • --vis writes bevfusion.mp4 into the build dir.

4.3 Dataset: V2X-I (DAIR-V2X-I)#

PP_CONFIG=configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml
PP_CKPT=work_dirs/V2X-I/pp/latest.pth
PP_ONNX_DIR=export/onnx/pointpillars
DEPLOY_DIR=<REPO>/deploy/data/v2xfusion/pointpillars

For deploy consistency, use static-V with --fixed-v 7000 in the PFE export step.

End-to-end commands:

cd <REPO>/training

# 1) Train
$TORCHPACK tools/train.py --no-dist $PP_CONFIG --mode dense --run-dir ./work_dirs/V2X-I/pp

# 2) Inference sanity check
$PYTHON tools/inference_vis.py $PP_CONFIG $PP_CKPT --split val --mode pred

# 3) Export 4 ONNX
$PYTHON export/pointpillars/export_all.py \
    --config $PP_CONFIG --ckpt $PP_CKPT --out-dir $PP_ONNX_DIR

# 4) Static-V PFE for INT8 deploy
$PYTHON export/pointpillars/export-lidar.py \
    --config $PP_CONFIG --ckpt $PP_CKPT --fixed-v 7000 --split val \
    -o $PP_ONNX_DIR/lidar_pfe_v7000.onnx

# 5) INT8 quantization (auto picks v7000)
$PYTHON export/pointpillars/quantize_all.py \
    --config $PP_CONFIG --ckpt $PP_CKPT \
    --onnx-dir $PP_ONNX_DIR --out-dir $PP_ONNX_DIR --num-samples 300

# 6) Publish to deploy
cp $PP_ONNX_DIR/{camera.backbone,fuser,head,lidar_pfe,lidar_pfe_v7000}.onnx "$DEPLOY_DIR/"
cp $PP_ONNX_DIR/quantized_{camera,lidar_pfe,fuser,head}.{xml,bin} "$DEPLOY_DIR/"

# 7) Deploy runtime (V2X-I — preset defaults to v2x; FP32 omits --int8)
cd <REPO>/deploy/build
./bevfusion <REPO>/training/data/dair-v2x-i-kitti/training --num-samples 1000           # FP32
./bevfusion <REPO>/training/data/dair-v2x-i-kitti/training --num-samples 1000 --int8    # INT8

4.4 Dataset: KITTI#

PP_CONFIG=configs/Kitti/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml
PP_CKPT=work_dirs/Kitti/pp/latest.pth
PP_ONNX_DIR=export/onnx/pointpillars/kitti
DEPLOY_DIR=<REPO>/deploy/data/kitti/pointpillars

Backbone variants under configs/Kitti/det/centerhead/lssfpn/camera+pointpillar/: {default,resnet34,resnet50,fasternet}/. Dataset-scoped ONNX output dir (export/onnx/pointpillars/kitti/) keeps KITTI artifacts from colliding with V2X-I’s.

Geometry differences vs V2X-I (from deploy/src/pipeline/dataset_preset.cpp):

KITTI

V2X-I

image (W×H)

1280×384

1536×864

camera feat (W×H)

80×24

96×54

BEV grid

100×100

128×128

pc_range

[0,-40,-5]→[80,40,3]

[0,-51.2,-5]→[102.4,51.2,3]

post_center_range

[0,-45,-5]→[85,45,3]

same as pc_range

split_post_voxel_size

0.1

0.2

out_size_factor

8

4

KITTI-specific note:

  • Always pass --config and --ckpt explicitly to quantization/export scripts.

End-to-end commands:

cd <REPO>/training

# 1) Train
$TORCHPACK tools/train.py --no-dist $PP_CONFIG --mode dense --run-dir ./work_dirs/Kitti/pp

# 2) Inference sanity check
$PYTHON tools/inference_vis.py $PP_CONFIG $PP_CKPT --split val --mode pred

# 3) Export 4 ONNX
$PYTHON export/pointpillars/export_all.py \
    --config $PP_CONFIG --ckpt $PP_CKPT --out-dir $PP_ONNX_DIR

# 4) Static-V PFE for INT8 deploy (V=7000)
$PYTHON export/pointpillars/export-lidar.py \
    --config $PP_CONFIG --ckpt $PP_CKPT --fixed-v 7000 --split val \
    -o $PP_ONNX_DIR/lidar_pfe_v7000.onnx

# 5) INT8 quantization (all 4 stages; quantize_all auto picks v7000)
$PYTHON export/pointpillars/quantize_all.py \
    --config $PP_CONFIG --ckpt $PP_CKPT \
    --onnx-dir $PP_ONNX_DIR --out-dir $PP_ONNX_DIR --num-samples 300

# 6) Publish to deploy
cp $PP_ONNX_DIR/{camera.backbone,fuser,head,lidar_pfe,lidar_pfe_v7000}.onnx "$DEPLOY_DIR/"
cp $PP_ONNX_DIR/quantized_{camera,lidar_pfe,fuser,head}.{xml,bin} "$DEPLOY_DIR/"

# 7) Deploy runtime (KITTI — pass --preset kitti; FP32 omits --int8)
cd <REPO>/deploy/build
./bevfusion <REPO>/training/data/kitti-v2x/training \
    --num-samples 1000 --preset kitti              # FP32
./bevfusion <REPO>/training/data/kitti-v2x/training \
    --num-samples 1000 --int8 --preset kitti       # INT8

Validation target: KITTI INT8 frame-by-frame box counts match FP32 exactly on the smoke set.


5. Pipeline B — Second-based BEVFusion (unified ONNX)#

5.1 Design#

A single bevfusion_unified.onnx merges camera + lidar + fuser + head. Unlike Pipeline A, the lidar sparse encoder contains ~21 SparseConv3d / SubMConv3d layers that:

  • are called many times, with BN and ReLU fused into each sparse-conv op;

  • require custom forward implementations that don’t map cleanly to ONNX standard ops.

These ops (plus BevPoolV2 for camera and SparseToDense at the encoder boundary) therefore live as OpenVINO GPU plugin custom ops (§2.1) rather than as SYCL kernels outside the graph. Deploy runs one OpenVINO infer call per frame and the plugin dispatches sparse kernels internally.

Deploy directories:

5.2 Generic Workflow#

Placeholder variables used throughout §5.2; §5.3–§5.4 fill them per dataset:

CP_CONFIG=<path to a camera+lidar/resnet34/{default,bevpoolv2}.yaml>
CP_CKPT=<path to the trained pth, e.g. work_dirs/<dataset>/bevpoolv2/latest.pth>

5.2.1 Training — BEVPool V1 vs V2#

BEVPool V1 (default):

CUDA_VISIBLE_DEVICES=0 $TORCHPACK tools/train.py --no-dist \
    configs/<DATASET>/det/centerhead/secfpn/camera+lidar/resnet34/default.yaml \
    --run-dir ./work_dirs/<dataset>/bevpoolv1

BEVPool V2 (recommended for deploy): add use_bevpool: bevpoolv2 under vtransform. Either create a sibling bevpoolv2.yaml:

model:
  encoders:
    camera:
      vtransform:
        use_bevpool: bevpoolv2
        depth_threshold: 0

or inline the override in the existing resnet34/default.yaml. Then:

CUDA_VISIBLE_DEVICES=0 $TORCHPACK tools/train.py --no-dist \
    configs/<DATASET>/det/centerhead/secfpn/camera+lidar/resnet34/bevpoolv2.yaml \
    --run-dir ./work_dirs/<dataset>/bevpoolv2

work_dirs convention: KITTI uses dataset-scoped subdirs (work_dirs/Kitti/bevpoolv2/). For V2X-I the same convention is work_dirs/V2X-I/bevpoolv2/.

5.2.2 Inference & Visualization#

$PYTHON tools/inference_vis.py $CP_CONFIG $CP_CKPT \
    --mode pred --out-dir viz --bbox-score 0.3

Argument

Description

Default

--mode

gt or pred

gt

--split

train or val

val

--bbox-score

min confidence

None

--bbox-classes

filter by class indices

None

--out-dir

output directory

viz

Outputs: <out-dir>/{camera,lidar,map}/.

5.2.3 Precompute BEVPool V2 Geometry#

Run this before any ONNX export — generates indices.bin + intervals.bin consumed by both the unified model and the camera sub-model:

# From a saved tensor data sample
$PYTHON export/precompute_geometry.py $CP_CONFIG $CP_CKPT \
    --data-path tools/dump/00000/example-data.pth \
    -o export/geometry

# Or from the dataset directly
$PYTHON export/precompute_geometry.py $CP_CONFIG $CP_CKPT \
    --from-dataset --split val --sample-idx 0 \
    -o export/geometry

Output (bev_latest compatible):

File

Format

Description

indices.bin

[uint32 count][count * uint32]

466560 sorted point indices (full grid, sentinel included)

intervals.bin

[uint32 count][count * int3(start, end, bev_rank)]

intervals with absolute offsets; sentinel has rank=-1

geometry.pth

PyTorch tensor dict

same data, PyTorch-native

5.2.4 Export Unified ONNX#

Exports the full BEVFusion pipeline as a single ONNX (internally merges 3 sub-models):

$PYTHON export/export_unified_onnx.py $CP_CONFIG $CP_CKPT \
    --geometry-dir export/geometry \
    -o export/onnx/bevfusion_unified.onnx

Unified model I/O:

Input

Shape

Description

img

[1, 3, 864, 1536]

camera image (NCHW)

indices

[num_points]

BEVPoolV2 sorted indices (from geometry)

intervals

[num_intervals, 3]

BEVPoolV2 intervals (start, end, bev_rank)

voxel_features

[N_vox, 4]

voxelized lidar features

voxel_indices

[N_vox, 4]

voxel coordinates (batch, z, y, x)

Older unified exports may still use 5-D img ([1, 1, 3, H, W]) — both are supported by the standalone inference script.

Output

Shape

Description

task{i}_heatmap

[1, 5, 128, 128]

per-task class heatmap

task{i}_reg

[1, 2, 128, 128]

regression offset

task{i}_height

[1, 1, 128, 128]

height

task{i}_dim

[1, 3, 128, 128]

box dimensions (l, w, h)

task{i}_rot

[1, 2, 128, 128]

rotation (sin, cos)

task{i}_vel

[1, 2, 128, 128]

velocity (vx, vy)

5.2.5 OpenVINO Inference — Unified Model#

Runs the unified ONNX with CenterHead post-processing and visualization. Requires Intel Arc GPU for SparseConvolution ops.

From dataset (tools/bevfusion_standalone_ov_inference.py) — no mmdet3d runtime dependency:

$OV_PYTHON tools/bevfusion_standalone_ov_inference.py \
    --data-root <data/...> --ann-file <...infos_val.pkl> \
    --onnx-path <bevfusion_unified[_dataset].onnx> \
    --geometry-dir <export/geometry[_dataset]> \
    --device GPU.1 --out-dir viz_standalone \
    --bbox-score 0.3 --max-samples 10

Per-dataset argument values live in §5.3–§5.4.

  • Auto-adapts img rank for both 4-D NCHW ([1,3,H,W]) and legacy 5-D ([1,1,3,H,W]) unified ONNX formats.

  • Geometry auto-detection: init_dataset_geometry_from_onnx() reads pc_range / voxel_size / sparse_shape from ONNX attributes at startup, so the same script works across datasets without manual source edits.

Argument

Description

Default

--onnx-path

unified ONNX

(required)

--geometry-dir

indices.bin + intervals.bin

(required)

--device

OpenVINO device (GPU.1 for Arc B580)

CPU

--bbox-score

score threshold

0.1

--max-samples

frames to process

None

5.2.6 INT8 Quantization — Unified Model#

Produces bevfusion_unified_int8.xml/.bin from the FP32 unified ONNX via NNCF PTQ.

Prerequisites:

  • Custom OpenVINO build with the opset15 patch (§1.2) — without it the saved IR can’t be read back.

  • Two envs, no mixing: bevEnv for the offline voxelizer dump; spconvEnv (py3.12 + custom OV + NNCF 3.1) for the actual quantization.

  • $CP_CKPT + matching bevpoolv2.yaml — the bevpoolv1 and bevpoolv2 checkpoints are not interchangeable.

  • export/geometry/indices.bin + intervals.bin (§5.2.3).

  • export/onnx/bevfusion_unified.onnx (§5.2.4).

  • export/dump_voxels.py now reads directly from cfg.data.<split> (no dependency on tools/dump/*/example-data.pth).

One-time NNCF install:

<SPCONV_ENV>/bin/pip install "nncf==3.1.0"

Three-stage pipeline:

cd <REPO>/training

# Stage 1 — voxelizer dump (bevEnv, needs mmdet3d/spconv)
$PYTHON export/dump_voxels.py $CP_CONFIG $CP_CKPT \
    -o export/calib_voxels --num-frames 400 --split val

# Stage 2 — pseudo-GT from FP32 self-distillation (spconvEnv)
#   Quantization validation is self-distilled: FP32 decoded boxes are used as pseudo-GT.
rm -f export/calib_voxels/_pseudo_gt.npz
$OV_PYTHON -u export/quantize_unified.py \
    --stage pseudo-gt --n-calib 300 --n-val 100 \
    --model-fp32 export/onnx/bevfusion_unified.onnx \
    --geo-dir export/geometry \
    --calib-dir export/calib_voxels \
    --pseudo-gt-cache export/calib_voxels/_pseudo_gt.npz

# Stage 3 — NNCF PTQ (spconvEnv)
#   Default output: <REPO>/deploy/data/v2xfusion/onnx/bevfusion_unified_int8.xml/.bin
$OV_PYTHON -u export/quantize_unified.py \
    --stage quantize --n-calib 300 --n-val 100 --plain-ptq \
    --preset mixed --activation-range histogram \
    --model-fp32 export/onnx/bevfusion_unified.onnx \
    --geo-dir export/geometry \
    --calib-dir export/calib_voxels \
    --pseudo-gt-cache export/calib_voxels/_pseudo_gt.npz

Recommended flags:

Flag

Value

Reason

--preset

mixed

Recommended default preset.

--activation-range

histogram

Recommended default activation range.

--plain-ptq

Use plain PTQ mode.

--n-calib / --n-val

300 / 100

Default.

FP-only custom ops are handled automatically by quantize_unified.py.

Outputs:

File

Notes

bevfusion_unified.onnx

FP32 source

bevfusion_unified_int8.xml

INT8 IR topology

bevfusion_unified_int8.bin

INT8 weights

Use the default --activation-range histogram unless you have validated alternatives on your target deployment.

5.2.7 End-to-End Command Sequence#

For release workflows, use the dataset-specific end-to-end command blocks in §5.3 (V2X-I) and §5.4 (KITTI).

5.3 Dataset: V2X-I (DAIR-V2X-I)#

CP_CONFIG=configs/V2X-I/det/centerhead/secfpn/camera+lidar/resnet34/bevpoolv2.yaml
CP_CKPT=work_dirs/V2X-I/bevpoolv2/latest.pth
GEO_DIR=export/geometry
CALIB_DIR=export/calib_voxels
UNIFIED_ONNX=export/onnx/bevfusion_unified.onnx
DEPLOY_V2X_DIR=<REPO>/deploy/data/v2xfusion/second

Deploy artifacts are stored in deploy/data/v2xfusion/second/: bevfusion_unified_fp16.onnx and bevfusion_unified_int8.xml/.bin.

End-to-end commands:

cd <REPO>/training

# 1) Train (BEVPoolV2 recommended, see §5.2.1)
$TORCHPACK tools/train.py --no-dist $CP_CONFIG --run-dir ./work_dirs/V2X-I/bevpoolv2

# 2) Validate
$PYTHON tools/inference_vis.py $CP_CONFIG $CP_CKPT --split val --mode pred

# 3) Precompute BEVPoolV2 geometry
$PYTHON export/precompute_geometry.py $CP_CONFIG $CP_CKPT \
    --from-dataset --split val --sample-idx 0 -o $GEO_DIR

# 4) Export unified ONNX
$PYTHON export/export_unified_onnx.py $CP_CONFIG $CP_CKPT \
    --geometry-dir $GEO_DIR -o $UNIFIED_ONNX

# 5) Calibration voxel dump (dataset-driven)
$PYTHON export/dump_voxels.py $CP_CONFIG $CP_CKPT \
    -o $CALIB_DIR --num-frames 400 --split val

# 6) INT8 quantize (3-stage, see §5.2.6)
rm -f $CALIB_DIR/_pseudo_gt.npz
$OV_PYTHON -u export/quantize_unified.py --stage pseudo-gt \
    --n-calib 300 --n-val 100 \
    --model-fp32 $UNIFIED_ONNX --geo-dir $GEO_DIR \
    --calib-dir $CALIB_DIR --pseudo-gt-cache $CALIB_DIR/_pseudo_gt.npz
$OV_PYTHON -u export/quantize_unified.py --stage quantize \
    --n-calib 300 --n-val 100 --plain-ptq \
    --preset mixed --activation-range histogram \
    --model-fp32 $UNIFIED_ONNX --geo-dir $GEO_DIR \
    --calib-dir $CALIB_DIR --pseudo-gt-cache $CALIB_DIR/_pseudo_gt.npz \
    --output $DEPLOY_V2X_DIR/bevfusion_unified_int8.xml

# 7) Publish deploy artifacts (FP16 ONNX is exported with --fp16 from
#    export/export_unified_onnx.py; INT8 .xml/.bin are produced by step 6)
cp export/onnx/bevfusion_unified_fp16.onnx     $DEPLOY_V2X_DIR/
# (the INT8 IR was already written into $DEPLOY_V2X_DIR by --output above)

# 8) Deploy runtime — preset defaults to v2x; INT8 is the default model
cd <REPO>/deploy/build
LD_LIBRARY_PATH=<OPENVINO_ROOT>/bin/intel64/Release:${LD_LIBRARY_PATH:-} \
./bevfusion_unified <REPO>/training/data/dair-v2x-i-kitti/training \
    --num-samples 1000                                              # INT8 (default)

LD_LIBRARY_PATH=<OPENVINO_ROOT>/bin/intel64/Release:${LD_LIBRARY_PATH:-} \
./bevfusion_unified <REPO>/training/data/dair-v2x-i-kitti/training \
    --num-samples 1000 --fp16                                       # FP16

Standalone OV inference (tools/bevfusion_standalone_ov_inference.py) is useful for Python-side debugging without rebuilding the deploy binary; it defaults --ann-file to <data-root>/dair_12hz_infos_val.pkl:

$OV_PYTHON tools/bevfusion_standalone_ov_inference.py \
    --data-root data/dair-v2x-i \
    --onnx-path $UNIFIED_ONNX --geometry-dir $GEO_DIR \
    --device GPU.1 --out-dir viz_standalone \
    --bbox-score 0.3 --max-samples 10

5.4 Dataset: KITTI#

CP_CONFIG=configs/Kitti/det/centerhead/secfpn/camera+lidar/resnet34/bevpoolv2.yaml
CP_CKPT=work_dirs/Kitti/bevpoolv2/latest.pth
GEO_DIR=export/geometry_kitti
CALIB_DIR=export/calib_voxels_kitti
UNIFIED_ONNX=export/bevfusion_unified_kitti.onnx
DEPLOY_KITTI_DIR=<REPO>/deploy/data/kitti/second

Per-dataset paths (export/geometry_kitti, export/calib_voxels_kitti, etc.) keep KITTI artifacts from overwriting V2X-I’s. The deploy KITTI second-based dir holds the same two model variants as V2X-I: bevfusion_unified_fp16.onnx + bevfusion_unified_int8.xml/.bin. The unified pipeline auto-detects pc_range / voxel_size from the ONNX; the only deploy-side switch needed is --preset kitti.

For KITTI quantization, always pass --config $CP_CONFIG to quantize_unified.py.

End-to-end commands:

cd <REPO>/training

# 1) Train
$TORCHPACK tools/train.py --no-dist $CP_CONFIG --run-dir ./work_dirs/Kitti/bevpoolv2

# 2) Validate
$PYTHON tools/inference_vis.py $CP_CONFIG $CP_CKPT --split val --mode pred

# 3) Precompute BEVPoolV2 geometry (KITTI-specific dir)
$PYTHON export/precompute_geometry.py $CP_CONFIG $CP_CKPT \
    --from-dataset --split val --sample-idx 0 -o $GEO_DIR

# 4) Export unified ONNX (KITTI-specific name)
$PYTHON export/export_unified_onnx.py $CP_CONFIG $CP_CKPT \
    --geometry-dir $GEO_DIR -o $UNIFIED_ONNX

# 5) Calibration voxel dump
$PYTHON export/dump_voxels.py $CP_CONFIG $CP_CKPT \
    -o $CALIB_DIR --num-frames 400 --split val

# 6) INT8 quantize (KITTI MUST pass --config; see note above)
rm -f $CALIB_DIR/_pseudo_gt.npz
$OV_PYTHON -u export/quantize_unified.py --stage pseudo-gt \
    --config $CP_CONFIG \
    --n-calib 300 --n-val 100 \
    --model-fp32 $UNIFIED_ONNX --geo-dir $GEO_DIR \
    --calib-dir $CALIB_DIR --pseudo-gt-cache $CALIB_DIR/_pseudo_gt.npz
$OV_PYTHON -u export/quantize_unified.py --stage quantize \
    --config $CP_CONFIG \
    --n-calib 300 --n-val 100 --plain-ptq \
    --preset mixed --activation-range histogram \
    --model-fp32 $UNIFIED_ONNX --geo-dir $GEO_DIR \
    --calib-dir $CALIB_DIR --pseudo-gt-cache $CALIB_DIR/_pseudo_gt.npz \
    --output $DEPLOY_KITTI_DIR/bevfusion_unified_int8.xml

# 7) Publish FP16 source to deploy
cp export/onnx/bevfusion_unified_kitti_fp16.onnx \
    $DEPLOY_KITTI_DIR/bevfusion_unified_fp16.onnx

# 8) Deploy runtime — pass --preset kitti; INT8 is the default model
cd <REPO>/deploy/build
LD_LIBRARY_PATH=<OPENVINO_ROOT>/bin/intel64/Release:${LD_LIBRARY_PATH:-} \
./bevfusion_unified <REPO>/training/data/kitti-v2x/training \
    --num-samples 1000 --preset kitti                                # INT8 (default)

LD_LIBRARY_PATH=<OPENVINO_ROOT>/bin/intel64/Release:${LD_LIBRARY_PATH:-} \
./bevfusion_unified <REPO>/training/data/kitti-v2x/training \
    --num-samples 1000 --preset kitti --fp16                         # FP16

The .bin sibling file is auto-produced beside the .xml. Standalone Python-side inference for offline debugging:

OV_ROOT=<OPENVINO_ROOT>/bin/intel64/Release \
PYTHONPATH=$OV_ROOT/python:$PYTHONPATH \
LD_LIBRARY_PATH=$OV_ROOT:$LD_LIBRARY_PATH \
<SPCONV_ENV>/bin/python tools/bevfusion_standalone_ov_inference.py \
    --data-root data/kitti-v2x \
    --ann-file data/kitti-v2x/kitti_infos_val.pkl \
    --onnx-path $UNIFIED_ONNX --geometry-dir $GEO_DIR \
    --device GPU.1 --out-dir viz_standalone \
    --bbox-score 0.5 --max-samples 100

For the split (PP) pipeline on KITTI — which is what ./bevfusion --preset kitti --int8 runs — see §4.4. The two paths are independent: ./bevfusion./bevfusion_unified.