Get Started with Intermediate Fusion#

This guide covers two workflows:

  • Recommended: run the published Docker image.

  • Optional: build and run the project natively on the host.

2. Prepare The Host For A Native Build#

bash install_driver_related_libs.sh
bash install_project_related_libs.sh

See Prerequisites for the full environment requirements, package versions, and optional environment variables.

3. Build The Project#

Recommended build command:

bash build.sh

Equivalent manual flow:

source /opt/intel/oneapi/setvars.sh
source /opt/intel/openvino/setupvars.sh

mkdir -p build
cd build
cmake ..
cmake --build . --parallel "$(nproc)"

4. Run bevfusion#

From build/:

./bevfusion <dataset_path> [--preset v2x|kitti] [--model-dir DIR] \
    [--fp16] [--int8] [--int8-camera] [--int8-pfe] [--int8-fuser] [--int8-head] \
    [--vis] [--save-image] [--save-video] [--display] [--util] \
    [--repeat N] [--num-samples N] [--dump-pred] [--pred-dir DIR] \
    [--vis-dir DIR] [--device DEVICE] \
    [--bbox-score SCORE] [--filter-labels NAME,...] [--no-filter]

Important options:

  • <dataset_path>: KITTI-style dataset root.

  • --preset v2x|kitti: select DAIR-V2X or KITTI-360 geometry and post-process settings.

  • --model-dir DIR: override the default split-model directory. The default is ../data/v2xfusion/pointpillars for v2x and ../data/kitti/pointpillars for kitti.

  • Default behavior requests all available INT8 component models. On Battlemage GPUs, the split pipeline uses fuser.onnx instead of quantized_fuser.xml for the known INT8 fuser issue.

  • PFE selection uses quantized_lidar_pfe.xml for INT8. --fp16 runs prefer lidar_pfe_v7000.onnx and fall back to lidar_pfe_v6000.onnx when v7000 is not present.

  • --fp16: switch all split components to the non-quantized ONNX models and run them with FP16 inference.

  • --int8: explicitly use all available INT8 component models.

  • --repeat N: run the dataset multiple times.

  • --num-samples N: limit the run to the first N discovered samples.

  • --bbox-score SCORE: override the detection score threshold (default: 0.1 for V2X, 0.5 for KITTI). Only boxes with score >= SCORE are output.

  • --dump-pred --pred-dir DIR: export KITTI-format predictions.

  • --save-image, --save-video, --display: enable visualization outputs.

Example commands:

./bevfusion ../data/v2xfusion/dataset
./bevfusion ../data/v2xfusion/dataset --fp16
./bevfusion ../data/kitti/dataset --preset kitti
./bevfusion ../data/kitti/dataset --preset kitti --fp16

With the bundled release assets, a successful runtime smoke run ends with output similar to:

Detected 0 boxes
[perf] frames=..., avg_lidar=... ms, avg_camera_bev=... ms, avg_fusion+post=... ms, avg_total=... ms

5. Run bevfusion_unified#

From build/:

./bevfusion_unified <dataset_path> [--preset v2x|kitti] [--model PATH] [--fp16] \
    [--vis] [--save-image] [--save-video] [--display] [--util] \
    [--repeat N] [--num-samples N] [--dump-pred] [--pred-dir DIR] \
    [--vis-dir DIR] [--recompute-camera-metas] [--cache-camera-metas] \
    [--bbox-score SCORE] [--filter-labels NAME,...] [--no-filter]

Important options:

  • <dataset_path>: KITTI-style dataset root.

  • --preset v2x|kitti: select DAIR-V2X or KITTI-360 geometry, voxelization range, post-process range, and camera metadata policy.

  • Default behavior loads the dataset-specific INT8 OpenVINO IR from ../data/<dataset>/second/bevfusion_unified_int8.xml.

  • --fp16: switch to the dataset-specific FP16 ONNX model at ../data/<dataset>/second/bevfusion_unified_fp16.onnx.

  • --model PATH: override the default model path. --onnx PATH remains accepted as a compatibility alias.

  • --recompute-camera-metas: recompute camera geometry for every frame.

  • --cache-camera-metas: compute camera geometry once and reuse it. This is the V2X default; KITTI defaults to per-frame recompute.

  • --bbox-score SCORE: override the detection score threshold (default: 0.1 for V2X, 0.5 for KITTI). Only boxes with score >= SCORE are output.

  • --num-samples N: limit the run to the first N discovered samples.

  • --repeat N: repeat the selected samples.

  • --dump-pred --pred-dir DIR: export KITTI-format predictions.

  • --save-image, --save-video, --display: enable visualization outputs.

Example commands:

./bevfusion_unified ../data/v2xfusion/dataset --num-samples 1
./bevfusion_unified ../data/v2xfusion/dataset --fp16 --num-samples 1
./bevfusion_unified ../data/kitti/dataset --preset kitti --num-samples 1

With the bundled release assets, a successful runtime smoke run ends with output similar to:

[info] 1 samples
    000000: 0 boxes
[perf] frames=1, avg_voxelize=4.523 ms, avg_preprocess=0.721 ms, avg_geometry=0.000 ms, avg_infer=31.422 ms, avg_postprocess=1.587 ms, avg_total=38.253 ms

6. Save Results And Visualizations#

Both main applications support the same result-export workflow:

  • --save-image: save per-frame visualization images.

  • --save-video: save a video file.

  • --display: open a live preview window when a GUI environment is available.

  • --dump-pred --pred-dir <dir>: write KITTI-format prediction files for evaluation.

7. Accuracy Evaluation#

Generate predictions with either main application and then run the evaluation helper.

Split pipeline example:

cd build
./bevfusion <dataset_path> --dump-pred --pred-dir pred_split

Unified pipeline example:

cd build
./bevfusion_unified <dataset_path> --dump-pred --pred-dir pred_unified

Evaluation command:

python3 ../tools/kitti_3d_eval.py \
    --gt <dataset_path>/label_2 \
    --pred ./pred_split \
    --out eval_output \
    --max-distance 102.4 \
    --z-center-gt

8. Performance Testing#

Use a representative dataset and run the main applications without visualization when measuring latency.

./bevfusion <dataset_path> --num-samples N
./bevfusion <dataset_path> --fp32 --num-samples N
./bevfusion_unified <dataset_path> --num-samples N
./bevfusion_unified <dataset_path> --fp16 --num-samples N

Successful runs end with a [perf] summary. For the split pipeline the summary includes lidar, camera, fusion, and total latency. For the unified pipeline the summary includes voxelization, preprocess, inference, postprocess, and total latency. Default runs use INT8 models; add --fp16 for split-model or unified FP16 comparison.

9. Troubleshooting#

Missing libsycl.so.8#

Source oneAPI before running the binaries:

source /opt/intel/oneapi/setvars.sh
source /opt/intel/openvino/setupvars.sh

No samples found in dataset#

  • Verify that the dataset root contains image_2, velodyne, calib, and label_2.

  • Keep sample IDs zero-padded to six digits.

  • Keep the original file extensions when preparing a dataset copy.

Failed to decode image from bin file#

Encoded .bin image files are decoded through OpenCV. If decoding fails, verify that the file contains a valid encoded image.

GPU runtime issues#

  • Re-run bash install_driver_related_libs.sh after a reboot if the script upgraded the kernel.

  • Refresh group membership with newgrp render or log out and back in.

  • Use xpu-smi and clinfo to confirm that the target GPU is visible before running inference.

10. More Detail#

Use the Executable Reference Guide for the full executable reference and sample output patterns. Use the Docker Workflow README on GH for the container workflow.