# Running NVIDIA's V2X-I PointPillars Dense FP16 Model on Intel GPU

**Purpose.** This guide describes how to take a model trained with NVIDIA's [CUDA-V2XFusion](https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution/tree/master/CUDA-V2XFusion) reference design and deploy it on an Intel GPU via the [intermediate-fusion](https://github.com/open-edge-platform/edge-ai-suites/tree/release-2026.1.0/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion) deploy binary.

**Audience.** Customers who already hold a CUDA-V2XFusion-trained checkpoint (either NVIDIA's provided reference model `dense_epoch_100_.pth`, or a checkpoint you produced yourself by following NVIDIA's reference training flow) and want to run inference on an Intel platform without any retraining or C++ changes on the deploy side.

**Scope.** Weight conversion only: take a CUDA-V2XFusion `.pth`, produce the 4-ONNX + INT8 OpenVINO IR artifacts the Intel deploy binary expects, install them, and run the binary end-to-end. No retraining, no mmdet3d edits, no config edits. Pipeline A (split 4-ONNX) only.

---

## 1. Overview

```
dense_epoch_100_.pth   (NVIDIA reference model)
        │
        ▼
[FP32 export]          ─> export/V2X-I/pp/
export_all.py             camera.backbone.onnx   (~85 MB)
                          lidar_pfe.onnx         (~18 KB, dynamic V)
                          fuser.onnx             (~48 MB)
                          head.onnx              (~2.4 MB)
        │
        ▼
[Static V=7000 PFE]    ─> export/V2X-I/pp/
export-lidar.py           lidar_pfe_v7000.onnx   (~4.8 MB)
        │
        ▼
[INT8 PTQ (NNCF)]      ─> export/V2X-I/pp/
quantize_all.py           quantized_camera.{xml,bin}
                          quantized_lidar_pfe.{xml,bin}
                          quantized_fuser.{xml,bin}
                          quantized_head.{xml,bin}
        │
        ▼
[Copy to deploy tree]  ─> edge-ai-suites/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/
                          deploy/data/v2xfusion/pointpillars/
        │
        ▼
[Run on Intel GPU]        cd deploy/build && ./bevfusion --preset v2x --int8
```

The entire left column (export + quantize) happens inside NVIDIA's bevfusion training repo after you apply the patch bundle this guide ships.
The deploy binary already knows how to consume the files produced.

---

## 2. Prerequisites

### 2.1 Set up NVIDIA's bevfusion training repo

Follow NVIDIA's own instructions at [Lidar_AI_Solution/CUDA-V2XFusion/README.md](https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution/tree/master/CUDA-V2XFusion) to:

1. Clone MIT BEVFusion at the commit NVIDIA pins.
2. Layer the BEVHeight and CUDA-V2XFusion patches on top as described in NVIDIA's README.
3. Install the Python environment: Python 3.8, `torch==1.11`, `mmcv`, `mmdet3d`, `torchpack`, and the usual MIT BEVFusion dependencies.

Do not attempt to run training. You only need the Python environment and the configs.

### 2.2 Download NVIDIA's reference checkpoint

Grab `dense_epoch_100_.pth` per NVIDIA's CUDA-V2XFusion README. Note its absolute path; you will pass it to the export and quantize commands.

### 2.3 Add ONNX export + INT8 PTQ dependencies

In the same Python environment you set up in §2.1, install the extras used by the scripts shipped with this guide:

```bash
pip install "nncf>=2.13" "openvino>=2024.4" "onnx" "onnxsim"
```

That is everything. The export and quantize scripts reuse `mmdet3d`, `mmcv`, and `torchpack` that are already present from §2.1.

### 2.4 Clone the deploy repo and build it

Clone [edge-ai-suites](https://github.com/open-edge-platform/edge-ai-suites.git) and follow its own documentation for the build:

- `deploy/README.md`: top-level build instructions.
- `deploy/docs/Prerequisites.md`: oneAPI + custom OpenVINO installation.
- `deploy/docs/GSG.md`: full getting-started guide with build and run commands.

**We deliberately do not duplicate those instructions here.** Once you have a working `deploy/build/bevfusion` binary and its default dataset directory, come back to this guide.

---

## 3. Step 1 — Apply the patch bundle

From the root of your NVIDIA bevfusion clone (the directory that contains `tools/`, `mmdet3d/`, `configs/`), run:

```bash
cp -r /path/to/this/Guide/nvidia_ckpt_to_intel_gpu_patches/* .
```

That drops 12 files under `export/pointpillars/` (see [Appendix A](#appendix-a--what-the-patch-bundle-adds) for the exact list). No existing file is touched; this is a pure addition.

Sanity check:

```bash
ls export/pointpillars/
# expected:
#   __init__.py  _calib_data.py
#   export_all.py  export-camera.py  export-lidar.py  export-fuser.py  export-head.py
#   quantize_all.py  quantize_camera_backbone.py  quantize_lidar_pfe.py  quantize_fuser.py  quantize_head.py
```

---

## 4. Step 2 — FP32 ONNX export (4 sub-graphs)

The deploy binary splits the BEVFusion graph into four independently-loaded ONNX sub-graphs. Export all of them at once:

```bash
python export/pointpillars/export_all.py \
    --config configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml \
    --ckpt /path/to/dense_epoch_100_.pth \
    --out-dir export/V2X-I/pp/
```

Expected tail output:

```
[cam-export] reference shapes: feat=(1, 80, 54, 96) depth=(1, 90, 54, 96)
[cam-export] saved to export/V2X-I/pp/camera.backbone.onnx
[lidar-export] saved to export/V2X-I/pp/lidar_pfe.onnx
[fuser-export] saved to export/V2X-I/pp/fuser.onnx
[head-export] saved to export/V2X-I/pp/head.onnx
SUMMARY
[OK] camera export/V2X-I/pp/camera.backbone.onnx (~88 MB)
[OK] lidar  export/V2X-I/pp/lidar_pfe.onnx (~18 KB)
[OK] fuser  export/V2X-I/pp/fuser.onnx (~50 MB)
[OK] head   export/V2X-I/pp/head.onnx (~2.4 MB)
```

### Benign warnings you will see and can ignore

- **`missing keys in source state_dict: encoders.camera.vtransform.cx`**: `cx` is a non-learnable buffer used only for the BEV coordinate offset, and the model's `__init__` fills in its default value. It is absent from NVIDIA's checkpoint, which is harmless for inference.
- **`unexpected key in source state_dict: fc.weight, fc.bias`**: these come from the ResNet34 ImageNet-pretrained fc layer that BEVFusion never uses.

---

## 5. Step 3 — Export the static V=7000 PFE

The Intel deploy binary's split pipeline hard-codes a maximum of 7000 voxels per frame (see `deploy/src/pipeline/split_pipeline_config.cpp`, where `default_int8_pfe_model` and `default_fp32_pfe_model` both pin `max_voxels=7000`). You therefore need a second PFE ONNX with a fixed batch-voxel dimension of 7000 in addition to the dynamic-V version from Step 2:

```bash
python export/pointpillars/export-lidar.py \
    --config configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml \
    --ckpt /path/to/dense_epoch_100_.pth \
    -o export/V2X-I/pp/lidar_pfe_v7000.onnx \
    --fixed-v 7000 \
    --split val
```

Expected tail output:

```
[lidar-export] tracing from cfg.data.val
[lidar-export] traced shapes: features=(5137, 100, 4) num_voxels=(5137,) coors=(5137, 4)
[lidar-export] wrapper vs pfe max-abs-diff = 0.000004
[lidar-export] exporting with FIXED V=7000 (measured dataset max V=6295, using safety margin)
[lidar-export] fixed-V sanity OK (no NaN), output (7000, 64)
[lidar-export] saved to export/V2X-I/pp/lidar_pfe_v7000.onnx
```

**Important: do not drop `--split val`.** The `--split` argument tells the tracer to pull a real frame from `cfg.data.val`, which determines the activation distribution the INT8 calibrator will see later. Using a trace frame from a mismatched dataset layout is a silent correctness bug.

---

## 6. Step 4 — INT8 PTQ quantization

Calibrate and quantize the four ONNX models to INT8 OpenVINO IR:

```bash
python export/pointpillars/quantize_all.py \
    --config configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml \
    --ckpt /path/to/dense_epoch_100_.pth \
    --onnx-dir export/V2X-I/pp/ \
    --out-dir export/V2X-I/pp/ \
    --num-samples 300
```

**Run this with the same Python interpreter you used in Steps 2 and 3.** The quantize scripts call into `mmdet3d` and `torchpack` to build real calibration samples, so they need the same mmdet3d-capable environment, not a separate NNCF-dedicated env.

Expected tail output:

```
SUMMARY
[OK] camera    export/V2X-I/pp/quantized_camera.xml (~420 KB) + .bin (~22 MB)
[OK] lidar_pfe export/V2X-I/pp/quantized_lidar_pfe.xml (~73 KB) + .bin (~2.5 MB)
[OK] fuser     export/V2X-I/pp/quantized_fuser.xml (~208 KB) + .bin (~12.5 MB)
[OK] head      export/V2X-I/pp/quantized_head.xml (~216 KB) + .bin (~614 KB)
```

`quantize_all.py` auto-detects `lidar_pfe_v7000.onnx` in `--onnx-dir` and uses it in preference to the dynamic-V PFE, which is what the deploy binary expects for INT8.

---

## 7. Step 5 — Install artifacts into the deploy tree

The deploy binary looks for its model files under `deploy/data/v2xfusion/pointpillars/` by default for `--preset v2x`.
Copy both the FP32 fallback ONNXs and the INT8 IRs into that directory:

```bash
DEPLOY_DIR=/path/to/edge-ai-suites/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/deploy/data/v2xfusion/pointpillars
mkdir -p "$DEPLOY_DIR"

# FP32 fallbacks
cp export/V2X-I/pp/camera.backbone.onnx "$DEPLOY_DIR/"
cp export/V2X-I/pp/lidar_pfe.onnx "$DEPLOY_DIR/"
cp export/V2X-I/pp/lidar_pfe_v7000.onnx "$DEPLOY_DIR/"
cp export/V2X-I/pp/fuser.onnx "$DEPLOY_DIR/"
cp export/V2X-I/pp/head.onnx "$DEPLOY_DIR/"

# INT8 IR pairs
cp export/V2X-I/pp/quantized_camera.xml "$DEPLOY_DIR/"
cp export/V2X-I/pp/quantized_camera.bin "$DEPLOY_DIR/"
cp export/V2X-I/pp/quantized_lidar_pfe.xml "$DEPLOY_DIR/"
cp export/V2X-I/pp/quantized_lidar_pfe.bin "$DEPLOY_DIR/"
cp export/V2X-I/pp/quantized_fuser.xml "$DEPLOY_DIR/"
cp export/V2X-I/pp/quantized_fuser.bin "$DEPLOY_DIR/"
cp export/V2X-I/pp/quantized_head.xml "$DEPLOY_DIR/"
cp export/V2X-I/pp/quantized_head.bin "$DEPLOY_DIR/"
```

If you want to keep multiple model variants side by side, you can put them under any directory and point the deploy binary at it explicitly with `--model-dir` (see Step 6).

---

## 8. Step 6 — Run the deploy binary

Source the oneAPI and OpenVINO environments exactly as the deploy repo's own `deploy/README.md` / `deploy/docs/GSG.md` describe, then:

```bash
cd /path/to/edge-ai-suites/metro-ai-suite/sensor-fusion-for-traffic-management/intermediate-fusion/deploy/build
./bevfusion /path/to/v2x_dataset --preset v2x --int8 --num-samples 30 --vis --save-video --vis-dir ./viz
```

Key flags:

| Flag | Meaning |
|---|---|
| `--preset v2x` | V2X-I geometry, BEV grid 128×128, pc_range Y ∈ [-51.2, 51.2] |
| `--int8` | Use all four `quantized_*.xml` IRs (falls back to FP32 ONNX per stage if a file is missing) |
| `--int8-camera` / `--int8-pfe` / `--int8-fuser` / `--int8-head` | Toggle INT8 stage-by-stage |
| `--model-dir DIR` | Override the default `data/v2xfusion/pointpillars/` location |
| `--num-samples N` | Process the first N frames |
| `--dump-pred --pred-dir DIR` | Write KITTI-format per-frame box `.txt` files |
| `--vis --save-video --vis-dir DIR` | Write `bevfusion.mp4` and optional per-frame PNGs |

Refer to the deploy repo's own `deploy/docs/GSG.md` for the authoritative full flag list and expected performance figures on the target GPU.

---

## 9. Troubleshooting

| Symptom | Cause / Fix |
|---|---|
| Warning `missing keys in source state_dict: encoders.camera.vtransform.cx` | Benign. NVIDIA's ckpt lacks this non-learnable buffer; the model default fills it. No action. |
| Warning `unexpected key in source state_dict: fc.weight, fc.bias` | Benign. ResNet34 ImageNet-pretrained fc layer that BEVFusion doesn't use. No action. |
| `ModuleNotFoundError: No module named 'torchpack'` during Step 4 | You're running the quantize scripts in a different Python env than Steps 2 and 3. Use the same mmdet3d-capable environment for all three steps. |
| `ModuleNotFoundError: No module named 'nncf'` | §2.3 was skipped. Install `nncf`, `openvino`, `onnx`, `onnxsim` into the env you are using. |
| PFE INT8 numerically collapsed (poor detections with `--int8-pfe`) | Re-run Step 3 **with `--split val`**, then re-run Step 4 so the INT8 IR is regenerated. Using a mismatched trace source produces wrong activation scales and the calibrator bakes them in. |
| Deploy binary silently runs FP32 even with `--int8` | One of the `quantized_*.xml` / `.bin` files is missing in the deploy model directory. Re-check Step 5. The deploy binary falls back to FP32 per stage when the INT8 IR is absent. |
| FP32 fuser used despite `--int8` on Intel Arc B580 (Battlemage) | Expected behavior. The deploy binary has a known B580-specific INT8 fuser fallback; the other three stages still run INT8. See the deploy repo's own notes. |
| `onnx.checker` failure on `camera.backbone.onnx` | The `onnxsim` simplification step during Step 2 may have failed silently if you're on a very old `onnxsim`. Upgrade: `pip install -U onnxsim onnx`. |

---

## Appendix A — What the patch bundle adds

```
/
└── export/
    └── pointpillars/
        ├── __init__.py                  (empty, makes the folder a package)
        ├── _calib_data.py               (shared PyTorch-side calibration helper)
        ├── export_all.py                (FP32 export orchestrator)
        ├── export-camera.py             (ResNet34 backbone + LSS neck + depthnet → camera.backbone.onnx)
        ├── export-lidar.py              (PillarFeatureNet → lidar_pfe[,_v7000].onnx)
        ├── export-fuser.py              (ConvFuser + decoder → fuser.onnx)
        ├── export-head.py               (CenterHead → head.onnx, 12 output tensors)
        ├── quantize_all.py              (INT8 PTQ orchestrator, auto-picks v7000 PFE)
        ├── quantize_camera_backbone.py  (NNCF PTQ on camera.backbone.onnx)
        ├── quantize_lidar_pfe.py        (NNCF PTQ on lidar_pfe_v7000.onnx)
        ├── quantize_fuser.py            (NNCF PTQ on fuser.onnx)
        └── quantize_head.py             (NNCF PTQ on head.onnx)
```

Nothing under `mmdet3d/`, `configs/`, or `tools/` is touched. The patch is purely additive.
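The file list above can also be checked programmatically, which is handy if the `cp -r` in Step 1 was run from the wrong directory. The following is a minimal sketch, not part of the patch bundle: the helper name `missing_patch_files` and the constant `EXPECTED_PATCH_FILES` are hypothetical, and the file names are simply copied from the tree above.

```python
from pathlib import Path

# File names copied from the Appendix A tree; the constant and helper
# names are hypothetical and do not exist in the patch bundle itself.
EXPECTED_PATCH_FILES = (
    "__init__.py", "_calib_data.py",
    "export_all.py", "export-camera.py", "export-lidar.py",
    "export-fuser.py", "export-head.py",
    "quantize_all.py", "quantize_camera_backbone.py",
    "quantize_lidar_pfe.py", "quantize_fuser.py", "quantize_head.py",
)


def missing_patch_files(repo_root="."):
    """Return the expected patch files that are absent under export/pointpillars/."""
    patch_dir = Path(repo_root) / "export" / "pointpillars"
    return [name for name in EXPECTED_PATCH_FILES
            if not (patch_dir / name).is_file()]


if __name__ == "__main__":
    # Run from the root of the bevfusion clone (the directory with tools/, mmdet3d/, configs/).
    missing = missing_patch_files(".")
    if missing:
        print("missing patch files:", ", ".join(missing))
    else:
        print(f"all {len(EXPECTED_PATCH_FILES)} patch files present")
```

An empty result means all 12 files landed where the export and quantize commands in Steps 2 to 4 expect them.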
## Appendix B — Why NVIDIA's ckpt works without any code/config changes

- **State dict shape.** The NVIDIA checkpoint and a model built from `configs/V2X-I/det/centerhead/lssfpn/camera+pointpillar/resnet34/default.yaml` agree on every weight tensor's shape. The only difference is the 3-element `encoders.camera.vtransform.cx` buffer, which is a non-learnable constant that the model constructor fills with the default.
- **Pipeline A does not touch LSS's `get_cam_feats()`.** `export-camera.py` exports `backbone → neck → depthnet` directly and does the per-pixel depth softmax inline, so any downstream `use_bevpool` branching in `mmdet3d/models/vtransforms/lss.py` is irrelevant.
- **ResNet34 pretrained URL vs local path.** NVIDIA's config references the remote pretrained URL; the only effect is where the ImageNet init comes from. Those weights are overwritten by the NVIDIA checkpoint anyway, so this mismatch is invisible at inference time.
- **`strict=False`.** Every export and quantize script loads the checkpoint with `strict=False`, so the `cx` missing-key and the ResNet34 `fc.*` extra-key warnings are just logs, not errors.
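
To make the `strict=False` point concrete: a non-strict load compares the model's key set with the checkpoint's key set and reports the differences instead of raising. In PyTorch, `load_state_dict(..., strict=False)` returns an object with `missing_keys` and `unexpected_keys` fields. The sketch below mimics only that key comparison in plain Python, so it runs without `torch`. It is an illustration, not the code the export scripts use; the helper name `diff_state_dict_keys` and the `encoders.camera.backbone.conv1.weight` key are made up, and only `encoders.camera.vtransform.cx`, `fc.weight`, and `fc.bias` come from this guide.

```python
def diff_state_dict_keys(model_keys, ckpt_keys):
    """Classify key mismatches the way a non-strict load reports them.

    missing    -> the model expects the key, the checkpoint lacks it
    unexpected -> the checkpoint has the key, the model has no slot for it
    """
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)


# Toy key lists. The conv1 key is a made-up placeholder for "all the
# weights that match"; the other three names are the ones from the warnings.
model_keys = [
    "encoders.camera.backbone.conv1.weight",
    "encoders.camera.vtransform.cx",
]
ckpt_keys = [
    "encoders.camera.backbone.conv1.weight",
    "fc.weight",
    "fc.bias",
]

missing, unexpected = diff_state_dict_keys(model_keys, ckpt_keys)
print("missing keys:", missing)        # keys the model default must fill
print("unexpected keys:", unexpected)  # keys the loader ignores
```

With `strict=True` either non-empty list would abort the load, which is why the scripts opt into the non-strict mode for these two known, harmless mismatches.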