TensorRT plugin-contract PointPillars pipeline for nuScenes, built on OpenPCDet.
This repository trains PointPillars in OpenPCDet and exports a minimal end-to-end ONNX aligned with the deployable layout (VoxelGeneratorPlugin + PillarScatterPlugin + DecodeBbox3DPlugin) for TensorRT/DriveWorks-style deployment.
points[B,N,4] + num_points[B]
-> VoxelGeneratorPlugin
-> PFN-equivalent thin path (Reshape -> MatMul 10x64 -> BN -> ReLU -> MaxPool)
-> PillarScatterPlugin
-> BaseBEVBackbone + AnchorHeadSingle
-> DecodeBbox3DPlugin
-> output_boxes[B, M, 9], num_boxes[B] (or [B,1] for DriveWorks blob layout)
The key design goal is to keep the exported graph small and deployable, rather than tracing a large PyTorch VFE + ScatterND subgraph.
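The PFN-equivalent thin path in the diagram above (Reshape -> MatMul 10x64 -> BN -> ReLU -> MaxPool) can be illustrated with a minimal pure-Python sketch. It assumes BatchNorm has already been folded into a per-channel scale and shift, and that every pillar is a fixed-length list of 10-feature points, so the Reshape step is implicit in the loops. The function name pfn_thin_path and the tiny 2-channel weights are hypothetical and chosen only for readability; the exported graph uses 64 channels and is not produced by this code.

```python
def pfn_thin_path(pillars, weight, bn_scale, bn_shift):
    """Apply MatMul -> folded BN -> ReLU -> max over points to every pillar.

    pillars:  [P][N][F] per-point features (F = 10 in the pipeline above)
    weight:   [F][C] linear weights (C = 64 in the pipeline above)
    bn_scale, bn_shift: [C] BatchNorm folded into an affine transform (assumption)
    returns:  [P][C], one feature vector per pillar
    """
    num_channels = len(bn_scale)
    pooled = []
    for pillar in pillars:
        best = [float("-inf")] * num_channels
        for point in pillar:
            for c in range(num_channels):
                # MatMul: dot product of the point features with weight column c.
                acc = sum(point[f] * weight[f][c] for f in range(len(point)))
                # Folded BatchNorm followed by ReLU.
                act = max(0.0, acc * bn_scale[c] + bn_shift[c])
                # MaxPool over the points axis of the pillar.
                best[c] = max(best[c], act)
        pooled.append(best)
    return pooled


# Toy input: one pillar, two points, 10 features, 2 output channels.
pillars = [[[1.0] * 10, [0.0] * 10]]
weight = [[0.5, -0.5] for _ in range(10)]
features = pfn_thin_path(pillars, weight, bn_scale=[1.0, 1.0], bn_shift=[0.0, 0.0])
print(features)
```

Because the whole path reduces to one matrix multiply, an affine transform, and a reduction, it maps to a handful of standard ONNX nodes between the two plugins, which is what keeps the exported graph small.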
- Stage 1: OpenPCDet PointPillars training with a plugin-contract-friendly VFE (PillarVFEPluginContract).
- Stage 1: Minimal TRT-style ONNX export and composition scripts.
- Stage 1: Post-export ONNX shape fixing for static batch and DriveWorks blob compatibility.
halo_pointpillar/
├── OpenPCDet/
│   ├── pcdet/models/backbones_3d/vfe/pillar_vfe_plugin_contract.py
│   └── tools/
│       ├── export_pointpillars_minimal_trt_onnx.py
│       ├── export_pointpillars_plugin_contract_onnx.py
│       ├── compose_pointpillars_e2e_plugin_onnx.py
│       ├── compose_pointpillars_e2e_three_plugins_onnx.py
│       └── fix_onnx_static_batch_shapes.py
├── train_pointpillars_plugin_contract.py
├── train_pointpillars_deployable_style.py
├── train_pointpillars_nuscenes.py
├── plugin_contract/ # generated artifacts (ONNX, checkpoints, logs)
└── pointpillarnet_deployable_v1.1/ # legacy reference assets
- Input points: padded point buffer, typically float32 [1, 100000, 4].
- Input num_points: valid point count per batch item, usually int32 [1].
- Output output_boxes: decoded boxes, float32 [1, num_boxes_max, 9].
- Output num_boxes: valid output count, usually int32 [1].
For DriveWorks dwDNN_initializeTensorRTFromFile, scalar-like bindings may need rank >= 2; use the provided ONNX fixer with --driveworks-blob-layout to expose num_points and num_boxes as [B,1] at graph boundaries.
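The binding contract above can be summarized as a small shape table. The following sketch is illustrative only: binding_shapes is a hypothetical helper that is not part of this repository, and the value 500 used for num_boxes_max is an assumption, since the real limit depends on the decode plugin configuration.

```python
def binding_shapes(batch, max_points, max_boxes, driveworks_blob_layout=False):
    """Return the expected shape of each graph-boundary binding.

    With driveworks_blob_layout=True the count tensors are exposed as [B, 1]
    instead of [B], so that every binding has rank >= 2.
    """
    count_shape = [batch, 1] if driveworks_blob_layout else [batch]
    return {
        "points": [batch, max_points, 4],       # float32 padded point buffer
        "num_points": list(count_shape),        # int32 valid point count
        "output_boxes": [batch, max_boxes, 9],  # float32 decoded boxes
        "num_boxes": list(count_shape),         # int32 valid box count
    }


# max_boxes=500 is a placeholder for num_boxes_max (assumption).
default_layout = binding_shapes(1, 100000, 500)
dw_layout = binding_shapes(1, 100000, 500, driveworks_blob_layout=True)
for name in default_layout:
    print(name, default_layout[name], dw_layout[name])
```

Only the two count tensors differ between the layouts; points and output_boxes already have rank 3.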
cd ~/halo_pointpillar
python3 -m venv venv
source venv/bin/activate
pip install -r OpenPCDet/requirements.txt
pip install onnx onnxsim
Build OpenPCDet ops as required by your setup (spconv/CUDA toolchain).
python3 train_pointpillars_plugin_contract.py \
--pcdet-root ~/halo_pointpillar/OpenPCDet \
--nuscenes-root /path/to/nuscenes \
--nuscenes-version v1.0-trainval \
--work-dir ~/halo_pointpillar/runs/pp_plugin_contract \
--epochs 20 \
--batch-size 1 \
--max-points-per-sample 100000
This script generates train/data-prep configs, links nuScenes under OpenPCDet/data/nuscenes, prepares infos/GT database (unless skipped), and launches OpenPCDet training.
cd ~/halo_pointpillar/OpenPCDet/tools
python3 export_pointpillars_minimal_trt_onnx.py \
--cfg_file /path/to/pointpillars_plugin_contract.yaml \
--ckpt /path/to/checkpoint_epoch_20.pth \
--output /path/to/pointpillars_minimal_epoch20.onnx \
--max-points 100000 \
--max-voxels 10000 \
--max-num-points-per-voxel 32

python3 fix_onnx_static_batch_shapes.py \
--in /path/to/pointpillars_minimal_epoch20.onnx \
--out /path/to/pointpillars_minimal_epoch20_static.onnx \
--batch 1 \
--max-points 100000

python3 fix_onnx_static_batch_shapes.py \
--in /path/to/pointpillars_minimal_epoch20.onnx \
--out /path/to/pointpillars_minimal_epoch20_dw.onnx \
--batch 1 \
--max-points 100000 \
--driveworks-blob-layout

- Dynamic batch (-1) on ONNX inputs can fail in builders that do not create optimization profiles. Use the static batch fix above.
- The ONNX checker does not understand TensorRT custom plugin ops (VoxelGeneratorPlugin, etc.). This is expected.
- DriveWorks may reject rank-1 blob bindings ([B]) with the error: blob needs to have at least 2 dimensions.
  - Use --driveworks-blob-layout to expose num_points/num_boxes as [B,1] while preserving plugin-internal expectations.
- Align runtime preprocessor limits (max points) with model limits to avoid host-side buffer mismatches.
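The last point, aligning host-side limits with the model, can be sketched as a padding step that always yields a buffer of exactly max_points rows plus the matching num_points value. This is a minimal sketch under stated assumptions: pad_points is a hypothetical helper, and dropping the trailing points when a sweep exceeds the limit is one possible policy, not necessarily what the runtime preprocessor does.

```python
def pad_points(points, max_points, num_features=4):
    """Pad or truncate a point list to exactly max_points rows.

    points: list of [x, y, z, intensity]-style rows (num_features values each)
    returns: (buffer, num_points) where buffer has max_points rows and
             num_points is the count of valid rows at the front of the buffer
    """
    if any(len(p) != num_features for p in points):
        raise ValueError(f"every point must have {num_features} features")
    # Truncation policy (assumption): keep the first max_points points.
    kept = [list(p) for p in points[:max_points]]
    num_points = len(kept)
    # Zero-fill the remainder so the buffer matches the static input shape.
    kept.extend([0.0] * num_features for _ in range(max_points - num_points))
    return kept, num_points


sweep = [[1.0, 2.0, 3.0, 0.5], [4.0, 5.0, 6.0, 0.1], [7.0, 8.0, 9.0, 0.9]]
padded, count = pad_points(sweep, max_points=4)
truncated, truncated_count = pad_points(sweep, max_points=2)
print(len(padded), count, len(truncated), truncated_count)
```

Passing the same max_points value here as the --max-points argument used at export time keeps the host buffer size and the ONNX input shape in agreement.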
This project intentionally mirrors the legacy pointpillars_deployable.onnx strategy:
- Keep plugin boundaries explicit (VoxelGeneratorPlugin, PillarScatterPlugin, DecodeBbox3DPlugin).
- Keep the front-end graph minimal (thin PFN-equivalent path) instead of tracing full VFE internals.
- Export a compact ONNX that is easier to inspect in Netron and easier to deploy in TensorRT-centered pipelines.