Perspective-Invariant 3D Object Detection

1 National University of Singapore
2 Shenyang Institute of Automation, Chinese Academy of Sciences
3 Fudan University


Motivation of Perspective-Invariant 3D object DETection (Pi3DET). We focus on the practical yet challenging task of 3D object detection from heterogeneous robot platforms: Vehicle, Drone, and Quadruped. To achieve strong generalization, we contribute: (1) the first dataset for multi-platform 3D detection, comprising more than 51K LiDAR frames with over 250K meticulously annotated 3D bounding boxes; (2) an adaptation framework that effectively transfers detection capabilities from the vehicle platform to other platforms by aligning geometric and feature-level representations; (3) a comprehensive benchmark study of state-of-the-art 3D detectors under cross-platform scenarios.



Overview

With the rise of robotics, LiDAR-based 3D object detection has garnered significant attention in both academia and industry. However, existing datasets and methods predominantly focus on vehicle-mounted platforms, leaving other autonomous platforms underexplored. To bridge this gap, we introduce Pi3DET, the first benchmark featuring LiDAR data and 3D bounding box annotations collected from multiple platforms: vehicle, quadruped, and drone, thereby facilitating research in 3D object detection for non-vehicle platforms as well as cross-platform 3D detection. Based on Pi3DET, we propose a novel cross-platform adaptation framework that transfers knowledge from the well-studied vehicle platform to other platforms. This framework achieves perspective-invariant 3D detection through robust alignment at both the geometric and feature levels. Additionally, we establish a benchmark to evaluate the resilience and robustness of current 3D detectors in cross-platform scenarios, providing valuable insights for developing adaptive 3D perception systems. Extensive experiments validate the effectiveness of our approach on challenging cross-platform tasks, demonstrating substantial gains over existing adaptation methods. We hope this work paves the way for generalizable and unified 3D perception systems across diverse and complex environments. Our Pi3DET dataset, cross-platform benchmark suite, and annotation toolkit are publicly available.


Dataset

Summary of LiDAR-based 3D object detection datasets.

We compare key aspects across (1) robot platforms, (2) scale, (3) sensor setups, (4) temporal coverage (Temp.), (5) capture conditions, and more. To our knowledge, Pi3DET is the first dataset to feature multi-platform 3D detection from Vehicle, Drone, and Quadruped platforms, with fine-grained 3D bounding box annotations, diverse conditions, and practical use cases.

Dataset | Venue | Platform | # of Frames | LiDAR Setup | Temp. | Freq. (Hz) | Other Sensors
KITTI | CVPR'12 | Vehicle | 14,999 | 1x64 | No | – | Camera, IMU, Stereo
ApolloScape | TPAMI'18 | Vehicle | 143,906 | 1x64 | Yes | 2 | Camera, IMU, Radar
Waymo Open | CVPR'20 | Vehicle | 198,000 | 1x64 + 4x16 | Yes | 10 | Camera, IMU, Radar
nuScenes | CVPR'20 | Vehicle | 35,149 | 1x32 | Yes | 2 | Camera, IMU, Radar
ONCE | arXiv'21 | Vehicle | ~1 M | 1x40 | No | 2 | Camera, IMU
Argoverse 2 | NeurIPS'21 | Vehicle | ~6 M | 2x32 | Yes | 10 | Camera, IMU
aiMotive | ICLRW'23 | Vehicle | 26,583 | 1x64 | Yes | 10 | Camera, IMU
Zenseact Open | ICCV'23 | Vehicle | ~100 K | 1x128 + 4x16 | Yes | 1 | Camera, IMU
MAN TruckScenes | NeurIPS'24 | Vehicle | ~30 K | 6x64 | Yes | 2 | Camera, IMU, Radar
AeroCollab3D | TGRS'24 | Drone | 3,200 | N/A | No | – | Camera, IMU
Pi3DET | Ours | Vehicle, Drone, Quadruped | 51,545 | 1x64 | Yes | 10 | Camera, IMU, Stereo



Statistical Analysis


Figure: Analysis of perspective differences across the three robot platforms. We present statistics of the point elevation distribution (upper left), ego-motion distribution (bottom left), and target bounding box distribution (right), along with the mean and variance of each platform's data. Different colors denote different platforms, i.e., Vehicle, Drone, and Quadruped. Best viewed in color.
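
For readers who want to reproduce this style of analysis on their own recordings, the minimal sketch below computes per-platform means and variances of point elevation, ego-motion angles, and box-center heights. The field names (points, ego_pitch, ego_roll, box_centers) and the frame layout are assumptions made for illustration; they are not the Pi3DET toolkit API.

import numpy as np

def platform_statistics(frames):
    """Per-platform statistics similar to the figure panels: point elevation,
    ego pitch/roll, and 3D box center height. `frames` is an assumed list of
    dicts with numpy arrays, not the actual Pi3DET data format."""
    elev = np.concatenate([f["points"][:, 2] for f in frames])         # point z values
    ego = np.array([[f["ego_pitch"], f["ego_roll"]] for f in frames])  # ego-motion angles
    box_z = np.concatenate([f["box_centers"][:, 2] for f in frames])   # box center heights
    return {
        "elevation_mean": float(elev.mean()), "elevation_var": float(elev.var()),
        "ego_pitch_mean": float(ego[:, 0].mean()), "ego_roll_mean": float(ego[:, 1].mean()),
        "box_height_mean": float(box_z.mean()), "box_height_var": float(box_z.var()),
    }

Running this separately on vehicle, drone, and quadruped frames yields the kind of per-platform mean/variance comparison summarized in the figure.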




Dataset Examples






Methodology


Framework Overview. The proposed Pi3DET-Net consists of two main stages, Pre-Adaptation (PA) and Knowledge Adaptation (KA), which together bridge the gap across heterogeneous robot platforms through alignment at both the geometric level (see Section: geometry alignment) and the feature level (see Section: feature alignment). On the geometric side, PA employs Random Platform Jitter to improve robustness against ego-motion variations, while KA uses a Virtual Platform Pose to simulate source-like viewpoints and achieve bidirectional geometric alignment across platforms. On the feature side, Pi3DET-Net further incorporates KL Probabilistic Feature Alignment to align target features with the source feature space, together with a Geometry-Aware Transformation Descriptor to correct global transformations across platforms.
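
To make the two alignment ideas concrete, below is a minimal, self-contained sketch of how a platform-jitter augmentation and a KL-based feature-alignment loss could look. This is not the released Pi3DET-Net code: the function names, the pitch/roll jitter ranges, and the diagonal-Gaussian formulation of the KL term are illustrative assumptions.

# Illustrative sketch only; assumed interfaces, not the official Pi3DET-Net implementation.
import math
import random
import torch

def random_platform_jitter(points: torch.Tensor,
                           max_pitch_deg: float = 10.0,
                           max_roll_deg: float = 10.0) -> torch.Tensor:
    """Randomly rotate an (N, 3) LiDAR point cloud about the y (pitch) and
    x (roll) axes to mimic ego-pose variation across platforms.
    The angle ranges are assumed values."""
    pitch = math.radians(random.uniform(-max_pitch_deg, max_pitch_deg))
    roll = math.radians(random.uniform(-max_roll_deg, max_roll_deg))
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rot_pitch = torch.tensor([[cp, 0.0, sp],
                              [0.0, 1.0, 0.0],
                              [-sp, 0.0, cp]], dtype=points.dtype)   # rotation about y
    rot_roll = torch.tensor([[1.0, 0.0, 0.0],
                             [0.0, cr, -sr],
                             [0.0, sr, cr]], dtype=points.dtype)     # rotation about x
    return points @ (rot_roll @ rot_pitch).T

def kl_feature_alignment(src_feats: torch.Tensor,
                         tgt_feats: torch.Tensor,
                         eps: float = 1e-6) -> torch.Tensor:
    """Model source and target features (N, C) as diagonal Gaussians and return
    the closed-form KL(target || source); one plausible way to realize a
    probabilistic feature-alignment objective."""
    mu_s, var_s = src_feats.mean(dim=0), src_feats.var(dim=0) + eps
    mu_t, var_t = tgt_feats.mean(dim=0), tgt_feats.var(dim=0) + eps
    kl = 0.5 * (torch.log(var_s / var_t) + (var_t + (mu_t - mu_s) ** 2) / var_s - 1.0)
    return kl.mean()

In a full pipeline, the jitter would act on source point clouds during Pre-Adaptation, while the KL term would be added to the detection loss during Knowledge Adaptation; loss weighting and the feature granularity (per-voxel, per-BEV-cell, or per-proposal) are design choices left open here.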


Experiments

Comparisons of 3D detection methods on the Vehicle→Quadruped and Vehicle→Drone adaptation tasks.

We report the average precision (AP), formatted as "BEV / 3D", at IoU thresholds of 0.7 and 0.5. The symbol ‡ denotes algorithms without ROS. All scores are given in percent (%). "–" denotes that code is not available. A minimal, illustrative sketch of the rotated BEV IoU underlying these metrics follows the table.

Column groups: "Quad" = Vehicle→Quadruped, "Drone" = Vehicle→Drone; each detector (PV-RCNN, Voxel R-CNN) reports AP@0.7 and AP@0.5, and every cell is given as BEV / 3D.

Source Data | Method | Quad PV-RCNN AP@0.7 | Quad PV-RCNN AP@0.5 | Quad Voxel R-CNN AP@0.7 | Quad Voxel R-CNN AP@0.5 | Drone PV-RCNN AP@0.7 | Drone PV-RCNN AP@0.5 | Drone Voxel R-CNN AP@0.7 | Drone Voxel R-CNN AP@0.5 | Average AP@0.7 | Average AP@0.5
nuScenes | Source Platform | 43.40 / 33.55 | 44.86 / 42.84 | 43.25 / 33.74 | 45.62 / 43.32 | 50.91 / 35.26 | 57.73 / 50.24 | 50.15 / 29.41 | 57.10 / 49.10 | 46.93 / 32.99 | 51.33 / 46.34
nuScenes | ST3D | 55.40 / 42.02 | 59.59 / 54.75 | 44.54 / 35.96 | 45.81 / 44.38 | 65.05 / 40.01 | 68.93 / 64.09 | 54.62 / 33.79 | 58.45 / 52.89 | 54.90 / 37.95 | 58.20 / 54.03
nuScenes | ST3D‡ | 55.68 / 44.50 | 59.32 / 55.32 | 45.01 / 37.13 | 46.73 / 45.45 | 65.40 / 43.63 | 69.24 / 64.88 | 55.23 / 36.51 | 59.30 / 54.23 | 55.33 / 40.44 | 58.65 / 54.97
nuScenes | ST3D++ | 55.76 / 43.51 | 59.93 / 55.28 | 45.56 / 36.97 | 47.28 / 45.84 | 60.91 / 40.09 | 68.96 / 59.96 | 57.02 / 37.52 | 61.30 / 55.43 | 54.81 / 39.52 | 59.37 / 54.13
nuScenes | ST3D++‡ | 54.96 / 40.81 | 60.47 / 54.65 | 45.69 / 36.76 | 48.30 / 46.05 | 65.50 / 43.46 | 68.99 / 64.62 | 55.92 / 39.46 | 59.93 / 55.19 | 55.52 / 40.12 | 59.42 / 55.13
nuScenes | REDB | 52.43 / 41.34 | 57.12 / 54.18 | – / – | – / – | 65.31 / 39.19 | 68.74 / 64.13 | – / – | – / – | – / – | – / –
nuScenes | MS3D++ | 56.24 / 43.20 | 60.88 / 56.13 | 51.50 / 40.14 | 56.03 / 53.86 | 66.99 / 43.76 | 69.87 / 65.85 | 62.68 / 38.26 | 68.34 / 61.09 | 59.35 / 41.34 | 63.78 / 59.23
nuScenes | Pi3DET-Net | 56.80 / 46.36 | 61.54 / 57.20 | 54.85 / 42.38 | 57.41 / 55.54 | 65.43 / 45.94 | 69.24 / 65.87 | 65.63 / 44.62 | 72.05 / 63.83 | 60.68 / 44.83 | 65.06 / 60.61
nuScenes | Target Platform | 54.15 / 40.24 | 58.63 / 54.96 | 54.90 / 39.74 | 56.46 / 55.19 | 67.67 / 46.11 | 70.04 / 66.14 | 68.52 / 46.53 | 70.67 / 61.42 | 61.31 / 43.16 | 63.95 / 59.43
Pi3DET (Vehicle) | Source Platform | 38.61 / 26.84 | 40.64 / 39.22 | 43.95 / 31.24 | 48.22 / 44.17 | 57.29 / 36.62 | 58.92 / 56.19 | 52.85 / 37.96 | 61.10 / 52.47 | 48.17 / 33.16 | 52.22 / 48.01
Pi3DET (Vehicle) | ST3D | 49.29 / 38.69 | 51.02 / 49.71 | 47.70 / 37.91 | 48.07 / 47.59 | 60.17 / 33.01 | 62.84 / 54.51 | 53.79 / 40.18 | 65.29 / 53.40 | 52.74 / 37.45 | 56.81 / 51.30
Pi3DET (Vehicle) | ST3D‡ | 47.89 / 38.07 | 49.50 / 48.23 | 47.01 / 41.85 | 54.01 / 53.46 | 60.67 / 33.27 | 62.98 / 54.61 | 53.85 / 40.02 | 62.70 / 53.08 | 52.35 / 38.30 | 57.30 / 52.34
Pi3DET (Vehicle) | ST3D++ | 46.05 / 37.22 | 49.33 / 47.84 | 48.52 / 37.84 | 55.82 / 48.53 | 60.04 / 33.98 | 62.71 / 54.13 | 53.71 / 39.94 | 62.43 / 53.20 | 52.08 / 37.24 | 57.57 / 50.92
Pi3DET (Vehicle) | ST3D++‡ | 45.14 / 35.70 | 46.94 / 45.37 | 47.52 / 37.13 | 54.37 / 47.63 | 64.15 / 34.20 | 63.81 / 55.44 | 53.64 / 40.27 | 62.43 / 53.10 | 52.61 / 36.83 | 56.89 / 50.38
Pi3DET (Vehicle) | REDB | 46.74 / 38.47 | 50.29 / 49.54 | – / – | – / – | 61.57 / 34.05 | 63.22 / 54.07 | – / – | – / – | – / – | – / –
Pi3DET (Vehicle) | MS3D++ | 53.66 / 40.66 | 55.21 / 53.78 | 53.65 / 41.93 | 54.69 / 54.00 | 66.05 / 41.17 | 67.80 / 63.26 | 53.85 / 40.91 | 62.87 / 53.44 | 56.80 / 41.17 | 60.14 / 56.12
Pi3DET (Vehicle) | Pi3DET-Net | 56.19 / 44.28 | 60.35 / 56.20 | 55.54 / 45.18 | 59.48 / 58.90 | 66.26 / 44.47 | 68.25 / 63.36 | 67.87 / 46.83 | 69.95 / 66.26 | 61.47 / 45.19 | 64.51 / 61.18
Pi3DET (Vehicle) | Target Platform | 54.15 / 40.24 | 58.63 / 54.96 | 54.90 / 39.74 | 56.46 / 55.19 | 67.67 / 46.11 | 70.04 / 66.14 | 68.52 / 46.53 | 70.67 / 61.42 | 61.31 / 43.16 | 63.95 / 59.43
Combined | All | 58.21 / 46.27 | 62.18 / 59.67 | 60.96 / 48.15 | 63.04 / 61.04 | 68.44 / 48.19 | 71.11 / 68.24 | 68.90 / 48.88 | 72.55 / 69.18 | 64.13 / 47.87 | 67.22 / 64.53
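
As a reference for the BEV numbers above, the sketch below shows one standard way to compute rotated BEV IoU between two boxes using shapely polygons. The box parameterization (x, y, length, width, yaw) is an assumption for illustration; this is not the exact evaluation code used for the benchmark.

import math
from shapely.geometry import Polygon

def bev_corners(x, y, length, width, yaw):
    """Four BEV corners of a rotated box given as (x, y, length, width, yaw)."""
    dx, dy = length / 2.0, width / 2.0
    local = [(dx, dy), (dx, -dy), (-dx, -dy), (-dx, dy)]
    c, s = math.cos(yaw), math.sin(yaw)
    return [(x + c * px - s * py, y + s * px + c * py) for px, py in local]

def bev_iou(box_a, box_b):
    """Rotated BEV IoU between two (x, y, length, width, yaw) boxes."""
    pa, pb = Polygon(bev_corners(*box_a)), Polygon(bev_corners(*box_b))
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0.0 else 0.0

# Example: two nearly aligned boxes; a detection counts as a true positive
# when its IoU with a ground-truth box exceeds 0.7 or 0.5, per metric column.
print(bev_iou((0.0, 0.0, 4.0, 1.8, 0.0), (0.3, 0.1, 4.0, 1.8, 0.05)))

The 3D variant of the metric additionally accounts for vertical overlap (volume IoU); the same thresholding logic then applies.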


Citation

@inproceedings{liang2025pi3det,
  title     = {Perspective-Invariant 3D Object Detection},
  author    = {Ao Liang and Lingdong Kong and Dongyue Lu and Youquan Liu and Jian Fang and Huaici Zhao and Wei Tsang Ooi},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2025},
}