Perspective-Invariant 3D Object Detection

1 National University of Singapore
2 Shenyang Institute of Automation, Chinese Academy of Sciences
3 Fudan University


Motivation of Perspective-Invariant 3D object DETection (Pi3DET). We focus on the practical yet challenging task of 3D object detection from heterogeneous robot platforms: Vehicle, Drone, and Quadruped. To achieve strong generalization, we contribute: (1) the first dataset for multi-platform 3D detection, comprising more than 51K LiDAR frames with over 250K meticulously annotated 3D bounding boxes; (2) an adaptation framework that effectively transfers detection capabilities from the vehicle platform to other platforms by aligning geometric and feature-level representations; (3) a comprehensive benchmark study of state-of-the-art 3D detectors under cross-platform scenarios.



Overview

With the rise of robotics, LiDAR-based 3D object detection has garnered significant attention in both academia and industry. However, existing datasets and methods predominantly focus on vehicle-mounted platforms, leaving other autonomous platforms underexplored. To bridge this gap, we introduce Pi3DET, the first benchmark featuring LiDAR data and 3D bounding box annotations collected from multiple platforms: vehicle, quadruped, and drone, thereby facilitating research in 3D object detection for non-vehicle platforms as well as cross-platform 3D detection. Based on Pi3DET, we propose a novel cross-platform adaptation framework that transfers knowledge from the well-studied vehicle platform to other platforms. This framework achieves perspective-invariant 3D detection through robust alignment at both the geometric and feature levels. Additionally, we establish a benchmark to evaluate the resilience and robustness of current 3D detectors in cross-platform scenarios, providing valuable insights for developing adaptive 3D perception systems. Extensive experiments validate the effectiveness of our approach on challenging cross-platform tasks, demonstrating substantial gains over existing adaptation methods. We hope this work paves the way for generalizable and unified 3D perception systems across diverse and complex environments. Our Pi3DET dataset, cross-platform benchmark suite, and annotation toolkit are publicly available.


Dataset

Summary of LiDAR-based 3D object detection datasets.

We compare key aspects across (1) robot platforms, (2) scale, (3) sensor setups, (4) temporal coverage (Temp.), (5) capture conditions, and more. To our knowledge, Pi3DET is the first dataset to feature multi-platform 3D detection from Vehicle, Drone, and Quadruped platforms, with fine-grained 3D bounding box annotations, diverse conditions, and practical use cases.

Dataset | Venue | Platform | # of Frames | LiDAR Setup | Temp. | Freq. (Hz) | Other Sensors
KITTI | CVPR'12 | Vehicle | 14,999 | 1x64 | No | – | Camera, IMU, Stereo
ApolloScape | TPAMI'18 | Vehicle | 143,906 | 1x64 | Yes | 2 | Camera, IMU, Radar
Waymo Open | CVPR'20 | Vehicle | 198,000 | 1x64 + 4x16 | Yes | 10 | Camera, IMU, Radar
nuScenes | CVPR'20 | Vehicle | 35,149 | 1x32 | Yes | 2 | Camera, IMU, Radar
ONCE | arXiv'21 | Vehicle | ~1 M | 1x40 | No | 2 | Camera, IMU
Argoverse 2 | NeurIPS'21 | Vehicle | ~6 M | 2x32 | Yes | 10 | Camera, IMU
aiMotive | ICLRW'23 | Vehicle | 26,583 | 1x64 | Yes | 10 | Camera, IMU
Zenseact Open | ICCV'23 | Vehicle | ~100 K | 1x128 + 4x16 | Yes | 1 | Camera, IMU
MAN TruckScenes | NeurIPS'24 | Vehicle | ~30 K | 6x64 | Yes | 2 | Camera, IMU, Radar
AeroCollab3D | TGRS'24 | Drone | 3,200 | N/A | No | – | Camera, IMU
Pi3DET | Ours | Vehicle, Drone, Quadruped | 51,545 | 1x64 | Yes | 10 | Camera, IMU, Stereo



Statistical Analysis


Figure: Analysis of perspective differences across the three robot platforms. We present statistics of the point elevation distribution (upper left), ego-motion distribution (bottom left), and target bounding box distribution (right), along with the mean and variance of each platform's data. Different colors denote different platforms, i.e., Vehicle, Drone, and Quadruped. Best viewed in color.
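
For readers who want to reproduce this style of analysis on their own recordings, the minimal sketch below computes per-platform means and variances of point elevation, ego-motion angles, and box-center heights. The field names (points, ego_pitch, ego_roll, box_centers) and the frame layout are assumptions made for illustration; they are not the Pi3DET toolkit API.

import numpy as np

def platform_statistics(frames):
    """Per-platform statistics similar to the figure panels: point elevation,
    ego pitch/roll, and 3D box center height. `frames` is an assumed list of
    dicts with numpy arrays, not the actual Pi3DET data format."""
    elev = np.concatenate([f["points"][:, 2] for f in frames])         # point z values
    ego = np.array([[f["ego_pitch"], f["ego_roll"]] for f in frames])  # ego-motion angles
    box_z = np.concatenate([f["box_centers"][:, 2] for f in frames])   # box center heights
    return {
        "elevation_mean": float(elev.mean()), "elevation_var": float(elev.var()),
        "ego_pitch_mean": float(ego[:, 0].mean()), "ego_roll_mean": float(ego[:, 1].mean()),
        "box_height_mean": float(box_z.mean()), "box_height_var": float(box_z.var()),
    }

Running this separately on vehicle, drone, and quadruped frames yields the kind of per-platform mean/variance comparison summarized in the figure.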




Dataset Examples






Methodology


Framework Overview. The proposed Pi3DET-Net consists of two main stages, Pre-Adaptation (PA) and Knowledge Adaptation (KA), which together bridge the gap across heterogeneous robot platforms through alignment at both the geometric level (see Section: geometry alignment) and the feature level (see Section: feature alignment). On the geometric side, PA employs Random Platform Jitter to improve robustness against ego-motion variations, while KA uses a Virtual Platform Pose to simulate source-like viewpoints and achieve bidirectional geometric alignment across platforms. On the feature side, Pi3DET-Net further incorporates KL Probabilistic Feature Alignment to align target features with the source feature space, together with a Geometry-Aware Transformation Descriptor to correct global transformations across platforms.
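
To make the two alignment ideas concrete, below is a minimal, self-contained sketch of how a platform-jitter augmentation and a KL-based feature-alignment loss could look. This is not the released Pi3DET-Net code: the function names, the pitch/roll jitter ranges, and the diagonal-Gaussian formulation of the KL term are illustrative assumptions.

# Illustrative sketch only; assumed interfaces, not the official Pi3DET-Net implementation.
import math
import random
import torch

def random_platform_jitter(points: torch.Tensor,
                           max_pitch_deg: float = 10.0,
                           max_roll_deg: float = 10.0) -> torch.Tensor:
    """Randomly rotate an (N, 3) LiDAR point cloud about the y (pitch) and
    x (roll) axes to mimic ego-pose variation across platforms.
    The angle ranges are assumed values."""
    pitch = math.radians(random.uniform(-max_pitch_deg, max_pitch_deg))
    roll = math.radians(random.uniform(-max_roll_deg, max_roll_deg))
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rot_pitch = torch.tensor([[cp, 0.0, sp],
                              [0.0, 1.0, 0.0],
                              [-sp, 0.0, cp]], dtype=points.dtype)   # rotation about y
    rot_roll = torch.tensor([[1.0, 0.0, 0.0],
                             [0.0, cr, -sr],
                             [0.0, sr, cr]], dtype=points.dtype)     # rotation about x
    return points @ (rot_roll @ rot_pitch).T

def kl_feature_alignment(src_feats: torch.Tensor,
                         tgt_feats: torch.Tensor,
                         eps: float = 1e-6) -> torch.Tensor:
    """Model source and target features (N, C) as diagonal Gaussians and return
    the closed-form KL(target || source); one plausible way to realize a
    probabilistic feature-alignment objective."""
    mu_s, var_s = src_feats.mean(dim=0), src_feats.var(dim=0) + eps
    mu_t, var_t = tgt_feats.mean(dim=0), tgt_feats.var(dim=0) + eps
    kl = 0.5 * (torch.log(var_s / var_t) + (var_t + (mu_t - mu_s) ** 2) / var_s - 1.0)
    return kl.mean()

In a full pipeline, the jitter would act on source point clouds during Pre-Adaptation, while the KL term would be added to the detection loss during Knowledge Adaptation; loss weighting and the feature granularity (per-voxel, per-BEV-cell, or per-proposal) are design choices left open here.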


Experiments

Comparisons of 3D detection methods on the Vehicle→Quadruped and Vehicle→Drone adaptation tasks.

We report the average precision (AP), formatted as "BEV / 3D", at IoU thresholds of 0.7 and 0.5. The symbol ‡ denotes algorithms without ROS. All scores are given in percent (%). "–" denotes that code is not available. A minimal, illustrative sketch of the rotated BEV IoU underlying these metrics follows the table.

Column groups: "Quad" = Vehicle→Quadruped, "Drone" = Vehicle→Drone; each detector (PV-RCNN, Voxel R-CNN) reports AP@0.7 and AP@0.5, and every cell is given as BEV / 3D.

Source Data | Method | Quad PV-RCNN AP@0.7 | Quad PV-RCNN AP@0.5 | Quad Voxel R-CNN AP@0.7 | Quad Voxel R-CNN AP@0.5 | Drone PV-RCNN AP@0.7 | Drone PV-RCNN AP@0.5 | Drone Voxel R-CNN AP@0.7 | Drone Voxel R-CNN AP@0.5 | Average AP@0.7 | Average AP@0.5
nuScenes | Source Platform | 43.40 / 33.55 | 44.86 / 42.84 | 43.25 / 33.74 | 45.62 / 43.32 | 50.91 / 35.26 | 57.73 / 50.24 | 50.15 / 29.41 | 57.10 / 49.10 | 46.93 / 32.99 | 51.33 / 46.34
nuScenes | ST3D | 55.40 / 42.02 | 59.59 / 54.75 | 44.54 / 35.96 | 45.81 / 44.38 | 65.05 / 40.01 | 68.93 / 64.09 | 54.62 / 33.79 | 58.45 / 52.89 | 54.90 / 37.95 | 58.20 / 54.03
nuScenes | ST3D‡ | 55.68 / 44.50 | 59.32 / 55.32 | 45.01 / 37.13 | 46.73 / 45.45 | 65.40 / 43.63 | 69.24 / 64.88 | 55.23 / 36.51 | 59.30 / 54.23 | 55.33 / 40.44 | 58.65 / 54.97
nuScenes | ST3D++ | 55.76 / 43.51 | 59.93 / 55.28 | 45.56 / 36.97 | 47.28 / 45.84 | 60.91 / 40.09 | 68.96 / 59.96 | 57.02 / 37.52 | 61.30 / 55.43 | 54.81 / 39.52 | 59.37 / 54.13
nuScenes | ST3D++‡ | 54.96 / 40.81 | 60.47 / 54.65 | 45.69 / 36.76 | 48.30 / 46.05 | 65.50 / 43.46 | 68.99 / 64.62 | 55.92 / 39.46 | 59.93 / 55.19 | 55.52 / 40.12 | 59.42 / 55.13
nuScenes | REDB | 52.43 / 41.34 | 57.12 / 54.18 | – / – | – / – | 65.31 / 39.19 | 68.74 / 64.13 | – / – | – / – | – / – | – / –
nuScenes | MS3D++ | 56.24 / 43.20 | 60.88 / 56.13 | 51.50 / 40.14 | 56.03 / 53.86 | 66.99 / 43.76 | 69.87 / 65.85 | 62.68 / 38.26 | 68.34 / 61.09 | 59.35 / 41.34 | 63.78 / 59.23
nuScenes | Pi3DET-Net | 56.80 / 46.36 | 61.54 / 57.20 | 54.85 / 42.38 | 57.41 / 55.54 | 65.43 / 45.94 | 69.24 / 65.87 | 65.63 / 44.62 | 72.05 / 63.83 | 60.68 / 44.83 | 65.06 / 60.61
nuScenes | Target Platform | 54.15 / 40.24 | 58.63 / 54.96 | 54.90 / 39.74 | 56.46 / 55.19 | 67.67 / 46.11 | 70.04 / 66.14 | 68.52 / 46.53 | 70.67 / 61.42 | 61.31 / 43.16 | 63.95 / 59.43
Pi3DET (Vehicle) | Source Platform | 38.61 / 26.84 | 40.64 / 39.22 | 43.95 / 31.24 | 48.22 / 44.17 | 57.29 / 36.62 | 58.92 / 56.19 | 52.85 / 37.96 | 61.10 / 52.47 | 48.17 / 33.16 | 52.22 / 48.01
Pi3DET (Vehicle) | ST3D | 49.29 / 38.69 | 51.02 / 49.71 | 47.70 / 37.91 | 48.07 / 47.59 | 60.17 / 33.01 | 62.84 / 54.51 | 53.79 / 40.18 | 65.29 / 53.40 | 52.74 / 37.45 | 56.81 / 51.30
Pi3DET (Vehicle) | ST3D‡ | 47.89 / 38.07 | 49.50 / 48.23 | 47.01 / 41.85 | 54.01 / 53.46 | 60.67 / 33.27 | 62.98 / 54.61 | 53.85 / 40.02 | 62.70 / 53.08 | 52.35 / 38.30 | 57.30 / 52.34
Pi3DET (Vehicle) | ST3D++ | 46.05 / 37.22 | 49.33 / 47.84 | 48.52 / 37.84 | 55.82 / 48.53 | 60.04 / 33.98 | 62.71 / 54.13 | 53.71 / 39.94 | 62.43 / 53.20 | 52.08 / 37.24 | 57.57 / 50.92
Pi3DET (Vehicle) | ST3D++‡ | 45.14 / 35.70 | 46.94 / 45.37 | 47.52 / 37.13 | 54.37 / 47.63 | 64.15 / 34.20 | 63.81 / 55.44 | 53.64 / 40.27 | 62.43 / 53.10 | 52.61 / 36.83 | 56.89 / 50.38
Pi3DET (Vehicle) | REDB | 46.74 / 38.47 | 50.29 / 49.54 | – / – | – / – | 61.57 / 34.05 | 63.22 / 54.07 | – / – | – / – | – / – | – / –
Pi3DET (Vehicle) | MS3D++ | 53.66 / 40.66 | 55.21 / 53.78 | 53.65 / 41.93 | 54.69 / 54.00 | 66.05 / 41.17 | 67.80 / 63.26 | 53.85 / 40.91 | 62.87 / 53.44 | 56.80 / 41.17 | 60.14 / 56.12
Pi3DET (Vehicle) | Pi3DET-Net | 56.19 / 44.28 | 60.35 / 56.20 | 55.54 / 45.18 | 59.48 / 58.90 | 66.26 / 44.47 | 68.25 / 63.36 | 67.87 / 46.83 | 69.95 / 66.26 | 61.47 / 45.19 | 64.51 / 61.18
Pi3DET (Vehicle) | Target Platform | 54.15 / 40.24 | 58.63 / 54.96 | 54.90 / 39.74 | 56.46 / 55.19 | 67.67 / 46.11 | 70.04 / 66.14 | 68.52 / 46.53 | 70.67 / 61.42 | 61.31 / 43.16 | 63.95 / 59.43
Combined | All | 58.21 / 46.27 | 62.18 / 59.67 | 60.96 / 48.15 | 63.04 / 61.04 | 68.44 / 48.19 | 71.11 / 68.24 | 68.90 / 48.88 | 72.55 / 69.18 | 64.13 / 47.87 | 67.22 / 64.53
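
As a reference for the BEV numbers above, the sketch below shows one standard way to compute rotated BEV IoU between two boxes using shapely polygons. The box parameterization (x, y, length, width, yaw) is an assumption for illustration; this is not the exact evaluation code used for the benchmark.

import math
from shapely.geometry import Polygon

def bev_corners(x, y, length, width, yaw):
    """Four BEV corners of a rotated box given as (x, y, length, width, yaw)."""
    dx, dy = length / 2.0, width / 2.0
    local = [(dx, dy), (dx, -dy), (-dx, -dy), (-dx, dy)]
    c, s = math.cos(yaw), math.sin(yaw)
    return [(x + c * px - s * py, y + s * px + c * py) for px, py in local]

def bev_iou(box_a, box_b):
    """Rotated BEV IoU between two (x, y, length, width, yaw) boxes."""
    pa, pb = Polygon(bev_corners(*box_a)), Polygon(bev_corners(*box_b))
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0.0 else 0.0

# Example: two nearly aligned boxes; a detection counts as a true positive
# when its IoU with a ground-truth box exceeds 0.7 or 0.5, per metric column.
print(bev_iou((0.0, 0.0, 4.0, 1.8, 0.0), (0.3, 0.1, 4.0, 1.8, 0.05)))

The 3D variant of the metric additionally accounts for vertical overlap (volume IoU); the same thresholding logic then applies.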


Citation

@inproceedings{liang2025pi3det,
  title     = {Perspective-Invariant 3D Object Detection},
  author    = {Ao Liang and Lingdong Kong and Dongyue Lu and Youquan Liu and Jian Fang and Huaici Zhao and Wei Tsang Ooi},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2025},
}