Multi-Range Pyramids for 3D Object Detection

Mengtian (Martin) Li

Carnegie Mellon University

Benjamin Wilson

Georgia Institute of Technology
Argo AI

Yu-Xiong Wang

University of Illinois at Urbana-Champaign

James Hays

Georgia Institute of Technology
Argo AI

Deva Ramanan

Carnegie Mellon University
Argo AI


Abstract

LiDAR-based 3D detection plays a vital role in autonomous navigation. Contemporary solutions make use of 3D voxel representations, often encoded with a bird's-eye view (BEV) feature map. While quite intuitive, such representations scale quadratically with the spatial range of the map, making them ill-suited for far-field perception. In this paper, we present a multi-range representation that retains the benefits of BEV while remaining efficient, by exploiting the following insight: near-field LiDAR measurements are dense and best encoded with small voxels, while far-field measurements are sparse and better encoded with large voxels. We use this observation to build a collection of range experts tuned for near- versus far-field detection, and show that they can share information with each other via a single multi-range feature pyramid. We show that standard convolutions must be adjusted for this novel representation, and introduce local and global cross-range feature-sharing mechanisms to address this issue. We evaluate our method on the long-range detection dataset Argoverse (up to ±200 m) and find that it achieves significantly higher accuracy than competitive baselines while running faster in wall-clock time.
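To make the scaling argument concrete: a BEV grid at 0.1 m resolution covering ±200 m requires a 4000×4000 feature map, 16× larger than the same grid over ±50 m. The sketch below illustrates the range-banded voxelization idea in Python; the band edges, voxel sizes, and function names are our own illustrative assumptions, not the paper's configuration or code.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): assign each LiDAR
# point to a range band, then voxelize each band at its own resolution.
# Doubling the voxel size per band keeps each band's BEV grid roughly the
# same size, so memory grows slowly with range instead of quadratically.
RANGE_EDGES = [0.0, 50.0, 100.0, 200.0]  # meters; assumed band boundaries
VOXEL_SIZES = [0.1, 0.2, 0.4]            # meters; one voxel size per band

def split_by_range(points: np.ndarray) -> list[np.ndarray]:
    """Partition an (N, 3) point cloud into range bands and voxelize each.

    Near bands use small voxels (returns are dense); far bands use large
    voxels (returns are sparse). Returns one array of unique integer voxel
    coordinates per band.
    """
    radii = np.linalg.norm(points[:, :2], axis=1)  # BEV (x, y) range
    voxel_grids = []
    for (near, far), size in zip(zip(RANGE_EDGES, RANGE_EDGES[1:]), VOXEL_SIZES):
        band = points[(radii >= near) & (radii < far)]
        coords = np.unique(np.floor(band / size).astype(np.int64), axis=0)
        voxel_grids.append(coords)
    return voxel_grids

# Example: far-field points yield far fewer occupied voxels at coarse
# resolution than they would on a single fine-grained grid.
pts = np.random.uniform(-200.0, 200.0, size=(100_000, 3))
for i, grid in enumerate(split_by_range(pts)):
    print(f"band {i}: {len(grid)} occupied voxels")
```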