The Robotics Institute, Carnegie Mellon University
Overview figure: historical LiDAR sweeps (t = {-T, ..., 0}) are used to forecast future 4D occupancy (t = {1, ..., T}), from which future point clouds (t = {1, ..., T}) are rendered.
We focus on the problem of scene perception and forecasting for autonomous systems. Since traditional methods rely on costly human annotations, we look towards emerging self-supervised and scalable tasks such as point cloud forecasting. However, we argue that the formulation of point cloud forecasting unnecessarily forces algorithms to learn sensor extrinsics and intrinsics as part of predicting future point clouds, whereas the only physical quantity of central importance to autonomous perception is future spacetime 4D occupancy. We therefore recast the task as 4D occupancy forecasting and show that, using the same data as point cloud forecasting, one can learn a meaningful and generic intermediate quantity: future spacetime 4D occupancy.
Predicting how the world can evolve in the future is crucial for motion planning in autonomous systems. Classical methods are limited because they rely on costly human annotations (semantic class labels, bounding boxes, and tracks, or HD maps of cities) to plan their motion, and are therefore difficult to scale to large unlabeled datasets. One promising self-supervised task is 3D point cloud forecasting from unannotated LiDAR sequences. We show that this task requires algorithms to implicitly capture (1) sensor extrinsics (i.e., the egomotion of the autonomous vehicle), (2) sensor intrinsics (i.e., the sampling pattern specific to the particular LiDAR sensor), and (3) the shape and motion of other objects in the scene. But autonomous systems should make predictions about the world and not their sensors! To this end, we factor out (1) and (2) by recasting the task as one of spacetime (4D) occupancy forecasting. Because it is expensive to obtain ground-truth 4D occupancy, we instead "render" point cloud data from 4D occupancy predictions given sensor extrinsics and intrinsics, allowing one to train and test occupancy algorithms with unannotated LiDAR sequences. This also makes it possible to evaluate and compare point cloud forecasting algorithms across diverse datasets, sensors, and vehicles.
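To make the rendering step concrete, below is a minimal, non-differentiable sketch that marches LiDAR rays through a discretized occupancy grid and reports the first occupied voxel along each ray as its return depth. It illustrates the idea only and is not the paper's implementation; names such as `render_point_cloud`, `grid_min`, and `voxel_size`, as well as the fixed marching step and the clamping of misses to `t_max`, are assumptions made for this sketch.

```python
import numpy as np

def render_point_cloud(occupancy, grid_min, voxel_size, origin, directions,
                       t_max=70.0, step=0.1, thresh=0.5):
    """Render ray termination points from a predicted occupancy grid.

    occupancy : (X, Y, Z) occupancy probabilities for one future timestep
    grid_min  : (3,) world coordinates of the grid's minimum corner
    origin    : (3,) sensor origin in world coordinates (sensor extrinsics)
    directions: (N, 3) unit ray directions (sensor intrinsics, e.g. a LiDAR pattern)
    """
    points, depths = [], []
    for d in directions:
        depth = t_max  # rays with no return are clamped to t_max in this sketch
        t = step
        while t < t_max:
            p = origin + t * d
            idx = np.floor((p - grid_min) / voxel_size).astype(int)
            if np.any(idx < 0) or np.any(idx >= occupancy.shape):
                break  # ray left the grid
            if occupancy[tuple(idx)] > thresh:
                depth = t  # first occupied voxel along the ray
                break
            t += step
        depths.append(depth)
        points.append(origin + depth * d)
    return np.asarray(points), np.asarray(depths)
```

Because the occupancy forecast and the sensor model enter this function through separate arguments, the same predicted occupancy can be rendered for any choice of origin and ray pattern.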
Comparison to the state of the art on point cloud forecasting: ground truth vs. SPFNet-U, S2Net-U, Raytracing, Ours (point clouds), and Ours (occupancy).
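The comparison above is over forecasted point clouds; a standard way to score such forecasts is the symmetric Chamfer distance between a predicted and a ground-truth sweep. The snippet below is a generic sketch of that metric, not necessarily the exact evaluation protocol behind the figure.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between point clouds pred (N, 3) and gt (M, 3).

    Averages the squared distance from each predicted point to its nearest
    ground-truth point, and vice versa, then sums the two terms.
    """
    d_pred_to_gt, _ = cKDTree(gt).query(pred)   # nearest GT point per prediction
    d_gt_to_pred, _ = cKDTree(pred).query(gt)   # nearest prediction per GT point
    return np.mean(d_pred_to_gt ** 2) + np.mean(d_gt_to_pred ** 2)
```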
Applications enabled by disentangled occupancy: LiDAR sensor simulation. The predicted future occupancy is rendered with nuScenes, KITTI, and ArgoVerse2.0 LiDAR sensor patterns.
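Because the forecast is sensor-agnostic, the same predicted occupancy can be re-rendered under different sensor intrinsics. As a hypothetical example, the helper below builds the ray pattern of an idealized spinning LiDAR and reuses `render_point_cloud` from the sketch above; real sensors such as those in nuScenes, KITTI, and ArgoVerse2.0 have more irregular beam layouts, so the parameters here are placeholders.

```python
import numpy as np

def spinning_lidar_rays(n_beams=32, n_azimuths=1024,
                        elev_min=-30.0, elev_max=10.0):
    """Unit ray directions for an idealized spinning LiDAR (sensor intrinsics)."""
    elev = np.deg2rad(np.linspace(elev_min, elev_max, n_beams))
    azim = np.deg2rad(np.linspace(0.0, 360.0, n_azimuths, endpoint=False))
    elev, azim = np.meshgrid(elev, azim, indexing="ij")
    dirs = np.stack([np.cos(elev) * np.cos(azim),
                     np.cos(elev) * np.sin(azim),
                     np.sin(elev)], axis=-1)
    return dirs.reshape(-1, 3)

# Simulate two different sensors from the same predicted occupancy:
# rays_64 = spinning_lidar_rays(n_beams=64)   # denser, KITTI-like pattern
# rays_32 = spinning_lidar_rays(n_beams=32)   # sparser, nuScenes-like pattern
# pts_64, _ = render_point_cloud(occupancy, grid_min, voxel_size, origin, rays_64)
# pts_32, _ = render_point_cloud(occupancy, grid_min, voxel_size, origin, rays_32)
```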
Applications enabled by disentangled occupancy: novel view synthesis. Given a reference RGB frame at t = 0s, depth is synthesized from novel views.
A video version of all figures in the paper is available here. You can also check out our 5-min summary on YouTube and a PDF with some supplementary results.
Webpage template stolen from the amazing Peiyun Hu.