The Robotics Institute, Carnegie Mellon University
Overview figure: historical LiDAR sweeps (t = {-T, ..., 0}) are used to forecast future 4D occupancy (t = {1, ..., T}), from which future point clouds (t = {1, ..., T}) are rendered.
We focus on the problem of scene perception and forecasting for autonomous systems. Since traditional methods rely on costly human annotations, we look towards emerging self-supervised and scalable tasks such as point cloud forecasting. However, we argue that the formulation of point cloud forecasting unnecessarily forces algorithms to learn sensor extrinsics and intrinsics as part of predicting future point clouds, whereas the only physical quantity of central importance to autonomous perception is future spacetime 4D occupancy. We therefore recast the task as 4D occupancy forecasting and show that, using the same data as point cloud forecasting, one can learn a meaningful and generic intermediate quantity: future spacetime 4D occupancy.
Predicting how the world can evolve in the future is crucial for motion planning in autonomous systems. Classical methods are limited because they rely on costly human annotations (semantic class labels, bounding boxes, and tracks, or HD maps of cities) to plan their motion, and are therefore difficult to scale to large unlabeled datasets. One promising self-supervised task is 3D point cloud forecasting from unannotated LiDAR sequences. We show that this task requires algorithms to implicitly capture (1) sensor extrinsics (i.e., the egomotion of the autonomous vehicle), (2) sensor intrinsics (i.e., the sampling pattern specific to the particular LiDAR sensor), and (3) the shape and motion of other objects in the scene. But autonomous systems should make predictions about the world and not their sensors! To this end, we factor out (1) and (2) by recasting the task as one of spacetime (4D) occupancy forecasting. Because it is expensive to obtain ground-truth 4D occupancy, we instead "render" point cloud data from 4D occupancy predictions given sensor extrinsics and intrinsics, allowing one to train and test occupancy algorithms with unannotated LiDAR sequences. This also makes it possible to evaluate and compare point cloud forecasting algorithms across diverse datasets, sensors, and vehicles.
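To make the rendering step concrete, below is a minimal, non-differentiable sketch that marches LiDAR rays through a discretized occupancy grid and reports the first occupied voxel along each ray as its return depth. It illustrates the idea only and is not the paper's implementation; names such as `render_point_cloud`, `grid_min`, and `voxel_size`, as well as the fixed marching step and the clamping of misses to `t_max`, are assumptions made for this sketch.

```python
import numpy as np

def render_point_cloud(occupancy, grid_min, voxel_size, origin, directions,
                       t_max=70.0, step=0.1, thresh=0.5):
    """Render ray termination points from a predicted occupancy grid.

    occupancy : (X, Y, Z) occupancy probabilities for one future timestep
    grid_min  : (3,) world coordinates of the grid's minimum corner
    origin    : (3,) sensor origin in world coordinates (sensor extrinsics)
    directions: (N, 3) unit ray directions (sensor intrinsics, e.g. a LiDAR pattern)
    """
    points, depths = [], []
    for d in directions:
        depth = t_max  # rays with no return are clamped to t_max in this sketch
        t = step
        while t < t_max:
            p = origin + t * d
            idx = np.floor((p - grid_min) / voxel_size).astype(int)
            if np.any(idx < 0) or np.any(idx >= occupancy.shape):
                break  # ray left the grid
            if occupancy[tuple(idx)] > thresh:
                depth = t  # first occupied voxel along the ray
                break
            t += step
        depths.append(depth)
        points.append(origin + depth * d)
    return np.asarray(points), np.asarray(depths)
```

Because the occupancy forecast and the sensor model enter this function through separate arguments, the same predicted occupancy can be rendered for any choice of origin and ray pattern.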
Comparison to the state of the art on point cloud forecasting: ground truth vs. SPFNet-U, S2Net-U, Raytracing, Ours (point clouds), and Ours (occupancy).
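The comparison above is over forecasted point clouds; a standard way to score such forecasts is the symmetric Chamfer distance between a predicted and a ground-truth sweep. The snippet below is a generic sketch of that metric, not necessarily the exact evaluation protocol behind the figure.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between point clouds pred (N, 3) and gt (M, 3).

    Averages the squared distance from each predicted point to its nearest
    ground-truth point, and vice versa, then sums the two terms.
    """
    d_pred_to_gt, _ = cKDTree(gt).query(pred)   # nearest GT point per prediction
    d_gt_to_pred, _ = cKDTree(pred).query(gt)   # nearest prediction per GT point
    return np.mean(d_pred_to_gt ** 2) + np.mean(d_gt_to_pred ** 2)
```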
Applications enabled by disentangled occupancy: LiDAR sensor simulation. The predicted future occupancy is rendered with nuScenes, KITTI, and ArgoVerse2.0 LiDAR sensor patterns.
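Because the forecast is sensor-agnostic, the same predicted occupancy can be re-rendered under different sensor intrinsics. As a hypothetical example, the helper below builds the ray pattern of an idealized spinning LiDAR and reuses `render_point_cloud` from the sketch above; real sensors such as those in nuScenes, KITTI, and ArgoVerse2.0 have more irregular beam layouts, so the parameters here are placeholders.

```python
import numpy as np

def spinning_lidar_rays(n_beams=32, n_azimuths=1024,
                        elev_min=-30.0, elev_max=10.0):
    """Unit ray directions for an idealized spinning LiDAR (sensor intrinsics)."""
    elev = np.deg2rad(np.linspace(elev_min, elev_max, n_beams))
    azim = np.deg2rad(np.linspace(0.0, 360.0, n_azimuths, endpoint=False))
    elev, azim = np.meshgrid(elev, azim, indexing="ij")
    dirs = np.stack([np.cos(elev) * np.cos(azim),
                     np.cos(elev) * np.sin(azim),
                     np.sin(elev)], axis=-1)
    return dirs.reshape(-1, 3)

# Simulate two different sensors from the same predicted occupancy:
# rays_64 = spinning_lidar_rays(n_beams=64)   # denser, KITTI-like pattern
# rays_32 = spinning_lidar_rays(n_beams=32)   # sparser, nuScenes-like pattern
# pts_64, _ = render_point_cloud(occupancy, grid_min, voxel_size, origin, rays_64)
# pts_32, _ = render_point_cloud(occupancy, grid_min, voxel_size, origin, rays_32)
```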
Applications enabled by disentangled occupancy: novel view synthesis. Given a reference RGB frame at t = 0s, depth is synthesized from novel views.
A video version of all figures in the paper is available here. You can also check out our 5-min summary on YouTube and a PDF with some supplementary results.
Webpage template stolen from the amazing Peiyun Hu.