3D-aware Conditional Image Synthesis

Kangle Deng    Gengshan Yang    Deva Ramanan    Jun-Yan Zhu

Carnegie Mellon University   

In CVPR 2023

Paper | GitHub

Abstract

We propose a 3D-aware conditional generative model for controllable photorealistic image synthesis. Given a 2D label map, such as a segmentation or edge map, our model synthesizes a photo from different viewpoints. Existing approaches either cannot synthesize images based on a conditional input or suffer from noticeable viewpoint inconsistency. Moreover, many of them lack explicit user control of 3D geometry. To tackle these challenges, we integrate 3D representations with conditional generative modeling, enabling controllable high-resolution 3D-aware rendering conditioned on user inputs. Our model learns to assign a semantic label, in addition to color and density, to every 3D point, which enables us to render the image and a pixel-aligned label map simultaneously. By interactively editing label maps projected onto user-specified viewpoints, our system can be used as a tool for 3D editing of generated content. Finally, we show that such 3D representations can be learned from widely available pairs of monocular images and label maps.
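The key idea — compositing a semantic label along each camera ray with the same volume-rendering weights used for color — can be illustrated with a minimal NumPy sketch. This is an illustrative simplification, not the paper's implementation: the function names, the discrete ray samples, and the use of raw per-class scores as the semantic field are all assumptions for exposition.

```python
import numpy as np

def render_ray(densities, colors, semantics, deltas):
    """Composite color and semantics along one ray with standard
    volume-rendering (alpha-compositing) weights.

    densities: (N,)   non-negative density sigma at each ray sample
    colors:    (N, 3) RGB value at each ray sample
    semantics: (N, C) per-class semantic scores at each ray sample
    deltas:    (N,)   spacing between consecutive samples
    """
    # Opacity of each sample from its density and interval length.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    # Per-sample contribution weights (shared by color and semantics).
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)      # rendered pixel color
    sem = (weights[:, None] * semantics).sum(axis=0)   # rendered semantic vector
    return rgb, sem, weights
```

Because the color and semantic fields share one set of compositing weights, the rendered image and the rendered label map (e.g., the argmax of `sem` per pixel) are pixel-aligned by construction, across any viewpoint.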


Summary Video (MP4 link)




Seg2Face Visual Results (More results)

Input Segmentation Map

Generated Images

Generated Segmentation Map

Semantic Mesh



Seg2Cat Visual Results (More results)

Input Segmentation Map

Generated Images

Generated Segmentation Map

Semantic Mesh



Edge2Cat Visual Results (More results)

Input Edge Map

Generated Images

Generated Edge Map

Semantic Mesh



Edge2Car Visual Results (More results)

Input Edge Map

Generated Images

Generated Edge Map

Mesh




Citation

If you find our work useful in your research, please consider citing:

@inproceedings{kangle2023pix2pix3d,
  title     = {3D-aware Conditional Image Synthesis},
  author    = {Deng, Kangle and Yang, Gengshan and Ramanan, Deva and Zhu, Jun-Yan},
  booktitle = {CVPR},
  year      = {2023}
}




Acknowledgement

We thank Sheng-Yu Wang, Nupur Kumari, Gaurav Parmar, Ruihan Gao, Muyang Li, George Cazenavette, Andrew Song, Zhipeng Bao, Tamaki Kojima, Krishna Wadhwani, Takuya Narihira, and Tatsuo Fujiwara for their discussions and help. We are grateful for the support from Sony Corporation, Singapore DSTA, and the CMU Argo AI Center for Autonomous Vehicle Research.