Katerina Fragkiadaki
email katef 'at' cs.cmu.edu

CV | Bio | Google Scholar | Twitter

I am a JPMorgan Chase Associate Professor of Computer Science in the Machine Learning Department at Carnegie Mellon University. I work in Artificial Intelligence at the intersection of Computer Vision, Machine Learning, Language Understanding and Robotics. Prior to joining MLD's faculty, I spent three wonderful years as a postdoctoral researcher, first at UC Berkeley working with Jitendra Malik and then at Google Research in Mountain View working with the video group. I completed my Ph.D. in GRASP, UPenn with Jianbo Shi. I did my undergraduate studies at the National Technical University of Athens, and before that I was in Crete.

Prospective students: If you want to join CMU as a PhD student, just mention my name in your application. Otherwise, if you would like to join our group in any other capacity, please fill out this form and then send me a short email note without any documents.

Teaching

Deep Reinforcement Learning and Control Fall 2024
Deep Reinforcement Learning and Control Spring 2024
Deep Reinforcement Learning and Control Fall 2023
Deep Reinforcement Learning and Control Spring 2023

Research Group

Our group studies Artificial Intelligence, and specifically Machine Learning models at the intersection of Computer Vision, Language Understanding and Robotics. Our ultimate goal is to build machines that, autonomously and through interaction with humans and the environment, acquire and continuously improve world models that let them reason through the consequences of their own and others' decisions, and ultimately surpass humans in both dexterity and creativity. Topics we currently focus on include representation learning, video understanding, 2D/3D unified vision language models, generative modeling, learning simulators from data, real2sim and sim2real robot learning, reinforcement learning, and continual learning.
PhD Students
Wen-hsuan Chu
Gabriel Sarch (with Mike Tarr)
Brian Yang (with Jeff Schneider)
Nikos Gkanatsios
Mihir Prabhudesai (with Deepak Pathak)
Ayush Jain
Kashu Yamazaki
Matthew Bronars
Postdoc
Lei Ke

MS Students
Yash Jangir
Alexander Swerlow
Former Students
Tsung-Wei Ke (PostDoc, now professor at NTU CSIE)
Xian Zhou (PhD student)
Fish Tung (PhD student, postdoc at MIT, Tesla, Google DeepMind)
Adam Harley (PhD student, PostDoc at Stanford, Meta)
Theo Gervet (PhD student, Mistral)
Pushkal Katara (MS student, Scaled Foundations)
Ricson Cheng (undergrad, CRA research award)
Zhaoyuan Fang (MS student, Google)
Mayank Singh (MS student, Apple)
Yunchu Zhang (MS student, UW PhD)
Ziyan Wang (MSR student, RI CMU PhD)
Shamit Lal (MS student, Amazon AGI)
Yiming Zuo (MSR student, Princeton PhD)
Max Sieb (MSR student, Covariant AI, Google DeepMind)
Arpit Agarwal (MSR student, RI CMU PhD)
Henry Huang (MS student, Bloomberg)
Chris Ying (MS student, Google Brain)
Darshan Patil (undergraduate, MILA PhD)
Nilay Pande (MS, Tesla, Waymo)
Ishita Mediratta (undergraduate collaborator, Meta)

Selected Publications

Video Diffusion Alignment via Reward Gradients
Mihir Prabhudesai, Russell Mendonca, Zheyang Qin, Katerina Fragkiadaki, Deepak Pathak
VADER aligns video diffusion models using end-to-end reward gradient backpropagation from off-the-shelf differentiable reward functions.
arxiv
webpage
VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
Gabriel Sarch, Lawrence Jang, Michael Tarr, William Cohen, Kenneth Marino, Katerina Fragkiadaki
A technique that enables VLM agents to take initially suboptimal demonstrations and iteratively improve them, ultimately generating high-quality trajectory data that includes both optimized actions and detailed reasoning annotations suitable for more effective in-context learning and fine-tuning.
NeurIPS 2024 spotlight
webpage
DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Wen-Hsuan Chu*, Lei Ke*, Katerina Fragkiadaki
DreamScene4D generates dynamic 3D scenes of multiple objects from monocular videos in a training-free manner, using object-centric diffusion priors and pixel and motion reprojection errors.
NeurIPS 2024
webpage
ODIN: A Single Model for 2D and 3D Perception
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki
ODIN processes both RGB images and sequences of posed RGB-D images by alternating between 2D and 3D fusion layers, using projection and unprojection based on camera information. It sets a new SOTA on ScanNet200.
CVPR 2024 spotlight
webpage
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
Tsung-Wei Ke*, Nikolaos Gkanatsios*, Katerina Fragkiadaki
Combining 3D relative attention transformers with action trajectory diffusion gives SOTA imitation learning robot policies on CALVIN and RLBench.
CoRL 2024
webpage
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following
Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki
Diffusion-ES combines trajectory diffusion models with evolutionary search and achieves SOTA performance on nuPlan. We prompt LLMs to map language instructions to shaped reward functions, optimize those rewards with Diffusion-ES, and solve the hardest driving scenarios.
CVPR 2024
webpage
Test-time Adaptation of Discriminative Models via Diffusion Generative Feedback
Mihir Prabhudesai*, Tsung-Wei Ke*, Alexander C. Li, Deepak Pathak, Katerina Fragkiadaki
NeurIPS 2023
webpage
Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models
Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki
EMNLP findings 2023
webpage
Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation
Theophile Gervet*, Zhou Xian*, Nikolaos Gkanatsios, Katerina Fragkiadaki
CoRL 2023
webpage
Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models
Pushkal Katara*, Xian Zhou*, Katerina Fragkiadaki
ICRA 2024
webpage
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, Chuang Gan
ICML 2024
webpage
ChainedDiffuser: Unifying Trajectory Diffusion and Keypose Prediction for Robotic Manipulation
Zhou Xian*, Nikolaos Gkanatsios*, Theophile Gervet*, Tsung-Wei Ke, Katerina Fragkiadaki
CoRL 2023
webpage
Test-time Adaptation with Slot-Centric Models
Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki
ICML 2023
webpage
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, Katerina Fragkiadaki
RSS 2023
webpage
Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?
Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki
ICRA 2023
webpage
FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation
Zhou Xian, Bo Zhu, Zhenjia Xu, Hsiao-Yu Tung, Antonio Torralba, Katerina Fragkiadaki, Chuang Gan
ICLR 2023, spotlight
webpage
Analogy-Forming Transformers for Few-Shot 3D Parsing
Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki
ICLR 2023
webpage
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds
Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki
ECCV 2022
webpage
TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors
Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki
ECCV 2022
webpage
Particle Videos Revisited: Tracking Through Occlusions Using Point Trajectories
Adam W. Harley, Zhaoyuan Fang, Katerina Fragkiadaki
ECCV 2022, oral
webpage
Visually-Grounded Library of Behaviors for Manipulating Diverse Objects across Diverse Configurations and Views
Jingyun Yang*, Hsiao-Yu Fish Tung*, Yunchu Zhang*, Gaurav Pathak, Ashwini Pokle, Christopher G Atkeson, Katerina Fragkiadaki
CoRL 2021
webpage
Disentangling 3D Prototypical Networks for Few-Shot Concept Learning
Mihir Prabhudesai*, Shamit Lal*, Darshan Patil*, Hsiao-Yu Tung, Adam Harley, Katerina Fragkiadaki
ICLR 2021
webpage
Track, Check, Repeat: An EM Approach to Unsupervised Tracking
Adam W. Harley, Yiming Zuo, Jing Wen, Ayush Mangal, Shubhankar Potdar, Ritwick Chaudhry, Katerina Fragkiadaki
CVPR 2021
webpage
Move to See Better: Self-Improving Embodied Object Detection
Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki
BMVC 2021
webpage
HyperDynamics: Generating Expert Dynamics Models by Observation
Zhou Xian, Shamit Lal, Hsiao-Yu Tung, Emmanouil Antonios Platanios, Katerina Fragkiadaki
ICLR 2021
webpage
CoCoNets: Continuous Contrastive 3D Scene Representations
Shamit Lal, Mihir Prabhudesai, Ishita Mediratta, Adam W. Harley, Katerina Fragkiadaki
CVPR 2021
webpage
Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping
Adam W. Harley, Shrinidhi K. Lakshmikanth, Paul Schydlo, Katerina Fragkiadaki
ECCV 2020
Embodied Language Grounding with Implicit 3D Visual Feature Representations
Mihir Prabhudesai*, Hsiao-Yu Fish Tung*, Syed Ashar Javed*, Maximilian Sieb, Adam W. Harley, Katerina Fragkiadaki
CVPR 2020
webpage
Epipolar Transformers
Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu
CVPR 2020
webpage
Graph-structured Visual Imitation
Xian Zhou*, Max Sieb*, Audrey Huang, Oliver Kroemer, Katerina Fragkiadaki
CoRL 2019, spotlight
webpage
Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping
Adam W. Harley, Fangyu Li, Shrinidhi K. Lakshmikanth, Xian Zhou, Hsiao-Yu Fish Tung, Katerina Fragkiadaki
ICLR 2020
webpage
Learning Spatial Common Sense with Geometry-Aware Recurrent Networks
Hsiao-Yu Fish Tung, Ricson Cheng, Katerina Fragkiadaki
CVPR 2019, oral
webpage
Model Learning for Look-ahead Exploration in Continuous Control
Arpit Agarwal, Katharina Muelling and Katerina Fragkiadaki
AAAI 2019 oral
webpage
Reinforcement Learning of Active Vision for Manipulating Objects under Occlusions
Ricson Cheng, Arpit Agarwal, and Katerina Fragkiadaki
CoRL 2018
slides | code
Geometry-Aware Recurrent Neural Networks for Active Visual Recognition
Ricson Cheng, Ziyan Wang, and Katerina Fragkiadaki
NIPS 2018
Reward Learning from Narrated Demonstrations
Fish Tung, Adam Harley, Liang-Kang Huang, Katerina Fragkiadaki
CVPR 2018
bibtex
Depth-adaptive Computational Policies for Efficient Visual Tracking
Chris Ying, Katerina Fragkiadaki
EMMCVPR 2017
bibtex
Self-supervised Learning of Motion Capture
Hsiao-Yu Fish Tung, Wei Tung, Ersin Yumer, Katerina Fragkiadaki
NIPS 2017 spotlight
bibtex | code
Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision
Hsiao-Yu Fish Tung, Adam Harley, William Seto, Katerina Fragkiadaki
ICCV 2017
bibtex | code
SfM-Net: Learning of Structure and Motion from Video
Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki
arxiv
Learning Predictive Visual Models of Physics for Playing Billiards
Katerina Fragkiadaki*, Pulkit Agrawal*, Sergey Levine, Jitendra Malik
ICLR 2016
webpage
Recurrent Network Models for Human Dynamics
Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik
ICCV 2015
webpage
Human Pose Estimation with Iterative Error Feedback
Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik
arXiv
webpage
Learning to Segment Moving Objects in Videos
Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, Jitendra Malik
CVPR 2015
poster | bibtex | webpage
Grouping-Based Low-Rank Video Completion and 3D Reconstruction
Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik
NIPS 2014
poster | bibtex | webpage
Two Granularity Tracking: Mediating Trajectory and Detection Graphs for Tracking under Occlusions
Katerina Fragkiadaki, Weiyu Zhang, Geng Zhang, Jianbo Shi
ECCV 2012
poster | bibtex | webpage