TL;DR
- There are many great people that you should apply to work with if you want to do a PhD in database systems. Don't just look at the rankings.
My CMU DB squad is getting huge. In the last year, I have had dozens of students, visitors, interns, and random roustabouts who spent time helping build our new DBMS. One thing that I have learned since I became a professor is that when you have the initial meeting with someone that wants to join your research team, you should ask them two things.
The first is what do they hope to get out of the experience. Everyone usually tells me that they want to work on a large systems project. Sometimes they say that they want to work on a specific component. This allows me to calibrate my expectations for the person and to tailor their assignments accordingly.
The second question is what do they want to do after they leave CMU. It's important to know what their plans are early on because there are certain tasks that I can assign them that are better for someone that wants to apply for a PhD program. For example, there are things that could lead to a paper. But if the student doesn't tell me this until after we have been working together for a few months, then there might not be enough time for them to pivot before they have to apply.
As we get closer to the PhD admissions season (CMU's early deadline is Dec 1st), I sit down with the students that want to apply for PhD programs and go over a map of the United States and talk about the people that I like at other schools. I obviously cannot take on all of these people as my students, so I have told them all to strongly consider places other than CMU. But this year I am writing recommendation letters in December for probably 10 different students. So I figure it is easier if I just write down my list. And since it may be helpful for others, I am making it available here.
Thus, the following is my list of database groups in the US that I am encouraging my students to apply to in 2016 (other than CMU). Note that I don't know whether these professors are taking new students. I'm just describing which aspects of their research projects I find interesting or why I like them as a person.
Disclaimers:
This list is not exhaustive. I hope that nobody feels slighted if I forgot to include them here. There really aren't any set criteria for who I chose to include in my list. My only judgement call was whether a person recently wrote a paper that I admire and that I wish I had written.
There are other professors doing novel database research in non-systems areas, such as Aditya Parameswaran (UIUC), Arnab Nandi (OSU), Jun Yang (Duke), Eugene Wu (Columbia), and Dan Suciu (UW). I am only listing the ones that are doing work that is directly related to my own research.
There are newer DB professors that recently started, but I'm not including them because I don't know what their new work is going to be about: Arun Kumar (UCSD), Jennie Duggan (Northwestern), Joey Gonzalez (Berkeley), Peter Alvaro (UCSC), and Paolo Papotti (ASU). I guarantee you that they are looking for new students.
There are also awesome people doing good systems research who do not typically publish in the same conferences that I do. Notable examples include Dave Andersen (CMU), Eddie Kohler (Harvard), Martha Kim (Columbia), Raluca Ada Popa (Berkeley), and Steven Swanson (UCSD). Again, I am only listing my SIGMOD/VLDB fam here.
Lastly, I am only including schools in the United States. I'm going to die in the US, so I think that this is fair. If you don't want to live in the US, then you should consider applying to work with Ken Salem (Waterloo), Natassa Ailamaki (EPFL), Thomas Neumann (TUM), or Gustavo Alonso (ETH).
The List (2016)
- Berkeley — The DB group at Berkeley is legendary. It's where Stonebraker created Ingres and Postgres. Margo Seltzer wrote BerkeleyDB in the 1990s. And more recently they created Spark (although it came from more than just DB people). The #1 stunna at Berkeley is Joe Hellerstein. My favorite thing about Joe's approach to research is that he methodically studies fundamental problems in data management that transcend ever-improving hardware or application trends. For example, I'm excited about his latest work on the Ground project for large-scale data provenance. Joe seems to have this effortless way of speaking about technical things that I wish I could emulate.
- Brown — I love this place. My time in grad school at Brown was probably the happiest six years of my life. It's a smaller department, so it may not be for everyone. But the people there are amazing. Their DB squad is large now too. Tim Kraska and Carsten Binnig are the youngest and most active members there. Tim has been working on database techniques for visualization systems as well as using RDMA to speed up transaction processing. I strongly recommend working with him if you can. Ugur Cetintemel is the ever-congenial professor that has been involved in all of the stream processing projects at Brown for the last decade (but he is also currently the department chair). And then finally there is Stan Zdonik. He was my advisor (with Mike). He is fantastic. I don't know whether he is taking on new students though.
- Chicago — For a long time, Chicago's CS department was in the doldrums. They are now getting serious about rebuilding their program. Aaron Elmore has worked with me in the past on automated load balancing techniques for distributed databases. Chicago also just hired Mike Franklin, but he is their new department chair so he may not be taking on any students right away.
- Columbia — The leading DB systems professor at Columbia is Ken Ross. His research is a bit closer to the hardware than my own work, but one of my students has started looking into the kinds of hardware acceleration (SIMD, GPU) that Ken has been studying for years.
- Duke — The Duke DB group is quite large. The main person there that is working in my area is Shivnath Babu (albeit he is a bit more algorithmicy than I am). Shivnath's work is on auto-tuning and self-managing methods for database systems. But I don't know whether he is back at Duke yet from working on his start-up.
- Harvard — Like Chicago, Harvard is about to begin expanding their CS department. But they're going to do it with some serious money. Margo Seltzer certainly had a great impact on databases in the 1990s, but nowadays she publishes in a lot of other non-DB venues for filesystems, storage, and data provenance. Thus, the key person there that is doing DB systems research is Stratos Idreos. Let me first say that Stratos is one of the most chill people that I have ever met. While I am stressing out about how to build our new DBMS and pay for all of my students, he does not seem to be fazed by the pressure at all. Despite his outward demeanor, he and his team are cranking out research at an amazing rate. His main research project is on "self-designing" systems (note that this is different than a "self-driving" system, as I will make clear at a later point).
- Maryland — I don't know whether it's public yet, but Dan Abadi is leaving Yale for UMD. Dan has had success with two strong systems projects in the last five years that have had a significant impact in the research community: HadoopDB (which later became the basis for his start-up Hadapt) and Calvin. Amol Deshpande is also at College Park. One of the things that Amol is looking into lately is how to apply in-memory database techniques to graph processing systems. Amol is super nice and I think would be a great advisor.
- Michigan — There are three database professors in Ann Arbor. The first is Mike Cafarella. This man has one of the smoothest baritone speaking voices in all of databases. Plus he lives life to the fullest. He is somebody that I aspire to be (except that I don't have the same "death drive" that he does). Like Chris Re and Joey Gonzalez, Mike spans both the systems and machine learning communities. Barzan Mozafari is doing the same kind of self-tuning stuff that I am interested in but he takes a stronger mathy/analytical approach to his work (whereas we just use machine learning magic). Barzan is also involved in a start-up based on his work on BlinkDB. Lastly, there is H.V. Jagadish. Jag is one of the more senior people listed here and he works on a lot of different topics.
- MIT — I think that Sam Madden is one of the smartest people that I know. He has this ability to sit in a research meeting working away on his laptop while other people talk (seemingly not paying attention at all). Then he pops up and asks a really pointed question or makes a comment that solves some problem that the group was stuck on. He takes a wide view on his research and is involved in many different areas. And then there is Mike Stonebraker. I don't think there is anything I can say here about Mike that hasn't already been said by others. So I will let Michael Jackson do the talking for me.
- OSU — I have some qualms with living in Ohio. I used to date a girl from there. But don't let my personal hang-ups prevent you from working with Spyros Blanas. He has been looking into building super optimized OLTP execution engines.
- Penn — My favorite person at Penn is Zack Ives. I remember when I was figuring out where to go for grad school, he came into the department on a Saturday morning to meet with me during their visit weekend. This has stuck with me over the years and I still hold Zack in high regard. His recent work on incremental query optimization is really interesting.
- Stanford — Technically the Stanford DB group is the InfoLab. Yes, it's an old group but they have new blood. They already get a lot of publicity so I don't feel like I need to say too much about them. I really like how Chris Re is actually helping people with his DeepDive project but then he is also building a modern DBMS engine backed by new theory (EmptyHeaded). Then there is Peter Bailis (aka the "Jar Jar Binks of Databases"). He is building a modern stream processing engine for IoT called MacroBase.
- UCLA — One problem with UCLA is that their department's directory lists a bunch of retired faculty members or people that are no longer there. Ignore that. Tyson Condie joined their department in 2013 and he is working on techniques for optimizing Spark queries. You will want to work with Tyson. He is thoughtful and super smart.
- Washington — So many great people at UW. The campus is amazing too. Magda Balazinska and Bill Howe are building a new DBMS from scratch called Myria. This is an impressive feat and something that I am trying to accomplish in my own group. Alvin Cheung is applying programming language techniques to database systems.
- Wisconsin — Like Berkeley, Wisconsin has historically had a very strong database group. Their alumni list is incredible. One of the best DB systems researchers in the world is Jignesh Patel. I consider the Quickstep project to be one of the most state-of-the-art execution engines in existence now. Jignesh is also a well-known streetfighter from his days back in India.