William W. Cohen
Biography
William Cohen is a Visiting Professor at Carnegie Mellon University in
the Machine Learning Department.
He also holds a 20%-time appointment as a Principal Scientist at Google, where he
worked full-time between May 2018 and March 2024. He received his
bachelor's degree in Computer Science from
Duke University in 1984, and a PhD
in Computer Science from Rutgers
University in 1990. From 1990 to 2000 Dr. Cohen worked at
AT&T Bell Labs and
later AT&T Labs-Research,
and from April 2000 to May 2002 Dr. Cohen worked
at Whizbang Labs, a company
specializing in extracting information from the web. From 2002 to
2018, Dr. Cohen worked at Carnegie Mellon University in
the Machine Learning Department,
with a joint appointment in
the Language Technology
Institute.
Dr. Cohen is a past president of
the International Machine
Learning Society. In the past he has also served as an action
editor for the AI
and Machine Learning series of books published
by Morgan Claypool, for
the
journal Machine
Learning, the
journal Artificial
Intelligence, the Journal of
Machine Learning Research, and
the Journal of Artificial
Intelligence Research. He was General Chair for
the 2008 International
Machine Learning Conference, held July 6-9 at
the University of
Helsinki,
in Finland;
Program Co-Chair of
the 2006
International Machine Learning Conference; and Co-Chair of
the 1994
International Machine Learning Conference. Dr. Cohen was also the
co-Chair for the 3rd
Int'l AAAI Conference on Weblogs and Social Media, which was held
May 17-20, 2009 in San Jose, and was the co-Program Chair for
the 4th Int'l AAAI
Conference on Weblogs and Social Media. He is
an AAAI
Fellow, and was a winner of the 2008
SIGMOD
"Test of Time" Award for the most influential SIGMOD paper of
1998, the
2014 SIGIR
"Test of Time" Award for the most influential SIGIR paper of
2002-2004, and the 2023 Semantic Web Science
Association's Ten-Year
Award for the most influential paper of the ISWC-2013 conference.
Dr. Cohen's research interests include question answering,
machine learning for NLP tasks, and neuro-symbolic reasoning, and he
has a long-standing interest in statistical relational learning. He
holds seven patents related to learning, discovery, information
retrieval, and data integration, and is the author of more than 300
publications.
Announcements and FAQs
- May 2024: A new edition of A Computer Scientist's Guide To
Biology will be out later this summer! More information and an
excerpt are available from
my co-author's
website. I'm possibly biased but I
think Charles Cohen did a
great job with the update - the book is still quite compact, but
pretty much the whole book has been rewritten and updated. For
example the new version includes several new chapters on topics
like CRISPR which weren't even a thing back in 2007.
- March 2024: As you can see from my updated bio above, I have
returned to CMU's ML department full-time (although I still have a
20% involvement at Google, so that email will work!) I'm really
looking forward to re-engaging with my friends and colleagues at CMU.
Projects, Publications, Software, Datasets, and Talks
These are now being distributed from my Github page.
Past courses:
- Spring 2018: Undergraduate Level Machine Learning with Large Datasets, 10-405, Mon-Wed 3:30-4:20 in GHC 4307
- Fall 2017: Machine Learning with Large Datasets, 10-605 and 10-805, Tues-Thurs 1:30-2:50pm, PH 100.
- Fall 2016: Machine Learning with Large Datasets, 10-605 and 10-805, Tues-Thurs 1:30-2:50pm, Wean Hall 7500.
- Spring 2016: Machine Learning 10-601, Mon-Wed 10:30-11:50am, GHC 4401, with Maria-Florina Balcan.
- Fall 2015: Machine Learning with Large Datasets, 10-605 and 10-805, Tu-Thu 4:30-5:50pm in Doherty Hall 2210.
- Spring 2015: Machine Learning with Large Datasets, 10-605 and 10-805, Tu-Thu 10:30-11:50am in BH A51
- Fall 2014: 10-601 Machine Learning, Tu-Thu 1:30-2:50, Wean 7500
- Spring 2014: 10-605 Machine Learning with Large Datasets, Mon-Wed 1:30-2:50, Doherty Hall 1112
- Fall 2013: 10-601 Machine Learning, Mon-Wed 4:30-5:50, Doherty Hall 2315 (with Eric Xing).
- Spring 2013: Machine Learning with Large Datasets, Mon-Wed 1:30-2:50, 4307 GHC
- Fall 2012: ML 10-802 and LTI 11-772 (Analysis of Social Media), 10:30-11:50am Tues & Thurs, 4303 Gates Building.
- Fall 2012: 10-915, the MLD Journal Club, 12-1:20pm Tue & Thu, 4101 Gates Building (with Roy Maxion).
- Spring 2012: Machine Learning with Large Datasets, Tues-Thurs 1:30-2:50pm, NSH 1305
- Fall 2011: Structured
Prediction for Language and Other Discrete Data (SPLODD-2011), ML
10-710 and LTI 11-763, Tues-Thursday 3:00-4:20 in Gates-Hillman 4211.
This is co-taught by myself and Noah Smith, and will include some
subjects from Information
Extraction and some from Language and Stats 2. A
machine learning course (10-701 or consent of the instructors) is a
prereq; we don't recommend that you take the course if you have
already taken Information Extraction or Language and Stats 2.
- Spring 2011: ML 10-802 and LTI 11-772 (Analysis of Social Media), 10:30-11:50am Tues & Thurs, 4303 Gates Building.
- Spring 2011: 10-915, the MLD Journal Club, 3-4pm Mon & Wed, 4101 Gates Building.
- Fall 2010: 10-707
(Information Extraction - cross-listed in LTI as 11-748),
1:30-2:50pm Mon & Wed, Gates 4101. The first class is 9/8, the
Wed after Labor Day, to allow incoming students time to attend the IC
courses.
- Spring 2010: 10-802 (Analysis of Social Media).
- Fall 2009: 10-707
(Information Extraction), 1:30-2:50pm Mon & Wed, 5222 Gates
Building.
- Spring 2008: 10-601 (Machine Learning)
with Tom Mitchell, on 3-4:30
Mon & Wed in Wean Hall 5409.
- Fall 2007: Analysis of Social
Media, Machine Learning 10-802 and LTI 11-772, with Natalie Glance
(of Google Pittsburgh) - a brand-new seminar course. 4:30-6:30
Tuesdays in Wean Hall 4623.
- Note: This site is the shattered remains of a once-beautiful wiki,
created by the students of 10-802, generously hosted for free by
ScribbleWiki, tragically lost (due to
a combination of RAID drive failures and low-bidder backup schemes),
and then largely recovered using
Warrick
from various internet caches and archives.
- Fall 2007: Current Topics
in Computational Biology (Journal Club), 02-701. (Announcements). Thursdays from 4:00-5:00 in 411
Mellon Institute (after Cell & Systems Modeling).
- Spring 2007: Information Extraction, Machine
Learning 10-707 and LTI 11-748 - back by popular demand for the first time since 2004!
- Fall 2006: Current Topics in Computational Biology (Journal Club), 02-701.
(Announcements)
- Spring 2006: Read the Web, CALD 10-709.
- June 21, 23, and 25, 2005: A mini-course on Minorthird. Materials are below.
- Slides, notes, and sample files from first
day's lecture.
- Slides, notes, and sample files from second
day's lecture.
- Powerpoint slides from third
day's lecture.
- Jar file for minorThird, if you
only want to run the code, not compile it or read it.
The installation process here is:
- Install Java 1.4 or higher (actually, JRE is all you need).
- Download the jar for minorThird
and stick it in some directory.
- Optionally, download the sample data
repository and unpack it into the same directory.
- Change to that same directory and
then run Minorthird with the command
java -Xmx500M -jar minorthird.jar
What will pop up will be a small launch pad that can be used to
start any of the UI programs. You can also start a particular
main by specifying minorthird.jar as your classpath, for
instance:
java -Xmx500M -cp minorthird.jar edu.cmu.minorthird.ui.Help
- If you want to do a real install here's the home page on Sourceforge, and
a document on how to do a CVS
install of Minorthird.
- Spring 2004: "Learning to Turn Words into Data:
Machine Learning Approaches to Information Extraction and Information Integration", CALD 10-707 and LTI 11-748.
- Daniel Spokoyny, LTI PhD student, co-supervised with Taylor Berg-Kirkpatrick.
Long-term colleagues
Former students