Tutorial: Machine Learning for the Computational Humanities

Digital Humanities and Computer Science Colloquium / TEI Conference
Northwestern University, Orrington Hotel (Bonbright Room)
Friday, October 24, 2014
1pm-5pm

David Bamman
School of Computer Science
Carnegie Mellon University
dbamman@cs.cmu.edu

The tutorial is free and open to the public. To reserve a place, register by emailing your full name and institution (if applicable) to teiconference2014@gmail.com with the subject "Register | Machine Learning for the Computational Humanities Tutorial".

Slides

Machine learning is a branch of computer science that helps drive much of the exciting work in the computational corners of the humanities and social sciences; its methods underlie topic models, classifiers, clustering algorithms, syntactic parsers and named entity recognizers, among many others. Tools like MALLET and Weka have made the application of machine learning techniques widespread, but it's easy to treat them as black boxes. The goal of this tutorial is to break open these boxes and have a look inside.

We'll survey a range of existing methods in machine learning, and answer the following questions for each one:

What's the basic intuition behind it?
What assumptions does it make about the world (or the data)?
Why would we prefer this method over others?
What tools can we use to implement this method?
How might you use this method for research in the humanities?

Machine learning techniques that we'll cover include:

Topic modeling and other probabilistic graphical models
Classification methods (logistic regression, Naive Bayes, CRFs, HMMs, etc.)
Clustering (EM, K-means, hierarchical clustering)
Representation learning (including "deep learning")
Supervised vs. unsupervised learning

By the end of the tutorial, participants will be able to explain how each of these methods works from a high-level perspective, judge when each one is (and is not) a good choice, and know where to go for more information. No prior computational background is required. This tutorial is free and open to the public.

Bio

David Bamman is a PhD student in Computer Science at Carnegie Mellon University. His research applies natural language processing and machine learning to empirical questions in the humanities and social sciences, including modeling linguistic variation (ACL 2014, Journal of Sociolinguistics 2014), inferring character types in movie plot summaries (ACL 2013) and novels (ACL 2014), inferring social rank in an Old Assyrian trade network (DH 2013) and detecting censorship in Chinese social media (First Monday 2012). David designed and co-taught an interdisciplinary (English/Computer Science) course at CMU on "Digital Literary and Cultural Studies," for which he received Carnegie Mellon's 2014 Alan J. Perlis Graduate Student Teaching Award. Prior to CMU, David was a senior researcher at the Perseus Project.