Tutorial: Machine Learning for the Computational Humanities
Digital Humanities and Computer Science Colloquium / TEI Conference
Northwestern University, Orrington Hotel (Bonbright Room)
Friday, October 24, 2014, 1pm-5pm

David Bamman
School of Computer Science, Carnegie Mellon University
dbamman@cs.cmu.edu
Free and open to the public, but please register to reserve a place by emailing your full name and institution (if applicable) to teiconference2014@gmail.com with the subject line "Register | Machine Learning for the Computational Humanities Tutorial".
Machine learning is a branch of computer science that drives much of the exciting work in the computational corners of the humanities and social sciences. Its methods underlie topic models, classifiers, clustering algorithms, syntactic parsers and named entity recognizers, among many other tools. Software such as MALLET and Weka has made machine learning techniques widely accessible, but it's easy to treat these tools as black boxes; the goal of this tutorial is to break open those boxes and look inside.
We'll survey a range of machine learning methods and answer the following questions for each one:
- What's the basic intuition behind it?
- What assumptions does it make about the world (or the data)?
- Why would we prefer this method over others?
- What tools can we use to implement this method?
- How might you use this method for research in the humanities?
Machine learning techniques that we'll cover include:
- Topic modeling and other probabilistic graphical models
- Classification methods (logistic regression, Naive Bayes, conditional random fields (CRFs), hidden Markov models (HMMs), etc.)
- Clustering (expectation-maximization (EM), k-means, hierarchical clustering)
- Representation learning (including "deep learning")
- Supervised vs. unsupervised learning
By the end of the tutorial, participants will be able to explain how each of these methods works at a high level, recognize when it is (and isn't) appropriate to apply each one, and know where to go for more information. No prior computational background is required.