This research seminar is intended for Ph.D. students in Heinz College, the Machine Learning Department, and other university departments who wish to engage in detailed exploration of a specific topic at the intersection of machine learning and public policy. Qualified master's students may also enroll with permission of the instructor; all students are expected to have some prior background in machine learning and/or artificial intelligence (10-601, 10-701, 90-866, 90-904/10-830, or a similar course). This year's course will focus on the topic of Machine Learning for the Developing World. We will explore the potential contributions of machine learning technologies for development, including potential impacts on healthcare, education, agriculture, finance, communications, and governance. Machine learning can be used to analyze existing data and assist with the targeted collection of new data to drive policy analysis, and can be incorporated into deployed information systems to improve the effectiveness of public services. However, application of machine learning to the developing world faces a number of challenges (e.g. sparsity and low quality of data) as well as many opportunities for the development of new methods and incorporation of new data sources. We will explore these challenges and opportunities in detail through lectures, discussions on current research articles and future directions, and course projects, with the goals of understanding and advancing the current state of the art.
Six of the fourteen course meetings will be devoted to discussion of specific topics in Machine Learning for the Developing World. Each student is expected to give a high-quality, twenty-minute PowerPoint presentation, followed by twenty minutes of class discussion, at one of these six meetings. The primary goals of each topic presentation should be 1) to synthesize and summarize the current state of the art for the given topic, overviewing the major methodological approaches and open problems, 2) to briefly review at least two specific research articles and their relevance to the topic, and 3) to facilitate the remainder of the discussion by offering discussion questions, preliminary conclusions, and ideas to explore. Topic discussions can focus either on a particular application domain for ML in the developing world (e.g. agriculture, education, governance/corruption, poverty reduction, microfinance, human rights, public safety, health care, disease surveillance) or on a particular methodological challenge or opportunity (e.g. data quality, active learning, crowdsourcing, pattern detection, causal structure learning, cell phone data) with relevance to multiple development-related applications. A set of suggested topics for these presentations has been provided in the syllabus below, but other topics can also be considered based on student interest.
To ensure that presentations will be useful and relevant for the class, each presenter should send the instructor a brief text outline of the main topics/points that their presentation will cover, and a proposed set of 1-2 electronically available research articles that the class should read, at least one week prior to the presentation. The assigned reading(s) could be a review paper on the topic, or a landmark work representing a major advance on the topic; it may or may not be one of the specific articles reviewed in the presentation. The instructor will provide feedback and suggestions, and will post the articles on Blackboard so that the class can read them in advance of the presentation. The "Resources" section of Blackboard provides links to some suggested readings (feel free to use some of these in your presentations as relevant, but you should also find additional readings) as well as proceedings of several major conferences/workshops in "Artificial Intelligence for Development" (AI-D), "Information and Communication Technologies and Development" (ICTD), and "Computer Science for Global Development" (CCC). You can also look through recent ML conference proceedings (KDD, ICML, AAAI, NIPS) and journals (MLJ, JMLR, JASA). For many specific topics, the instructor can suggest a few additional papers/sources to get you started, and online resources such as Citeseer and Google Scholar will also be helpful.
All students are expected to be involved in a research project relevant to ML for development, to make significant progress on this research over the duration of the course, and to produce a written document describing the project's background (including a description of any previous work by the student and related work by others), methods, results, and conclusions. You are encouraged (but not required) to work in groups of two students on this project (groups of three may also be allowed, but require the instructor's permission). The project should involve the analysis of data from the developing world (broadly construed; macro-level analyses across developed- and developing-world countries are acceptable as well) and should carefully consider the specific challenges and limitations of this data (please provide an explicit discussion of these challenges in your report). This final report should also include a brief description of how the progress of the work and the student's future research directions have been influenced by the semester's discussions. Students will also be expected to give two brief presentations of their work to the class (at the beginning of the course, describing their proposed work, and at the end of the course, describing their completed work), and to submit a short (1-2 page) proposal, thus providing opportunities for their work to benefit from feedback both from the instructor and from the class. If desired, the course project can be part of the students' ongoing doctoral research (in which case the group's proposal should make it clear what specific aspect of this work will be addressed during the course), or can be a smaller-scale project specific to the course (some suggested project topics/datasets/links are provided in the "Resources" section).
Note that the course project requirement can be waived for students auditing the course, but all students are expected to give a topic synthesis presentation and to be active participants in class discussions.
(T 3/15) Course Introduction
Introductions (be prepared to speak for two minutes each about your background and interests)
Discussion of the course syllabus (course structure, goals, topic synthesis presentations, course projects)
Brief lecture/discussion introducing ML for development
(H 3/17) Discussion: What are the Methodological and Practical Challenges for Machine Learning in the Developing World?
(T 3/22) Guest Mini-Lectures: Where Does Machine Learning Fit Into the Broader Development Picture?
(H 3/24) Discussion Topics 1-2: ML for Economic Development
ML for Macro-Level Analysis of Development Data and Setting of Aid Priorities (causal structure learning, prediction, variable importance)
Methods for Dealing With Low Data Quality (sparse, incomplete, noisy, and biased data)
(T 3/29) Project Proposal Presentations
Each group will give a short PowerPoint presentation on its proposed course project, leaving sufficient time for class discussion and suggestions, and will turn in a short (1-2 page) proposal.
(H 3/31) Guest Mini-Lecture: Machine Learning for Human Rights
Literature survey of human rights data analysis and discussion of potential ML contributions
(T 4/5) Discussion Topics 3-4: ML for Global Health
ML for Disease Outbreak Detection (overview of methods; challenges and opportunities for applying disease surveillance to the developing world)
Other Applications of ML in Improving Global Health
(H 4/7) Guest Mini-Lectures: Developing and Deploying Disease Surveillance Systems in the Developing World
(T 4/12) Discussion Topics 5-6: Opportunities for Using Cell Phone and Geospatial Data in the Developing World
Inferring Mobility Patterns and Societal Structure
Early Warning via Event and Pattern Detection
(H 4/14) Discussion Topics 7-8: Methods for Cost-Efficient Data Acquisition
Active Learning for Development
Crowdsourcing for Development
(T 4/19) Discussion Topics 9-10: ML Applications to Development
ML for Education and Communication (e.g. promoting literacy, enabling technology access)
ML for Improved Governance (e.g. monitoring corruption, public safety)
(H 4/21) Discussion Topics 11-12: More ML Applications to Development
ML for Poverty Reduction (e.g. microfinance)
ML for Agriculture ("Feeding the World")
(T 4/26) Final Project Presentations
Each group will give a short PowerPoint presentation on their course project. Please plan to speak for no more than ten minutes, and leave five minutes for class discussion.
(H 4/28) More Final Project Presentations and/or Wrap-Up Discussion; project reports due today!