Analysis of Social Media course
Overview & Description
The class meets Tuesdays 4:30-6:30 in Wean Hall 4623. The instructors are William Cohen and Natalie Glance (Google Pittsburgh). The course is listed as MLD 10-802 and LTI 11-772.
The most actively growing part of the web is "social media", e.g., wikis, blogs, bboards, and collaboratively developed community sites like Flickr and YouTube. This seminar course will review selected papers from the recent research literature that address the problem of analyzing and understanding social media. This will be a 6-credit course, with the primary workload being attending class and presenting material.
Topics that will be covered include:
- Text analysis techniques for sentiment analysis, analysis of figurative language, authorship attribution, and inference of demographic information about authors (e.g., age or sex).
- Community analysis techniques for detecting communities, predicting authority, assessing influence (e.g., in viral marketing), or detecting spam.
- Visualization techniques for understanding the interactions within and between communities.
- Learning techniques for modeling and predicting trends in social media, or predicting other properties of media (e.g., user-provided content tags).
Students should have taken a machine learning course (e.g., 15-781 or 15-681) or have the consent of the instructor. The content of the course will be complementary to another new course, “The Social Web: Content, Communities, and Context” (05-320/05-820), which is also being offered in fall 2007.
Course Projects
For students who have elected to upgrade the course to a full 12-credit course and submit a course project:
- By 10/1, send the instructors a three-page writeup of your proposed project describing: the problem you are studying; the inputs and outputs of the method you plan to develop; the dataset you plan to use; and a short discussion of the techniques you plan to use.
- The final project is due at midnight EST on 12/13 and will be a paper in the format used by ICWSM, i.e., an 8-page, two-column conference paper. (Unfortunately, the ICWSM deadline is earlier than this, Dec 3, but so it goes.)
Schedule
August and September
- Aug 28: Organizational meeting (William). Slides.
- Sep 4: Lecture on sentiment analysis (William). Slides.
Papers discussed: Turney, ACL 2001, Pang et al, EMNLP 2002, Wiebe et al, Computational Linguistics 2005
- Sep 11: Lecture on graph-based analysis (Natalie). Slides part 1; Slides part 2.
Papers discussed: Page et al, 1999
- Sep 18: Lecture on Slides.
Papers discussed: Cohn and Hofmann, NIPS 2001; Erosheva et al, PNAS 2004; Rosen-Zvi et al, UAI 2004; McCallum et al, IJCAI 2005; Dietz et al, ICML 2007. Ramesh also suggested some background reading papers on PLSA, LDA, and topic models.
- Sep 25. Lecture on advanced topics in graph-based analysis (Guest lecture from Christos Faloutsos). Slides part 1; Slides part 2a; Slides part 2b; Slides part 2c.
Papers discussed: Sun et al, KDD 2006, Wang et al, SRDS 2003, Chakrabarti et al, KDD 2004.
October
- Oct 2. More on graph-based analysis.
- William: local navigation in networks. Papers discussed: Travers & Milgram, Sociometry 1967; Kleinberg, STOC 2000; Liben-Nowell et al, PNAS 2005. Slides.
- Student 1: Mary McGlohon. Reading List. Slides.
- Oct 9. Spam in weblogs and social networks.
- Student 1: Moira Burke (Reading List) Slides
- Student 2: Jingrui He (Reading List) Slides
- William: Background on link-oriented spam detection methods. Papers discussed: Gyongyi and Garcia-Molina, 2004; Boykin and Roychowdhury, 2005; Gerecht et al, 2005; Kolari et al, 2006. Slides.
- Oct 16. Interested in meeting Matt while he's here?
- Oct 23. Viral marketing and the spread of influence.
- Student 1: Sameer (Reading List)
- Student 2: Xiaonan (Reading List)
- Student 3: Udhay (Reading List)
- Oct 30. Trend analysis.
- Student 1: Mohit Kumar
- Student 2: Yichia. Paper discussed: Paper
- Student 3: Hideki Shima (Reading List)
November
- Nov 6. Politics and social media.
- Student 1: Mahesh: Readings (Predicting political affiliation of users)
- Student 2: Emil Albright: readings TBA
- Student 3: Swapna - Papers: The political blogosphere ..., L. A. Adamic and N. Glance [1]; Predictive Opinions, Kim and Hovy, 2007 [2]
- Nov 13. Community dynamics.
- Student 1: Shilpa
- Student 2: Hanghang Tong - Papers: [A] Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra S. Modha, Christos Faloutsos: Fully automatic cross-associations. KDD 2004: 79-88. [B] Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, Philip S. Yu: GraphScope: parameter-free mining of large time-evolving graphs. KDD 2007: 687-696.
- Student 3: Sachin
- Nov 20. Recommender systems & the TREC Blog Track.
- Student 1: Dipanjan
- Student 2: Mehrbod Sharifi
- Student 3: Jon Elsas: TREC Blog Track
- Nov 27. Design of online communities. (Guest lecture from Bob Kraut, HCII).
December
- Dec 4. Anonymity and privacy issues.
- Student 1: Yimeng
- Student 2: Justin
- Student 3: Jana
- Dec 11. Tagging and folksonomies.
- Student 1: open
- Student 2: open
- Student 3: open