Read The Web - Team Projects

Spring 2006



In class on Jan 26, student teams will present 5-minute proposals, each describing an NLP learning task for which the team will implement a self-supervised learning algorithm.

Project purpose:  The goal of these projects is for us to explore a variety of NLP learning tasks where redundancy can be used to support self-supervised, "bootstrap" learning.  Therefore, it is best if teams explore different approaches, taking advantage of different types of redundancy.
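To make the idea of redundancy-driven bootstrapping concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a prescribed method for the projects: the toy corpus, the (person, organization) belief type, the "text between the two entities" pattern representation, and the names `learn_patterns`, `apply_patterns`, `bootstrap`, and `min_support` are all hypothetical. Redundancy enters through `min_support`: a new belief is accepted only if at least that many distinct learned patterns extract it.

```python
from collections import defaultdict

# Toy corpus: hypothetical sentences, not real course data.
CORPUS = [
    "Alice Smith is a professor at Carnegie Mellon",
    "Alice Smith teaches at Carnegie Mellon",
    "Bob Jones is a professor at Stanford",
    "Bob Jones teaches at Stanford",
    "Carol White teaches at MIT",
    "Dan Brown visited Pittsburgh",
]


def learn_patterns(corpus, beliefs):
    """Learn patterns: the text lying between a known (person, org) pair."""
    patterns = set()
    for person, org in beliefs:
        prefix, suffix = person + " ", " " + org
        for sentence in corpus:
            long_enough = len(sentence) > len(prefix) + len(suffix)
            if long_enough and sentence.startswith(prefix) and sentence.endswith(suffix):
                patterns.add(sentence[len(prefix):len(sentence) - len(suffix)])
    return patterns


def apply_patterns(corpus, patterns):
    """Map each candidate (person, org) pair to the set of patterns extracting it."""
    support = defaultdict(set)
    for sentence in corpus:
        for pattern in patterns:
            left, sep, right = sentence.partition(" " + pattern + " ")
            if sep and left and right:
                support[(left, right)].add(pattern)
    return support


def bootstrap(corpus, seeds, min_support=2, max_rounds=5):
    """Alternate between learning patterns and extracting beliefs.

    A candidate belief is accepted only when at least `min_support`
    distinct patterns extract it (the redundancy check).
    """
    beliefs = set(seeds)
    for _ in range(max_rounds):
        patterns = learn_patterns(corpus, beliefs)
        support = apply_patterns(corpus, patterns)
        new = {pair for pair, pats in support.items() if len(pats) >= min_support}
        new -= beliefs
        if not new:
            break  # nothing new was learned, so stop early
        beliefs |= new
    return beliefs


if __name__ == "__main__":
    seeds = {("Alice Smith", "Carnegie Mellon")}
    for belief in sorted(bootstrap(CORPUS, seeds)):
        print(belief)
```

Starting from the single seed, the sketch learns two patterns and accepts only the pair that both of them extract; a pair supported by one pattern alone is held back. Lowering `min_support` to 1 accepts more beliefs at the cost of less cross-checking, which is the kind of trade-off a proposal could discuss.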

Below are some possible learning tasks to consider.  PLEASE NOTE THIS IS ONLY TO SPUR YOUR OWN THINKING - FEEL FREE TO SUGGEST A DIFFERENT PROJECT THAT YOU FIND INTERESTING.  Be creative!

Some possible tasks:

What type of information should you try to extract? 

For this project, you can do almost anything; make this choice based on what you think will offer plenty of redundancy to support self-supervised learning.  However, if possible, try to focus on extracting beliefs about universities, their people, departments, activities, publications, conferences, research, etc.  After these first projects we are likely to focus on CS and Biology departments (the former because it's fun and we understand them, the latter because we might be able to take advantage of other ongoing research at CMU, ontologies such as MeSH, and other resources such as MEDLINE publications).

What should be in your proposal presentation?

Describe very specifically the type of knowledge you want your system to learn and the types of beliefs to be learned (e.g., We intend to learn extraction rules of the following two types:
to extract the class of beliefs
List the various redundant sources of information/inference that you intend to rely on.  (e.g., We will use
Try to prepare by 'hand simulating' your algorithm on some simple examples and try to summarize your experience (perhaps you can show a simple sequence of web pages visited, and rules/beliefs created using your hand simulation).

Presentation logistics:  Send tom.mitchell@cs.cmu.edu your presentation (PowerPoint, PDF, or PS) by noon on Thursday, Jan 26.  He will assemble all presentations on his laptop in advance to save time.  Appoint a single speaker for your group, and have her/him practice the presentation in front of the group.  Be sure you can present your ideas in 5 minutes: avoid spending a lot of time on motivation and get directly to the specifics.  Keep in mind that the point of your presentation is to stimulate the class into providing their own ideas to help your team.
 



This page is located in the file /afs/cs/project/theo-21/www/project1ideas.html. 
It is writable by any member of the course.
Tom Mitchell, January 21, 2006.