There are three files:
ActiveLearner.java - The core class. A bootstrapper using our module
creates an object of this type and calls its public methods.
Question.java - The bootstrapper or the UI module can request questions
to ask the user from the ActiveLearner. Question defines the object that
gets passed between the two. Initially it contains only the question to
ask; the UI module can then pass it back to the ActiveLearner with the
answer filled in.
Main.java - An illustration of how a bootstrapper would use an
ActiveLearner object.
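The round trip described above (request questions, let the UI fill in answers, hand them back) can be sketched roughly as follows. This is only an illustration under stated assumptions: the method names (addCandidate, requestQuestions, submitAnswer), the fields of Question, and the use of plain uncertainty sampling to pick questions are all hypothetical and are not taken from the actual ActiveLearner.java, Question.java, or Main.java.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: fields and method names are assumptions, not the module's real API.
class Question {
    private final String candidate; // item the user is asked about
    private final String text;      // question shown to the user
    private Boolean answer;         // null until the UI module fills it in

    Question(String candidate, String text) {
        this.candidate = candidate;
        this.text = text;
    }

    String getCandidate() { return candidate; }
    String getText() { return text; }
    boolean isAnswered() { return answer != null; }
    Boolean getAnswer() { return answer; }
    void setAnswer(boolean answer) { this.answer = answer; }
}

class ActiveLearner {
    // candidate -> bootstrapper's current confidence in [0, 1] (assumed representation)
    private final Map<String, Double> pool = new HashMap<>();
    // candidate -> label supplied by the user
    private final Map<String, Boolean> labels = new HashMap<>();

    // The bootstrapper may add or rescore unlabeled candidates at any time.
    void addCandidate(String candidate, double confidence) {
        if (!labels.containsKey(candidate)) {
            pool.put(candidate, confidence);
        }
    }

    // Uncertainty sampling as a stand-in: ask about the candidates
    // whose confidence is closest to 0.5 (ties broken alphabetically).
    List<Question> requestQuestions(int n) {
        List<String> ranked = new ArrayList<>(pool.keySet());
        ranked.sort(Comparator
                .comparingDouble((String c) -> Math.abs(pool.get(c) - 0.5))
                .thenComparing(Comparator.naturalOrder()));
        List<Question> questions = new ArrayList<>();
        for (String c : ranked.subList(0, Math.min(n, ranked.size()))) {
            questions.add(new Question(c, "Is \"" + c + "\" a correct extraction?"));
        }
        return questions;
    }

    // Accept a Question handed back with its answer filled in.
    void submitAnswer(Question q) {
        if (!q.isAnswered()) {
            throw new IllegalArgumentException("question has no answer yet");
        }
        labels.put(q.getCandidate(), q.getAnswer());
        pool.remove(q.getCandidate());
    }

    Map<String, Boolean> getLabels() { return labels; }
    int poolSize() { return pool.size(); }
}

class BootstrapperDemo {
    public static void main(String[] args) {
        ActiveLearner learner = new ActiveLearner();
        learner.addCandidate("Pittsburgh", 0.95);
        learner.addCandidate("banana", 0.5);
        learner.addCandidate("Springfield", 0.6);

        // Ask for the two most uncertain candidates and play the UI's role.
        for (Question q : learner.requestQuestions(2)) {
            System.out.println(q.getText());
            q.setAnswer(!q.getCandidate().equals("banana")); // simulated user answer
            learner.submitAnswer(q);
        }
        System.out.println("labels so far: " + learner.getLabels());
    }
}
```

Keeping the answer inside the same Question object means the UI module never needs to know how the learner identifies candidates; it only fills in a field and returns the object.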
Preferred Terminology/Nomenclature
Rosie Jones's thesis contains a considerable amount of material on active learning for bootstrap learning (Ch. 4). However, it only addresses the setting in which the pool of unlabeled data does not change. That assumption does not hold in a bootstrap information extraction setting like the RtW system, so we may observe different behavior from the same active learning or scoring schemes. For example, Jon and Jaime's bootstrapper using Jones's CO-EM-ish scoring scheme may never converge. Hence we would like to investigate which kinds of active learning algorithms work well in a changing environment.
We would also like to further analyze and formalize Jones's claims about which active learning algorithms/heuristics work well in which settings. In particular, we hope to generalize the settings from "identifying location noun phrases (NPs)" or "identifying people NPs" to something more applicable to our information extraction environment (which hopefully encompasses, or at least overlaps with, the entity extraction tasks that were the focus of Jones's study). (Sadly, much of that statement is vague. For example, which settings applicable to our IE environment generalize Jones's? Hopefully questions like that will be answered as we analyze our algorithms, which are themselves yet to be defined.)
Eventually it'd be nice to be able to say that, to do well (or to live long), 1. we can't live without active learning, 2. we need ->||<- this much active learning, or 3. we don't need active learning at all. Of course active learning will help; that alone probably won't be a very interesting thing to say.
Date | RtW Goals | Our Goals |
4/6-4/13 | | |
4/13-4/20 | | |
4/20-4/27 | | |
4/27-5/4 | | |
5/4-5/11 | | |