* Agenda
- API
- Evaluation
1. Evaluation
a. Metrics
- Overall bootstrapping extraction accuracy (also dependent on how the
overall system uses our probabilities)
- Compare rule precision values returned by different probability/score
schemes. (In some domains, can compare to "ground truth".)
- Compare/evaluate example precision values returned by each
probability/score scheme versus "ground truth" from our own labeling.
(Probably faster with some threshold mapping Score -> {0, 1}; see the
sketch after this list.)
- Given a rule-picking mechanism for the bootstrapper, compare extraction
volume across different prob/score schemes.
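
As a quick illustration of the example-precision comparison above, a minimal
sketch (function name and data are made up) that thresholds one scheme's
scores into {0, 1} and checks them against our hand labels:

    # Hypothetical sketch: compare one scheme's example scores against hand labels.
    def example_precision(scored_examples, gold_labels, threshold=0.5):
        """scored_examples: dict example -> score in [0, 1] from one prob/score scheme.
        gold_labels: dict example -> True/False from our own labeling.
        Returns precision over the examples the scheme accepts at this threshold."""
        accepted = [ex for ex, score in scored_examples.items() if score >= threshold]
        if not accepted:
            return None  # the scheme accepted nothing at this threshold
        correct = sum(1 for ex in accepted if gold_labels.get(ex, False))
        return correct / len(accepted)

    # Run once per scheme and compare the numbers (data below is made up).
    scores = {"Paris": 0.92, "Tuesday": 0.35, "Berlin": 0.80}
    labels = {"Paris": True, "Tuesday": False, "Berlin": True}
    print(example_precision(scores, labels))  # 1.0: both accepted examples are correct
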
b. Probability Estimation
- No probability estimation
- Co-EM-ish thing (Jon & Jaime, Rosie Jones)
- PMI
- Noisy-OR model (see the sketch after this list)
- Pollution Network
- URNs
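
To make one of these schemes concrete, a minimal noisy-OR sketch; it assumes
each rule that extracted an entity carries an independent precision estimate
p_i (the numbers and rule strings below are made up):

    # Noisy-OR sketch: an entity extracted by several rules is wrong only if
    # every rule that extracted it fired incorrectly (assuming independence).
    def noisy_or(rule_precisions):
        """rule_precisions: precisions p_i of the rules that extracted the entity.
        Returns P(entity is a correct extraction) = 1 - prod_i (1 - p_i)."""
        p_wrong = 1.0
        for p in rule_precisions:
            p_wrong *= (1.0 - p)
        return 1.0 - p_wrong

    print(noisy_or([0.9, 0.6]))  # 0.96: two noisy rules agreeing boost confidence
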
c. Active Learning Algorithms
Ideally, we would like to minimize the gap between estimated and real rule
precision (e.g. sum_i |P_i - P̂_i|), where P_i is the real precision of rule i
and P̂_i is the estimated precision of that rule. Since we don't know what P_i
actually is, here are some other measures that may be reasonable (see the
sketch after this list):
- Random selection (effectively passive learning)
- Constant rule confidence value (e.g. 90%, all rules are good)
- Expected overall precision given the example label
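
As a rough sketch of the objective above (hypothetical names, made-up
numbers), the following scores a scheme by how far its estimated rule
precisions P̂_i fall from the real ones P_i, using the constant-confidence
baseline from the list:

    # Hypothetical sketch: the quantity we would ideally minimize, i.e. how far a
    # scheme's estimated rule precisions are from the real ones.
    def precision_estimation_error(real, estimated):
        """real, estimated: dicts mapping rule -> precision in [0, 1].
        Returns the mean absolute gap over rules present in both."""
        rules = real.keys() & estimated.keys()
        return sum(abs(real[r] - estimated[r]) for r in rules) / len(rules)

    # Baseline from the list above: a constant confidence value for every rule.
    def constant_confidence(rules, value=0.9):
        return {r: value for r in rules}

    real = {"cities such as <X>": 0.95, "visited <X>": 0.40}  # made-up numbers
    print(precision_estimation_error(real, constant_confidence(real)))  # 0.275
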
d. Relations
- IsCity()
- IsNation()
2. API
RegisterRelations(r1, r2, ...) -> AL_object
AL_object -> AddExtractor(e, p_r1, p_r2, ...)
AL_object -> AddOccurrence(o, e)
AL_object -> GetExtractorProbability(e) -> (p1, p2, ...)
AL_object -> GetEntityProbability(ent) -> (p1, p2, ...)
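
A rough Python rendering of this API surface, keeping the method names from
the notes (class name adapted to Python style); the internals, especially the
entity-probability combination, are placeholders rather than a committed design:

    class ALObject:
        """Sketch of the AL_object surface above; internals are placeholders."""

        def __init__(self, relations):
            # RegisterRelations(r1, r2, ...) -> AL_object
            self.relations = list(relations)
            self.extractor_probs = {}   # extractor e -> [p_r1, p_r2, ...]
            self.occurrences = {}       # extractor e -> list of occurrences o

        def add_extractor(self, e, *probs):
            # AddExtractor(e, p_r1, p_r2, ...): one probability per registered relation
            self.extractor_probs[e] = list(probs)

        def add_occurrence(self, o, e):
            # AddOccurrence(o, e): record that extractor e produced occurrence o
            self.occurrences.setdefault(e, []).append(o)

        def get_extractor_probability(self, e):
            # GetExtractorProbability(e) -> (p1, p2, ...), one value per relation
            return tuple(self.extractor_probs[e])

        def get_entity_probability(self, ent):
            # GetEntityProbability(ent) -> (p1, p2, ...); placeholder combination:
            # per relation, take the max probability over extractors that saw ent
            probs = [ps for e, ps in self.extractor_probs.items()
                     if any(ent in str(o) for o in self.occurrences.get(e, []))]
            if not probs:
                return tuple(0.0 for _ in self.relations)
            return tuple(max(p[i] for p in probs) for i in range(len(self.relations)))

    al = ALObject(["IsCity", "IsNation"])
    al.add_extractor("cities such as <X>", 0.9, 0.1)
    al.add_occurrence("cities such as Paris", "cities such as <X>")
    print(al.get_extractor_probability("cities such as <X>"))  # (0.9, 0.1)
    print(al.get_entity_probability("Paris"))                  # (0.9, 0.1)
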
3. Terminology/Nomenclature
1. Relations/Predicate ~strings
2. Rules/Extraction Rules/Patterns/Extractors/Contexts ~string, left/right-hand side
3. Claim/Belief/Assertion
4. Occurrence/Extraction/Instance/Span ~string, spans
5. Entities/Concepts/Entity Pairs ~string
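
One possible way to pin the nomenclature down as data types; this is only a
sketch and the field names are assumptions, not decisions (numbers in the
comments refer to the list above):

    from dataclasses import dataclass

    @dataclass
    class Relation:        # 1. Relation/Predicate, e.g. "IsCity", "IsNation"
        name: str

    @dataclass
    class Rule:            # 2. Rule/Pattern/Extractor/Context
        left: str          #    left-hand side of the pattern
        right: str         #    right-hand side of the pattern

    @dataclass
    class Occurrence:      # 4. Occurrence/Extraction/Instance
        text: str          #    the matched string
        span: tuple        #    (start, end) offsets in the source text

    @dataclass
    class Entity:          # 5. Entity/Concept (or entity pair)
        value: str

    @dataclass
    class Claim:           # 3. Claim/Belief/Assertion: a relation asserted of an entity
        relation: Relation
        entity: Entity
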