Machine Learning Overview
Input: a K-best list $\mathcal{L} = \{ T_1, T_2, \ldots, T_K \}$
- $\mathcal{L}$ comes from a parser run on a sentence $S$
- $\mathcal{T}(S)$ is the correct tree of $S$

Output: the best tree $T^* = \arg\max_{T \in \mathcal{L}} F_1(\mathcal{T}(S), T)$
- In general $\mathcal{T}(S) \not\in \mathcal{L}$

Approach: learn a scoring function $s(T_i)$
- Hopefully $s(T) > s(T')$ whenever $F_1(\mathcal{T}(S), T) > F_1(\mathcal{T}(S), T')$
- Return $\arg\max_{T \in \mathcal{L}} s(T)$ (a minimal sketch follows)
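With a linear model, $s(T) = w^\top f(T)$ and reranking reduces to a dot product per candidate plus an argmax. A minimal sketch, assuming each candidate's features arrive as the sparse index array produced by the extractor in the next section (names here are hypothetical):

// Minimal linear reranker: return the index of the highest-scoring tree.
// featsPerTree[i] holds the active feature indices of the i-th candidate,
// so its score is just the sum of the corresponding weights.
public class RerankerSketch {
  public static int argmaxTree(int[][] featsPerTree, double[] weights) {
    int best = 0;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < featsPerTree.length; i++) {
      double score = 0.0;
      for (int feat : featsPerTree[i]) {
        score += weights[feat];
      }
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    }
    return best;
  }
}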
Feature Extractor

An excerpt from edu/berkeley/nlp/assignments/rerank/SimpleFeatureExtractor.java:

public class SimpleFeatureExtractor {
  public int[] extractFeatures(KbestList kbestList, int idx,
      Indexer<String> featureIndexer, boolean addFeaturesToIndexer) {
    Tree<String> tree = kbestList.getKbestTrees().get(idx);
    List<Integer> feats = new ArrayList<Integer>();
    // Example feature: the position of this tree in the k-best list
    addFeature("Posn=" + idx, feats, featureIndexer, addFeaturesToIndexer);
    int[] featsArr = new int[feats.size()];
    for (int i = 0; i < feats.size(); i++) {
      featsArr[i] = feats.get(i).intValue();
    }
    return featsArr;
  }

  private void addFeature(String feat, List<Integer> feats,
      Indexer<String> featureIndexer, boolean addNew) {
    // Only index new feature strings when addNew is set (i.e., at training time)
    if (addNew || featureIndexer.contains(feat)) {
      feats.add(featureIndexer.addAndGetIndex(feat));
    }
  }
}
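Note the role of the Indexer: it maps feature strings to dense integer ids shared across the model, and the addFeaturesToIndexer flag controls whether unseen strings get new ids (training) or are silently dropped (testing). A hypothetical call:

// Hypothetical usage: features of the idx-th tree in a k-best list.
// Pass addFeaturesToIndexer = true at training time so new features are
// indexed, and false at test time so unseen features are dropped.
int[] feats = extractor.extractFeatures(kbestList, idx, featureIndexer, true);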
Tree Processing Helpers: SurfaceHeadFinder
From edu/berkeley/nlp/assignments/rerank/SurfaceHeadFinder.java:

public class SurfaceHeadFinder {
  // For each category: the direction to search in (true = left-to-right)
  private Map<String,Boolean> searchDirections;
  // For each category: the preterminal tags that may serve as its head
  private Map<String,Set<String>> validHeads;

  public SurfaceHeadFinder() {
    this.searchDirections = new HashMap<String,Boolean>();
    searchDirections.put("ADJP", true);
    this.validHeads = new HashMap<String,Set<String>>();
    validHeads.put("ADJP", new HashSet<String>(Arrays.asList(
        new String[] { "NNS", "NN", "$", "JJ", ... })));
  }

  public int findHead(String label, List<String> preterminals) {
    if (label.equals("NP")) {
      // NPs use a special search (npHeads/npBlockers and the search helpers
      // are defined elsewhere in the class)
      return searchFindLastBefore(preterminals, true, npHeads, npBlockers);
    } else if (label.equals("PRN")) {
      // Parentheticals: skip the opening token if there is one
      return (preterminals.size() > 1 ? 1 : 0);
    } else if (validHeads.containsKey(label)) {
      return searchFindFirst(preterminals, true, validHeads.get(label));
    } else {
      // Default: the first preterminal
      return 0;
    }
  }
}
Input: a list of pre-terminals
Output: the syntactic head of the list (as an index into it)
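For instance, the PRN branch above is fully determined by the excerpt (a hypothetical call):

SurfaceHeadFinder hf = new SurfaceHeadFinder();
// A parenthetical "( NN )": the head is the token after the opening paren
int head = hf.findHead("PRN", Arrays.asList("-LRB-", "NN", "-RRB-"));  // returns 1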
Training: Primal SVM with sub-gradient descent

Location: edu/berkeley/nlp/assignments/rerank
public class PrimalSubgradientSVMLearner<D> {
  public PrimalSubgradientSVMLearner(double stepSize, double regConstant, int numFeatures) { ... }

  public IntCounter train(IntCounter initWeights,
      final LossAugmentedLinearModel<D> model, List<D> data, int iters) {
    // (construction of the per-datum objectives 'objs' from model and data is elided)
    OnlineMinimizer minimizer = new AdagradMinimizer(stepSize, regConstant, iters);
    return minimizer.minimize(objs, initWeights.toArray(numFeatures), true, null);
  }
}
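What does train minimize? A standard formulation consistent with the UpdateBundle interface below is the L2-regularized structured hinge objective: writing $f(T)$ for a tree's feature vector, $\ell(T)$ for its loss, and $T_i^\ast$ for the best-in-list tree of the $i$-th sentence,

$\min_w \; \lambda \lVert w \rVert^2 + \sum_i \Big[ \max_{T \in \mathcal{L}_i} \big( w^\top f(T) + \ell(T) \big) - w^\top f(T_i^\ast) \Big]$

whose subgradient for datum $i$ is $f(\hat{T}_i) - f(T_i^\ast)$, where $\hat{T}_i$ is the loss-augmented guess. This is exactly the quantity an UpdateBundle packages.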
public class AdagradMinimizer implements OnlineMinimizer {
  public IntCounter minimize(List<DifferentiableFunction> functions, double[] initial,
      boolean verbose, Callback iterCallbackFunction) {
    // (the per-coordinate AdaGrad update loop is elided; lazyResult holds
    // the weights after the final iteration)
    double[] result = new double[initial.length];
    for (int i = 0; i < result.length; ++i) {
      result[i] = lazyResult.getCount(i);
    }
    return IntCounter.wrapArray(result, result.length);
  }
}
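The elided body implements per-coordinate AdaGrad: each weight's effective step size shrinks with the square root of its accumulated squared gradients. A minimal sketch of the standard update, with hypothetical names (the real class also applies the regularizer, omitted here):

// Standard AdaGrad step for a sparse subgradient:
//   w[j] -= stepSize / (delta + sqrt(sumSqGrads[j])) * g[j]
static void adagradStep(double[] w, double[] sumSqGrads,
    int[] featIdxs, double[] grads, double stepSize, double delta) {
  for (int k = 0; k < featIdxs.length; k++) {
    int j = featIdxs[k];
    sumSqGrads[j] += grads[k] * grads[k];  // accumulate squared gradient
    w[j] -= stepSize / (delta + Math.sqrt(sumSqGrads[j])) * grads[k];
  }
}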
Training: Primal SVM with sub-gradient descent
LossAugmentedLinearModel
public interface LossAugmentedLinearModel<T> {
  public class UpdateBundle {
    public UpdateBundle(IntCounter goldFeatures,
        IntCounter lossAugGuessFeatures, double lossOfGuess) {
      this.goldFeatures = goldFeatures;
      this.lossAugGuessFeatures = lossAugGuessFeatures;
      this.lossOfGuess = lossOfGuess;
    }

    public final IntCounter goldFeatures;          // features of the gold (best-in-list) tree
    public final IntCounter lossAugGuessFeatures;  // features of the loss-augmented argmax
    public final double lossOfGuess;               // loss of that argmax tree
  }

  public UpdateBundle getLossAugmentedUpdateBundle(T datum, IntCounter weights);
}
In getLossAugmentedUpdateBundle, you process the datum and return an UpdateBundle object.
What should T datum be? Something with a KbestList. Read PrimalSubgradientSVMLearner.java to find out.
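Schematically, getLossAugmentedUpdateBundle does a loss-augmented decode over the k-best list: score every candidate as $w^\top f(T) + \ell(T)$, take the argmax, and pair its features with those of the best-in-list tree. A sketch, in which RerankDatum and its accessors are hypothetical stand-ins for whatever datum type you design:

// RerankDatum (hypothetical): wraps one KbestList, per-tree feature index
// arrays, per-tree losses (e.g. 1 - F1 against the gold tree), and the
// index of the best-in-list tree.
public UpdateBundle getLossAugmentedUpdateBundle(RerankDatum datum, IntCounter weights) {
  int guess = -1;
  double bestScore = Double.NEGATIVE_INFINITY;
  for (int i = 0; i < datum.numTrees(); i++) {
    // loss-augmented score: w . f(T_i) + loss(T_i)
    double score = dot(weights, datum.features(i)) + datum.loss(i);
    if (score > bestScore) { bestScore = score; guess = i; }
  }
  return new UpdateBundle(toCounter(datum.features(datum.goldIdx())),
      toCounter(datum.features(guess)), datum.loss(guess));
}

// Sum of weights over active feature indices; getCount(int) is used the same
// way in AdagradMinimizer above. toCounter (an int[] -> IntCounter
// conversion) is another hypothetical helper.
private double dot(IntCounter weights, int[] feats) {
  double sum = 0.0;
  for (int f : feats) sum += weights.getCount(f);
  return sum;
}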
Charniak and Johnson, ACL 2005

Coarse-to-fine $n$-best parsing and MaxEnt discriminative reranking

A comprehensive list of features:
- CoPar: conjunct parallelism
- CoLenPar: difference in the number of preterminals dominated in adjacent conjuncts
- RightBranch: prefer right-branching trees (sketched below)
- Heavy: tree node-based features
- Neighbours: a node and the non-terminals to its left and right
- Rules: local trees annotated with various information
- Ngrams: tuples of adjacent children nodes of the same parent
- Heads: tuples of head-to-head dependencies
- LexFunHeads: pairs of POS tags of a node's lexical head and functional head
- WProj: preterminals and their closest maximal-projection ancestors
- Word: lexical items and their POS tags
- HeadTree: local trees of the projections of a preterminal node and those projections' siblings
- NgramTree: subtrees rooted in the LCA of contiguous preterminal nodes
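As one concrete example, a toy RightBranch-style measurement just walks the rightmost spine of the tree; the paper's actual feature also counts the nodes off that spine and skips final punctuation. This sketch assumes Tree<String> exposes isLeaf() and getChildren():

// Toy right-branching measurement: depth of the tree's rightmost spine.
static int rightSpineLength(Tree<String> root) {
  int depth = 0;
  Tree<String> t = root;
  while (!t.isLeaf()) {
    depth++;
    List<Tree<String>> kids = t.getChildren();
    t = kids.get(kids.size() - 1);  // descend into the rightmost child
  }
  return depth;
}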
Johnson and Ural, NAACL 2010

Reranking the Berkeley and Brown Parsers
They try more features and find the following to be important:
- HeadTree: as above
- Heads: as above
- SynSemHeads: same as LexFunHeads
- RBContext: how much each subtree deviates from right-branching
- InterpLogCondP: log conditional probability according to the parser
- NNgram: parent and $n$-gram sequences of children categories

Hall et al., ACL 2014
Less Grammar, More Features
Span features:
- SpanBasic: identity of the first and last words of a span
- SpanContext: words immediately preceding and following a span (similar to Neighbours)
- SpanSplitPoint: where the split in CKY happens
- SpanShape: whether words begin with uppercase, lowercase, a digit, or punctuation; focuses on words at span boundaries (a sketch follows)
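A minimal sketch of one plausible shape mapping for SpanShape (the exact scheme in the paper may differ): uppercase letters become 'X', lowercase 'x', digits 'd', and anything else is kept verbatim.

// Map a word to its shape signature, e.g. "McGwire" -> "XxXxxxx", "1992" -> "dddd".
static String shape(String word) {
  StringBuilder sb = new StringBuilder();
  for (char c : word.toCharArray()) {
    if (Character.isUpperCase(c)) sb.append('X');
    else if (Character.isLowerCase(c)) sb.append('x');
    else if (Character.isDigit(c)) sb.append('d');
    else sb.append(c);  // punctuation and other symbols kept as-is
  }
  return sb.toString();
}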