Pacaya
Pacaya is a library for joint modeling with graphical
models, structured factors, neural networks, and
hypergraphs. Structured factors encode structural constraints,
such as those required for dependency parsing or
constituency parsing. Neural factors are factors
whose scores are computed by a neural network.
This library has been used extensively for NLP. Check out
Pacaya NLP for applications of this library to dependency parsing,
relation extraction, semantic role labeling, and more.
Please cite the thesis below if you use this library.
@thesis{gormley_graphical_2015,
location = {Baltimore, {MD}},
title = {Graphical Models with Structured Factors, Neural Factors, and Approximation-Aware Training},
institution = {Johns Hopkins University},
type = {phdthesis},
author = {Gormley, Matthew R.},
date = {2015}
}
Pacaya NLP
Pacaya NLP is a suite of Natural Language Processing (NLP)
tools built using Pacaya, a library for hybrid graphical
models and neural networks.
Optimize
Optimize is a numerical optimization library used heavily
throughout Pacaya. It provides general interfaces, however,
and has also been used in a variety of other projects.
Some of the algorithms included in Optimize are
listed below.
- Stochastic Gradient Descent (SGD) with many of the tricks from "Stochastic Gradient Descent Tricks" (Bottou, 2012)
- SGD with forward-backward splitting (Duchi & Singer, 2009)
- Truncated gradient (Langford et al., 2008)
- AdaGrad with L1 or L2 regularization (Duchi et al., 2011)
- L-BFGS ported from Fortran to C to Java
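
To make one of the listed algorithms concrete, here is a minimal sketch of the basic AdaGrad update (Duchi et al., 2011) without any regularization. It is not taken from Optimize and does not reflect its API: the class `AdaGradSketch`, the interface `DifferentiableFunction`, and the helper `quadratic` are hypothetical names chosen for illustration only.

```java
// Hypothetical sketch: none of these names come from the Optimize library.
class AdaGradSketch {

    /** A function that can report its value and gradient at a point. */
    interface DifferentiableFunction {
        double value(double[] x);
        double[] gradient(double[] x);
    }

    /**
     * Plain AdaGrad without regularization: each coordinate's step is the
     * learning rate divided by the root of its accumulated squared gradients.
     */
    static double[] minimize(DifferentiableFunction f, double[] init,
                             double learningRate, int iterations) {
        double[] x = init.clone();
        double[] sumSquares = new double[x.length];
        final double eps = 1e-8; // avoids division by zero
        for (int t = 0; t < iterations; t++) {
            double[] g = f.gradient(x);
            for (int i = 0; i < x.length; i++) {
                sumSquares[i] += g[i] * g[i];
                x[i] -= learningRate * g[i] / (Math.sqrt(sumSquares[i]) + eps);
            }
        }
        return x;
    }

    /** Toy objective f(x) = sum_i (x_i - target_i)^2, minimized at x = target. */
    static DifferentiableFunction quadratic(final double[] target) {
        return new DifferentiableFunction() {
            public double value(double[] x) {
                double sum = 0.0;
                for (int i = 0; i < x.length; i++) {
                    double d = x[i] - target[i];
                    sum += d * d;
                }
                return sum;
            }
            public double[] gradient(double[] x) {
                double[] g = new double[x.length];
                for (int i = 0; i < x.length; i++) {
                    g[i] = 2.0 * (x[i] - target[i]);
                }
                return g;
            }
        };
    }

    public static void main(String[] args) {
        double[] x = minimize(quadratic(new double[] {3.0, -2.0}),
                              new double[] {0.0, 0.0}, 1.0, 1000);
        System.out.println(x[0] + " " + x[1]);
    }
}
```

The full-batch gradient is used here only to keep the example short. A stochastic variant would pass the gradient of a sampled example or minibatch to the same update.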
Prim
Prim is a Java primitives library akin to GNU Trove. It
differs by focusing heavily on sparse representations of vectors and
matrices.
Code generation is used to enable easy creation of many different
classes specific to a primitive type or pair of primitive types. For
example, LongDoubleSortedVector is used as the "template" class for
automatic generation of IntDoubleSortedVector, IntIntSortedVector,
LongIntSortedVector, etc.
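
As a rough illustration of what a sorted sparse vector keyed by `int` with `double` values might look like, the sketch below stores only the explicitly set entries in two parallel arrays ordered by index. It is an assumption about the general technique, not Prim's actual implementation: the class name `SortedSparseVectorSketch` and its methods are hypothetical.

```java
// Hypothetical sketch: not Prim's IntDoubleSortedVector and not its API.
class SortedSparseVectorSketch {
    private int[] indices = new int[4];     // sorted ascending, first `used` slots valid
    private double[] values = new double[4]; // values[i] belongs to indices[i]
    private int used = 0;

    /** Returns the stored value, or 0.0 for an index that was never set. */
    double get(int idx) {
        int pos = java.util.Arrays.binarySearch(indices, 0, used, idx);
        return pos >= 0 ? values[pos] : 0.0;
    }

    /** Sets a value, inserting a new entry at its sorted position if needed. */
    void set(int idx, double val) {
        int pos = java.util.Arrays.binarySearch(indices, 0, used, idx);
        if (pos >= 0) {
            values[pos] = val;
            return;
        }
        int ins = -(pos + 1); // binarySearch encodes the insertion point
        if (used == indices.length) {
            indices = java.util.Arrays.copyOf(indices, used * 2);
            values = java.util.Arrays.copyOf(values, used * 2);
        }
        // Shift the tail right by one slot to make room.
        System.arraycopy(indices, ins, indices, ins + 1, used - ins);
        System.arraycopy(values, ins, values, ins + 1, used - ins);
        indices[ins] = idx;
        values[ins] = val;
        used++;
    }

    /** Dot product via a linear merge over the two sorted index arrays. */
    double dot(SortedSparseVectorSketch other) {
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < used && j < other.used) {
            if (indices[i] == other.indices[j]) {
                sum += values[i++] * other.values[j++];
            } else if (indices[i] < other.indices[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    /** Number of explicitly stored entries. */
    int size() {
        return used;
    }
}
```

Because the primitive types appear only in field and parameter declarations, a class of this shape lends itself to the template approach described above: a generator could, for example, swap `int` for `long` in the index type to produce a sibling class.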
sigtest
Easy significance testing for NLP with the paired
permutation test or bootstrap test. Includes built-in
support for common data formats (CoNLL-2000, CoNLL-2003,
CoNLL-X, PTB, SemEval-2010, etc.) and metrics (accuracy,
precision, recall, F1).
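
For readers unfamiliar with the paired permutation test, the following sketch shows one common Monte Carlo formulation. It is not sigtest's code or interface. It assumes a metric that decomposes into per-item scores (such as per-sentence accuracy), so it does not cover corpus-level metrics like F1, and the class name `PairedPermutationSketch` and its parameters are hypothetical.

```java
// Hypothetical sketch: not sigtest's implementation or command-line interface.
class PairedPermutationSketch {

    /**
     * Approximates a two-sided p-value for the difference in total score
     * between systems A and B, where a[i] and b[i] are scores on the same item.
     * Under the null hypothesis the labels A and B are exchangeable per item,
     * so each paired difference is kept or negated with equal probability.
     */
    static double pValue(double[] a, double[] b, int samples, long seed) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Score arrays must be paired.");
        }
        int n = a.length;
        double observed = 0.0;
        for (int i = 0; i < n; i++) {
            observed += a[i] - b[i];
        }
        observed = Math.abs(observed);

        java.util.Random rng = new java.util.Random(seed);
        int atLeastAsExtreme = 0;
        for (int s = 0; s < samples; s++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                double d = a[i] - b[i];
                sum += rng.nextBoolean() ? d : -d; // randomly swap the pair
            }
            if (Math.abs(sum) >= observed) {
                atLeastAsExtreme++;
            }
        }
        // Add-one smoothing keeps the estimate strictly above zero.
        return (atLeastAsExtreme + 1.0) / (samples + 1.0);
    }

    public static void main(String[] args) {
        double[] a = {1, 1, 1, 0, 1, 1, 1, 1, 0, 1};
        double[] b = {0, 1, 0, 0, 1, 0, 1, 0, 0, 1};
        System.out.println(pValue(a, b, 10000, 1L));
    }
}
```

A small p-value suggests that the observed difference between the two systems is unlikely to arise from randomly relabeling their paired outputs.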