Software
This software has been developed by me, my research group, and collaborators around the world.
- cnn - a flexible, relatively fast neural network toolkit optimized for text modeling.
- cdec - a fast, mature decoder, alignment, and modeling toolkit for statistical machine translation and similar structure-prediction problems.
- cpyp - a small, lightweight C++ Pitman–Yor process library.
- HyperGREX - a simple, elegant but feature-filled rule extractor for a variety of syntactic translation formalisms (for use with cdec).
- The CMU cross-lingual metaphor detector - a toolkit for identifying instances of figurative language in English and any other language for which a bilingual dictionary is available.
- fast_align - a very fast, but pretty effective, unsupervised bilingual word aligner.
- creg - a small and fast toolkit for large-scale linear, logistic, and ordinal regression modeling.
Data
- English adjective supersenses - a 13-class supersense taxonomy of English adjectives developed by Yulia Tsvetkov.
- Korean-English Wikipedia Titles - a parallel corpus of Wikipedia titles from January 2012.
- Chinese-English place names - a parallel corpus of Chinese place names from Wikipedia.