Nathan Schneider

This website is out of date but retained for posterity. Awesome New Website

I study the semantics of natural language text. My research aims to address the questions: What is the nature of linguistic knowledge that participates in the communication of meaning? How can humans represent aspects of this knowledge computationally in data? How can algorithms recover it in new data for scientific inquiry or for natural language processing applications?

My work draws techniques and insights from statistical machine learning, descriptive linguistics, construction grammar, and cognitive linguistics. My dissertation was on lexical semantics. A research overview provides more detail. Here are some topics, with superscripts denoting languages other than English:

NER/lexical semantics: (EACL’12)a (ACL’12)a (NAACL’13b)a (LREC’14a) (LREC’14b) (TACL’14) (NAACL’15) (LAW’15) (CoNLL’15)
morphology, POS, syntax: (BLS’10)h (CSDL’10) (ACL’11) (NAACL’13a) (LAW’13)e,k,m (COLING’14) (EMNLP’14)
relational semantic parsing: (NAACL’10) (SemEval’10) (IFNW’13) (CL’14) (SP’14a) (SP’14b) (SemEval’14) (ACL’15)
second language: (BEA’13)
linguistic annotation: (ACL’11) (EACL’12)a (ACL’12)a (NAACL’13a) (LAW’13)e,k,m (LAW’13) (LREC’14a) (LREC’14b) (EMNLP’14) (NAACL’15) (LAW’15) (LREC’16)
social web: (CMU-Q’11)a,e (ACL’11) (EACL’12)a (NAACL’13a) (NAACL’13b)a (LAW’13)e,k,m (LREC’14a) (TACL’14) (EMNLP’14) (NAACL’15)

This homepage is brought to you by the letter θ, a friend to linguists and statisticians alike.

Papers

 
  • Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Meredith Green, Abhijit Suresh, Kathryn Conger, Tim O’Gorman, and Martha Palmer (2016). A corpus of preposition supersenses. Linguistic Annotation Workshop. [paper] » Dataset to appear.
  • Hannah Rohde, Anna Dickinson, Nathan Schneider, Christopher N. L. Clark, Annie Louis, and Bonnie Webber (2016). Filling in the blanks in understanding discourse adverbials: consistency, conflict, and context-dependence in a crowdsourced elicitation task. Linguistic Annotation Workshop. [paper] [slides] » Dataset to appear.
  • Nathan Schneider, Dirk Hovy, Anders Johannsen, and Marine Carpuat (2016). SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM). DiMSUM @SemEval. [paper] »
  • Nora Hollenstein, Nathan Schneider, and Bonnie Webber (2016). Inconsistency detection in semantic annotation. LREC. [paper] [slides] »
  • Meghana Kshirsagar, Sam Thomson, Nathan Schneider, Jaime Carbonell, Noah A. Smith, and Chris Dyer (2015). Frame-semantic role labeling with heterogeneous annotations. ACL-IJCNLP. [paper] [slides] »
  • Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider, and Timothy Baldwin (2015). Big data small data, in domain out-of domain, known word unknown word: the impact of word representations on sequence labelling tasks. CoNLL. [paper] »
  • Nathan Schneider (2015). Struggling with English prepositional verbs. ICLC. [abstract] [slides]
  • Nathan Schneider and Noah A. Smith (2015). A corpus and model integrating multiword expressions and supersenses. NAACL-HLT. [paper] [slides] [video] »
  • Nathan Schneider, Vivek Srikumar, Jena D. Hwang, and Martha Palmer (2015). A hierarchy with, of, and for preposition supersenses. Linguistic Annotation Workshop. [paper] [poster] »
  • Lingpeng Kong, Nathan Schneider, Swabha Swayamdipta, Archna Bhatia, Chris Dyer, and Noah A. Smith (2014). A dependency parser for tweets. EMNLP. [paper] [slides] [video] »
  • Sam Thomson, Brendan O’Connor, Jeffrey Flanigan, David Bamman, Jesse Dodge, Swabha Swayamdipta, Nathan Schneider, Chris Dyer, and Noah A. Smith (2014). CMU: Arc-Factored, Discriminative Semantic Dependency Parsing. Task 8: Broad-Coverage Semantic Dependency Parsing @SemEval. [paper] [poster] »
  • Nathan Schneider, Emily Danchik, Chris Dyer, and Noah A. Smith (2014). Discriminative lexical semantic segmentation with gaps: running the MWE gamut. TACL 2(April):193–206. Presented at ACL 2014. [paper] [poster] « [abstract] [bib] [errata] [data and software]
  • Dipanjan Das, Desai Chen, André F. T. Martins, Nathan Schneider, and Noah A. Smith (2014). Frame-semantic parsing. Computational Linguistics 40(1):9–56. [paper] [publisher] « [abstract] [bib] [errata]
  • Archna Bhatia, Chu-Cheng Lin, Nathan Schneider, Yulia Tsvetkov, Fatima Talib Al-Raisi, Laleh Roostapour, Jordan Bender, Abhimanu Kumar, Lori Levin, Mandy Simons, and Chris Dyer (2014). Automatic classification of communicative functions of definiteness. COLING. [paper] [poster] »
  • Nathan Schneider, Spencer Onuffer, Nora Kazour, Emily Danchik, Michael T. Mordowanec, Henrietta Conrad, and Noah A. Smith (2014). Comprehensive annotation of multiword expressions in a social web corpus. LREC. [paper] »
  • Yulia Tsvetkov, Nathan Schneider, Dirk Hovy, Archna Bhatia, Manaal Faruqui, and Chris Dyer (2014). Augmenting English adjective senses with supersenses. LREC. [paper] »
  • Meghana Kshirsagar, Nathan Schneider, and Chris Dyer (2014). Leveraging heterogeneous data sources for relational semantic parsing. Workshop on Semantic Parsing. [extended abstract] [poster] »
  • Jeffrey Flanigan, Samuel Thomson, David Bamman, Jesse Dodge, Manaal Faruqui, Brendan O’Connor, Nathan Schneider, Swabha Swayamdipta, Chris Dyer, and Noah A. Smith (2014). Graph-based algorithms for semantic parsing. Workshop on Semantic Parsing.
  • Michael T. Mordowanec, Nathan Schneider, Chris Dyer, and Noah A. Smith (2014). Simplified dependency annotations with GFL-Web. ACL demo. [paper] [poster] »
  • Nathan Schneider, Brendan O’Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, and Jason Baldridge (2013). A framework for (under)specifying dependency syntax without overloading annotators. Linguistic Annotation Workshop. [paper] [extended version] »
  • Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider (2013). Abstract Meaning Representation for sembanking. Linguistic Annotation Workshop. [paper] [website] »
  • Nathan Schneider, Behrang Mohit, Chris Dyer, Kemal Oflazer, and Noah A. Smith (2013). Supersense tagging for Arabic: the MT-in-the-middle attack. NAACL-HLT. [paper] [slides] [video] »
  • Olutobi Owoputi, Brendan O’Connor, Chris Dyer, Kevin Gimpel, Nathan Schneider, and Noah A. Smith (2013). Improved part-of-speech tagging for online conversational text with word clusters. NAACL-HLT. [paper] [summary slide] [poster] »
  • Yulia Tsvetkov, Naama Twitto, Nathan Schneider, Noam Ordan, Manaal Faruqui, Victor Chahuneau, Shuly Wintner, and Chris Dyer (2013). Identifying the L1 of non-native writers: the CMU-Haifa system. NLI Shared Task @BEA. [paper] [poster] »
  • Nathan Schneider, Chris Dyer, and Noah A. Smith (2013). Exploiting and expanding corpus resources for frame-semantic parsing. IFNW. [slides]
  • Nathan Schneider, Behrang Mohit, Kemal Oflazer, and Noah A. Smith (2012). Coarse lexical semantic annotation with supersenses: an Arabic case study. ACL. [paper] »
  • Behrang Mohit, Nathan Schneider, Rishav Bhowmick, Kemal Oflazer, and Noah A. Smith (2012). Recall-oriented learning of named entities in Arabic Wikipedia. EACL. [paper] [supplement] »
  • Kevin Gimpel, Nathan Schneider, Brendan O’Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith (2011). Part-of-speech tagging for Twitter: annotation, features, and experiments. ACL-HLT. [paper] [slides] »
  • Desai Chen, Nathan Schneider, Dipanjan Das, and Noah A. Smith (2010). SEMAFOR: Frame argument resolution with log-linear models. Task 10: Linking Events and their Participants in Discourse @SemEval. [paper] [slides] »
  • Dipanjan Das, Nathan Schneider, Desai Chen, and Noah A. Smith (2010). Probabilistic frame-semantic parsing. NAACL-HLT. [paper] [slides] »
  • Nathan Schneider (2010). English morphology in construction grammar. CSDL. [poster]
  • Nathan Schneider (2010). Computational cognitive morphosemantics: modeling morphological compositionality in Hebrew verbs with Embodied Construction Grammar. BLS. [slides] [paper] »
  • Nathan Schneider, Jeffrey Flanigan, and Tim O’Gorman (2015). The logic of AMR: practical, unified, graph-based sentence semantics for NLP. NAACL-HLT tutorial. [abstract] [materials] [video] »
  • Collin Baker, Nathan Schneider, Miriam R. L. Petruck, and Michael Ellsworth (2015). Getting the roles right: using FrameNet in NLP. NAACL-HLT tutorial. [abstract] [video] »
  • Nathan Schneider (2015). What I’ve learned about annotating informal text (and why you shouldn’t take my word for it). Linguistic Annotation Workshop. [paper] »
  • Nathan Schneider and Reut Tsarfaty (June 2013). Design Patterns in Fluid Construction Grammar, Luc Steels (editor). Computational Linguistics 39(2). [paper] »
  • Nathan Schneider and Omri Abend (31 January 2016). Towards a dataset for evaluating multiword predicate interpretation in context. PARSEME STSM Report. [paper] »
  • Olutobi Owoputi, Brendan O’Connor, Chris Dyer, Kevin Gimpel, and Nathan Schneider (September 2012). Part-of-speech tagging for Twitter: word clusters and other advances. Technical Report CMU-ML-12-107. [paper] »
  • Nathan Schneider (5 October 2011). Casting a wider ’Net: NLP for the Social Web. Invited talk, CMU Qatar Computer Science. [slides] »
  • Nathan Schneider, Rebecca Hwa, Philip Gianfortoni, Dipanjan Das, Michael Heilman, Alan W. Black, Frederick L. Crabbe, and Noah A. Smith (July 2010). Visualizing topical quotations over time to understand news discourse. Technical Report CMU-LTI-10-013. [paper] »
  • Dipanjan Das, Nathan Schneider, Desai Chen, and Noah A. Smith (April 2010). SEMAFOR 1.0: A probabilistic frame-semantic parser. Technical Report CMU-LTI-10-001. [paper] »
  • Reza Bosagh Zadeh and Nathan Schneider (December 2008). Unsupervised approaches to sequence tagging, morphology induction, and lexical resource acquisition. LS2 course literature review. [paper] [slides] »
Other reports to appear.

Research

My research interests are in the intersection of linguistics, cognitive science, and computer science/artificial intelligence. Fundamentally, I want to be able to describe and simulate human and artificial language learning, understanding, and use.

Computational cognitive linguistics and NLP lie at the intersection of cognitive science, computer science, linguistics, and artificial intelligence.

Research goals:

  • understanding how languages convey meaning
  • using computers to model, analyze, and reason about human language
  • designing computer interfaces to exploit aspects of human cognition and artificial intelligence, including language processing

Specific problems of interest include:

  • Statistical NLP: morphological, syntactic, and semantic parsing; machine translation; grammar learning; figurative language processing
  • Cognitive linguistics: Construction Grammar; frame semantics; metaphor and metonymy; conceptual blending and mental spaces; usage-based theories of language learning
  • Technology for linguistics: Use of technology to assist linguistic discovery and language revitalization
  • Human-computer interaction: NLP-enabled user interfaces and information visualization

The goal of this project was to build models to predict a sentence's frame-semantic structure. Predicting a frame-semantic parse involves finding and disambiguating frame-evoking expressions and matching roles of the evoked frames to arguments in the sentence. We have implemented a probabilistic frame parser for English which outperforms the previous state of the art.
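
To make the shape of a frame-semantic parse concrete, here is a minimal sketch of one way to represent it as data: a target span, a frame label, and a mapping from roles to argument spans. The class and helper names (`FrameAnnotation`, `span_text`, `describe`) and the example sentence are assumptions for illustration only; this is not the parser's actual data model or output format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # token offsets [start, end), end exclusive


@dataclass
class FrameAnnotation:
    """One evoked frame in a sentence (hypothetical representation)."""
    target: Span                                          # frame-evoking expression
    frame: str                                            # disambiguated frame label
    roles: Dict[str, Span] = field(default_factory=dict)  # role name -> argument span


def span_text(tokens: List[str], span: Span) -> str:
    """Return the words covered by a token span."""
    start, end = span
    return " ".join(tokens[start:end])


def describe(tokens: List[str], ann: FrameAnnotation) -> List[str]:
    """Render one annotation as human-readable lines."""
    lines = [f"{span_text(tokens, ann.target)} evokes {ann.frame}"]
    for role, span in ann.roles.items():
        lines.append(f"  {role}: {span_text(tokens, span)}")
    return lines


# Illustrative sentence; the annotation is written by hand, not predicted.
tokens = "Kim bought a used violin".split()
parse = [
    FrameAnnotation(
        target=(1, 2),
        frame="Commerce_buy",
        roles={"Buyer": (0, 1), "Goods": (2, 5)},
    )
]

for ann in parse:
    print("\n".join(describe(tokens, ann)))
```

A parser's three subtasks map onto the three fields: finding the target, choosing the frame, and filling the roles.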

The AQMAR project (a collaboration with CMU's Qatar campus) aims to advance the state of the art in NLP for Arabic text. We will develop tools for linguistic structure analysis, especially named entity recognition (NER) and semantic tagging, for use in the NLP community, with emphasis on domains other than news (namely, topics found in Arabic Wikipedia).

In 2008–2010 I worked on the RAVINE project, an effort combining NLP and information visualization technologies to build an interface facilitating efficient exploration and analysis of content from a large database of news articles. Our system scans articles to extract quotations (and their speakers) for display in an interactive graph. I have been primarily involved in designing the interface and in organizing a user study to evaluate its effectiveness.

Hebrew verbs use a root-and-pattern system, where a three-consonant root is lexicalized in one or more of seven verbal paradigms. Each verb, then, is a pairing of a root, a paradigm, and a meaning. An inflected verb's form is quite predictable, the meaning less so; many verbs have idiosyncratic meanings, but there are some regularities and tendencies which need to be accounted for, e.g. certain frequent alternations between paradigms for a common root. My analysis addresses the following questions:

  1. What are the forms and meanings of the morphological components of verbs—roots, paradigms, stems, and inflectional affixes?
  2. How do the forms and meanings of these constructions combine to yield actual verbs in sentences?
  3. How can these constructions be formalized in a structured representation that can be used for computational analysis?

I argue that construction grammar is an appropriate theoretical framework capable of accounting for the complexities of such a system. In particular, I use the Embodied Construction Grammar formalism to represent the necessary constructions in a manner suitable for automated analysis and simulation. Moreover, I argue that many features of the system are consistent with the notion of language as a best-fit cognitive phenomenon.

As part of an honors thesis under the supervision of Jerry Feldman, I designed a morphological extension to the Embodied Construction Grammar formalism and implemented this extension in the ECG parser.
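
As a rough illustration of the root-and-pattern idea described above, the sketch below interleaves a three-consonant root with a paradigm template and pairs each lexicalized (root, paradigm) combination with a meaning. It is a plain Python toy, not Embodied Construction Grammar. The templates are simplified transliterations for a single inflected form, they ignore phonological alternations that affect real surface forms, and the names `TEMPLATES`, `LEXICON`, and `interleave` are hypothetical.

```python
# Simplified, transliterated templates for three of the verbal paradigms
# (past tense, 3rd person masculine singular). Digits mark root-consonant slots.
# These are illustrative idealizations, not a full account of Hebrew morphology.
TEMPLATES = {
    "pa'al": "1a2a3",
    "hif'il": "hi12i3",
    "hitpa'el": "hit1a2e3",
}

# A verb is a pairing of a root, a paradigm, and a meaning. Only lexicalized
# pairings are listed, so the meaning is stored rather than computed.
LEXICON = {
    (("k", "t", "v"), "pa'al"): "wrote",
    (("k", "t", "v"), "hif'il"): "dictated",
    (("k", "t", "v"), "hitpa'el"): "corresponded",
}


def interleave(root, template):
    """Fill the numbered slots of a template with the root consonants."""
    if len(root) != 3:
        raise ValueError("this sketch only handles three-consonant roots")
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in template)


for (root, paradigm), gloss in LEXICON.items():
    form = interleave(root, TEMPLATES[paradigm])
    print(f"{'-'.join(root)} + {paradigm} -> {form} '{gloss}'")
```

The form is computed from the template while the gloss is looked up, which mirrors the observation that an inflected verb's form is quite predictable but its meaning less so.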

Picurís Tagger

As a machine learning course project, in Fall 2007 I worked with fellow student Will Chang to develop a statistical model that would aid linguistic analysis of texts in Picurís, a Northern Tiwa language of New Mexico. A database of 28 stories in the language was compiled, and students in a recent linguistics course began the painstaking process of identifying the meanings of morphemes (meaning-bearing word fragments) in the texts.

Our model is a Hidden Markov Model over syllables; it predicts (a) the grouping of syllables within each word into morphemes (segmentation), and (b) a tag for each morpheme indicating its category/"part of speech" (classification). Trained with the EM algorithm, the model makes reasonable predictions with just a few labeled examples.
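
The joint segmentation and classification step can be sketched as Viterbi decoding over syllables, assuming each hidden state pairs a position (B = begins a morpheme, I = continues one) with a morpheme tag. That state encoding, the tag names, the syllables, and every probability below are invented for illustration; they are not the project's actual model or Picurís data, and EM training is not shown.

```python
import math

NEG_INF = float("-inf")


def logs(probs):
    """Convert a dict of nonzero probabilities to log probabilities."""
    return {key: math.log(p) for key, p in probs.items()}


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for obs (log-space Viterbi)."""
    best = [{s: log_start.get(s, NEG_INF) + log_emit[s].get(obs[0], NEG_INF)
             for s in states}]
    back = []
    for o in obs[1:]:
        scores, ptrs = {}, {}
        for s in states:
            # Best predecessor for state s at this position.
            prev, score = max(
                ((p, best[-1][p] + log_trans[p].get(s, NEG_INF)) for p in states),
                key=lambda pair: pair[1],
            )
            scores[s] = score + log_emit[s].get(o, NEG_INF)
            ptrs[s] = prev
        best.append(scores)
        back.append(ptrs)
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]


def to_morphemes(syllables, path):
    """Group syllables into (morpheme, tag) pairs using the B/I positions."""
    groups = []
    for syl, (pos, tag) in zip(syllables, path):
        if pos == "B" or not groups:
            groups.append(([syl], tag))
        else:
            groups[-1][0].append(syl)
    return [("".join(syls), tag) for syls, tag in groups]


# Toy parameters (all made up).
B_PFX, B_STEM, I_STEM = ("B", "PFX"), ("B", "STEM"), ("I", "STEM")
STATES = [B_PFX, B_STEM, I_STEM]
LOG_START = logs({B_PFX: 0.6, B_STEM: 0.4})
LOG_TRANS = {
    B_PFX: logs({B_STEM: 1.0}),
    B_STEM: logs({I_STEM: 0.7, B_STEM: 0.3}),
    I_STEM: logs({B_STEM: 0.7, I_STEM: 0.3}),
}
LOG_EMIT = {
    B_PFX: logs({"na": 0.9, "ta": 0.1}),
    B_STEM: logs({"ta": 0.6, "ki": 0.2, "na": 0.2}),
    I_STEM: logs({"ki": 0.7, "ta": 0.3}),
}

word = ["na", "ta", "ki"]  # invented syllables
path = viterbi(word, STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(to_morphemes(word, path))
```

Encoding the morpheme boundary in the state lets a single decoding pass yield both the segmentation and the tags.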

Metonymy Classification

For a course project in Spring 2007 I worked with Srini Narayanan on the problem of identifying whether a given verb was being used metonymically or not. I developed and tested a classifier for metonymic vs. literal sentences. Further work is needed in determining the semantic categories for a particular verb’s literal arguments.

Software

My research group has released several NLP tools, downloadable from the group web page. Of these, I contributed to the SEMAFOR frame-semantic parser for English (paper, tech report), the Twitter part-of-speech tagging tools (paper), and the Arabic named entity tagger (paper).
I have put together several tutorials/reference guides, including:

Education

In Spring 2016 I am co-teaching INFR09028: Foundations in Natural Language Processing (FNLP), which introduces 3rd-year Edinburgh Informatics undergraduates to statistical NLP. Sharon Goldwater is the other instructor.

During my Ph.D. at Carnegie Mellon University, I completed the following courses:

I served as the TA for:

In 2008 I graduated from the University of California, Berkeley with a double major in Computer Science and Linguistics. Courses included:

Computer Science

Linguistics

  • The Mind and Language
  • Advanced Cognitive Linguistics
  • Modern Hebrew Linguistics
  • Syntax and Semantics
  • Comparative and Historical Linguistics
  • Phonology and Morphology
  • Phonetics
  • The Neural Basis of Thought and Language
  • Neural Theory of Language Seminar

Languages

  • עיברית מודרנית (Modern Hebrew) – 4 semesters' worth
  • français (French) – 1 semester
  • العربِيّة (Arabic) – 1 semester

Other

I attended Sycamore High School in Cincinnati, Ohio, where I studied computer science for four years, Modern Hebrew for three, and participated in the orchestra (violin), spring musicals (stage crew), academic quiz team, world affairs council, and Scrabble Club.

Potpourri

Academic

Programming

My programming languages of choice are Python and Java; I've also used C, C++, C#, and Scheme. For the web I use JavaScript and PHP.

Extracurricular

I play violin in the All University Orchestra. I also enjoy table tennis.

As an undergraduate, I was involved with several student groups: the Cal Berkeley Democrats & Smart Ass, the Roosevelt Institution (now Roosevelt Institute Campus Network), the Cal Scrabble Club, and SLUgS.

My favorite fonts include: Zapf Humanist/Optima, Segoe UI, Perspective Sans, Georgia, and Lucida Bright.

Old resources for Windows XP: How to type in Hebrew on Windows, IPA keyboard layout for Windows

I have listed what to me are indispensable enhancements to the Firefox browsing experience, including:

  • Adblock Plus: hides ads on web pages
  • FireGestures: enables simple mouse movements for common browser actions
  • Firebug: invaluable tool for web designers; includes a DOM browser, style information, and a JavaScript console
  • Zotero: this citation manager is priceless (good thing it's free!). It imports citation information from online catalogs and electronic journals, stores snapshots of web pages, organizes notes and other metadata, and generates bibliographies/exports to BibTeX. I contributed a script which makes papers on the ACL Anthology visible to Zotero.

updated 18 august 2016