List of Accepted Papers
- List of papers to be presented (without the abstracts)
- List of accepted long papers
- List of accepted short papers
- TACL papers to be presented at EMNLP
Long papers
- Document Modeling with Convolutional-Gated Recurrent Neural Network for Sentiment Classification Duyu Tang, Bing Qin and Ting Liu
Document-level sentiment classification remains a challenge: how to encode the intrinsic relations between sentences in the semantic meaning of a document. To address this, we introduce the Convolutional-Gated Recurrent Neural Network (C-GRNN), which learns vector-based document representations in a unified, bottom-up fashion. C-GRNN first models sentence representations with a convolutional neural network. Afterwards, the semantics of sentences and their relations are adaptively encoded in the document representation with a gated recurrent neural network. We apply C-GRNN to document-level sentiment classification and conduct experiments on four large-scale review datasets from IMDB and Yelp. Experimental results show that: (1) C-GRNN shows superior performance over several state-of-the-art algorithms; (2) the gated recurrent neural network dramatically outperforms the standard recurrent neural network in document modeling.
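A minimal PyTorch sketch of this pipeline (our illustration, not the authors' code; all layer names and sizes are assumptions): a CNN builds one vector per sentence, and a GRU composes those vectors bottom-up into a document representation.

```python
import torch
import torch.nn as nn

class CNNGRUDocModel(nn.Module):
    """Hypothetical CNN-then-GRU document encoder in the spirit of C-GRNN."""
    def __init__(self, vocab_size, emb_dim=100, hid_dim=128, n_classes=5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hid_dim, kernel_size=3, padding=1)
        self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, n_classes)

    def forward(self, doc):                       # doc: (n_sents, max_words)
        e = self.emb(doc).transpose(1, 2)         # (n_sents, emb_dim, words)
        sents = torch.relu(self.conv(e)).max(dim=2).values  # one vector per sentence
        _, h = self.gru(sents.unsqueeze(0))       # compose sentences in order
        return self.out(h.squeeze(0))             # document-level sentiment logits
```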
- System Combination for Multi-document Summarization Kai Hong and Ani Nenkova
We present a novel framework of system combination for multi-document summarization. For each input set (input), we generate candidate summaries by combining the summaries from different systems on the sentence level. We show that the oracle among these candidates is much better than the systems that we have combined. We then present a supervised model to select among the candidates. The model relies on a rich set of features that capture content importance from different perspectives. Our model performs better than the systems that we have combined, based on automatic and manual evaluations. Our model also achieves a performance comparable to the state-of-the-art on six DUC/TAC datasets.
- When Are Tree Structures Necessary for Deep Learning of Representations? Jiwei Li, Thang Luong, Dan Jurafsky and Eduard Hovy
Recursive neural models, which use syntactic parse trees to recursively generate representations bottom-up from parse tree children, are a popular new architecture, promising to capture structural properties like long-distance semantic dependencies. But understanding exactly which tasks this parse-based method is appropriate for remains an open question. In this paper we benchmark recursive neural models against sequential recurrent neural models, which are structured solely on word sequences. We investigate five tasks: sentiment classification at the (1) sentence level and (2) phrase level; (3) matching questions to answer phrases; (4) discourse parsing; and (5) computing semantic relations (e.g., component-whole between nouns). We implement basic versions of recursive and recurrent models and apply them to each task. Our analysis suggests that syntactic tree-based recursive models are helpful for tasks that require representing long-distance relations between words (e.g., semantic relations between nominals), but may not be helpful in other situations, where sequence-based recurrent models can produce equal performance. Our results offer insights on the design of neural architectures for representation learning.
- Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke and Steve Young
Natural language generation (NLG) is a critical component of spoken dialogue systems and has a significant impact on both usability and perceived quality. Most NLG systems in common use employ rules and heuristics and tend to generate rigid and stylised responses without the natural variation of human language. They are also not easily scaled to systems covering multiple domains and languages. This paper presents a statistical language generator based on a semantically controlled Long Short-term Memory (LSTM) structure. The LSTM generator can learn from unaligned data by jointly optimising sentence planning and surface realisation using a simple cross-entropy training criterion, and language variation can be easily achieved by sampling from output candidates. An objective evaluation in two differing test domains showed improved performance compared to previous methods with fewer heuristics. Human judges scored the LSTM system higher on informativeness and naturalness and overall preferred it to the other systems.
- Detecting Risks in the Banking System by Sentiment Analysis Clemens Nopp and Allan Hanbury
In November 2014, the European Central Bank (ECB) started to directly supervise the largest banks in the Eurozone via the Single Supervisory Mechanism (SSM). While supervisory risk assessments are usually based on quantitative data and surveys, this work explores whether sentiment analysis is capable of measuring a bank's attitude and opinions towards risk by analyzing text data. To carry out this study, we compile a collection of more than 500 CEO letters and outlook sections extracted from bank annual reports. Based on these data, two distinct experiments are conducted. The evaluations find promising opportunities, but also limitations, for risk sentiment analysis in banking supervision. At the level of individual banks, predictions are relatively inaccurate. In contrast, the analysis of aggregated figures reveals strong and significant correlations between uncertainty or negativity in textual disclosures and the future evolution of quantitative risk indicators. Risk sentiment analysis should therefore be used for macroprudential analyses rather than for assessments of individual banks.
- Cross Lingual Sentiment Analysis using Modified BRAE Sarthak Jain and Shashank Batra
Cross-lingual learning provides a mechanism to adapt NLP tools available for label-rich languages to achieve similar tasks in label-scarce languages. An efficient cross-lingual tool significantly reduces the cost and effort required to manually annotate data. In this paper, we use the Recursive Autoencoder architecture to develop a cross-lingual sentiment analysis tool using sentence-aligned corpora between a resource-rich (English) and a resource-poor (Hindi) language. The resulting system is analyzed on a newly developed Movie Reviews Dataset in Hindi with labels given on a rating scale, and its performance is compared against existing systems. We show that our approach significantly outperforms state-of-the-art systems for sentiment analysis, especially when labeled data is scarce.
- Hashtag Recommendation Using Dirichlet Process Mixture Models Incorporating Types of Hashtags Qi Zhang, Yeyun Gong and Xuanjing Huang
In recent years, the task of recommending hashtags for microblogs has been given increasing attention. Various methods have been proposed to study the problem from different aspects. However, most recent studies have not considered the differences in the types or uses of hashtags. In this paper, we introduce a novel nonparametric Bayesian method for this task. Based on Dirichlet Process Mixture Models (DPMM), we incorporate the type of hashtag as a hidden variable. The results of experiments on data collected from a real-world microblogging service demonstrate that the proposed method outperforms state-of-the-art methods that do not consider these aspects. Taking these aspects into consideration, the relative improvement of the proposed method over the state-of-the-art methods is around 12.2% in F1-score.
- ERSOM: A Structural Ontology Matching Approach Using Automatically Learned Entity Representation Chuncheng Xiang, Baobao Chang and Zhifang Sui
As a key representation model of knowledge, ontologies have been widely used in many NLP-related tasks, such as semantic parsing, information extraction, and text mining. In this paper, we study the task of ontology matching, which concentrates on finding semantically related entities between different ontologies that describe the same domain, in order to solve the semantic heterogeneity problem. Previous work exploits the different kinds of descriptions of an entity in an ontology directly and separately to find correspondences, without considering the higher-level correlations between the descriptions. Moreover, the structural information of ontologies has not been utilized adequately for ontology matching. We propose an ontology matching approach, named ERSOM, which mainly includes an unsupervised representation learning method based on deep neural networks to learn general representations of the entities, and an iterative similarity propagation method that takes advantage of the richer structural information of the ontology to discover more mappings.
- How Much Information Does a Human Translator Add to the Original? Barret Zoph, Marjan Ghazvininejad and Kevin Knight
We ask how much information a human translator adds to an original text, and we provide a bound. We address this question in the context of bilingual text compression: given a source text, how many bits of additional information are required to specify the target text produced by a human translator? We develop new compression algorithms and establish a benchmark task.
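As a gloss on the quantity being bounded (our notation, not necessarily the paper's): any concrete conditional compressor C, where both encoder and decoder see the source s, yields an upper bound on the per-symbol conditional entropy of human translations,

```latex
% bits emitted for target t when source s is shared, per target symbol
\hat{H}(T \mid S) \;=\; \frac{\lvert C(t \mid s) \rvert}{\lvert t \rvert}
\;\;\ge\;\; H(T \mid S)
```

so better bilingual compressors tighten the bound on the information a translator adds.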
- Biography-Dependent Collaborative Entity Archiving for Slot Filling Yu Hong, Xiaobin Wang, Yadong Chen, Jian Wang, Tongtao Zhang and Heng Ji
Current studies on Knowledge Base Population (KBP) tasks, such as slot filling, show the particular importance of entity-oriented automatic acquisition of relevant documents. Rich, diverse, and reliable relevant documents satisfy the fundamental requirement that a KBP system explore the attributes of an entity, such as provenance-based background knowledge extraction (e.g., a person's religion, origin, etc.). To address the bottleneck between comprehensiveness and definiteness of acquisition, we propose a collaborative archiving method based on fuzzy-to-exact matching. In particular, we introduce topic modeling methodologies into entity profiling, so as to build a bridge between fuzzy and exact matching. On one side of the bridge, we employ the topics in a small-scale set of high-quality relevant documents (i.e., exact matching results) to summarize the life slices of a target entity (a so-called biography). On the other side, we use the biography as a reliable reference material to detect new, truly relevant documents within large-scale, semi-finished pseudo-feedback (i.e., fuzzy matching results). We leverage the archiving method in state-of-the-art slot filling systems. Experiments on TAC-KBP data show significant improvement.
- Monotone Submodularity in Opinion Summaries Jayanth Jayanth, Jayaprakash Sundararaj and Pushpak Bhattacharyya
We propose a set of submodular functions for opinion summarization, a problem that combines the tasks of summarization and sentiment detection. It is not easy to detect sentiment and simultaneously extract a summary: the two tasks conflict in the sense that the demand for compression may drop sentiment-bearing sentences, while the demand for sentiment detection may bring in redundant sentences. Using submodularity, we show how to strike a balance between the two requirements. We investigate a new class of submodular functions for the problem, and a partial-enumeration-based greedy algorithm with a performance guarantee of 63% (i.e., 1 - 1/e). Our functions generate summaries such that there is good correlation between document sentiment and summary sentiment, along with good ROUGE scores, outperforming state-of-the-art algorithms.
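Stripped of the paper's specific objective, the greedy machinery looks like the sketch below (our toy coverage function stands in for the paper's functions; the partial-enumeration step that yields the 63% guarantee is omitted for brevity):

```python
def greedy_summary(sentences, f, budget):
    """Scaled greedy selection for a monotone submodular f under a word budget."""
    summary, length = [], 0
    remaining = set(range(len(sentences)))
    while remaining:
        # Pick the candidate with the best marginal gain per word of cost.
        best = max(remaining, key=lambda i: (f(summary + [i]) - f(summary))
                   / max(1, len(sentences[i].split())))
        cost = len(sentences[best].split())
        if length + cost <= budget and f(summary + [best]) > f(summary):
            summary.append(best)
            length += cost
        remaining.discard(best)
    return [sentences[i] for i in summary]

def coverage(selected, sentences):
    # Submodular word-coverage objective: adding sentences has diminishing returns.
    return len({w for i in selected for w in sentences[i].split()})

docs = ["the movie was great", "great acting overall", "the plot dragged on"]
print(greedy_summary(docs, lambda s: coverage(s, docs), budget=8))
```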
- Do we need bigram alignment models? On the effect of alignment quality on transduction accuracy in G2P Steffen Eger
We investigate the need for bigram alignment models and the benefit of supervised alignment techniques in grapheme-to-phoneme conversion (G2P). Moreover, we quantitatively estimate the relationship between alignment quality and overall G2P system performance. We find that, in English, bigram alignment models do perform better than unigram alignment models on the G2P task. Moreover, we find that supervised alignment techniques may perform considerably better than their unsupervised counterparts, and that few manually aligned training pairs suffice for them to do so. Finally, we find a highly significant impact of alignment quality on overall G2P transcription performance, and that this relationship is linear in nature.
- Sentence Compression by Deletion with LSTMs Katja Filippova, Enrique Alfonseca, Carlos A. Colmenares, Lukasz Kaiser and Oriol Vinyals
We present an LSTM approach to deletion-based sentence compression where the task is to translate a sentence into a sequence of zeros and ones, corresponding to token deletion decisions. We demonstrate that even the most basic version of the system, which is given no syntactic information (no PoS or NE tags, or dependencies) or desired compression length, performs surprisingly well: around 30% of the compressions from a large test set could be regenerated. We compare the LSTM system with a competitive baseline which is trained on the same amount of data but is additionally provided with all kinds of linguistic features. In an experiment with human raters the LSTM-based model outperforms the baseline achieving 4.5 in readability and 3.8 in informativeness.
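The task framing reduces to a compact sketch (toy sizes; not the authors' system): an LSTM reads the token sequence and emits a keep/delete decision for each position.

```python
import torch
import torch.nn as nn

class DeletionCompressor(nn.Module):
    """Hypothetical LSTM tagger: 1 = keep the token, 0 = delete it."""
    def __init__(self, vocab_size, emb=64, hid=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.tag = nn.Linear(hid, 2)              # per-token keep/delete logits

    def forward(self, token_ids):                 # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))
        return self.tag(h)                        # (batch, seq_len, 2)

model = DeletionCompressor(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (1, 12)))
keep_mask = logits.argmax(-1)                     # the sequence of zeros and ones
```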
- CORE: Context-Aware Open Relation Extraction with Factorization Machines Fabio Petroni, Luciano Del Corro and Rainer Gemulla
We propose CORE, a novel matrix factorization model that leverages contextual information for open relation extraction. Our model is based on factorization machines and integrates facts from various sources, such as knowledge bases or open information extractors, as well as the context in which these facts have been observed. We argue that integrating contextual information---such as metadata about extraction sources, lexical context, or type information---significantly improves prediction performance. Open information extractors, for example, may produce extractions that are unspecific or ambiguous when taken out of context. Our experimental study on a large real-world dataset indicates that CORE has significantly better prediction performance than state-of-the-art approaches when contextual information is available.
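For reference, the degree-2 factorization machine score this family of models builds on, in Rendle's standard notation (the paper's exact parameterization may differ):

```latex
\hat{y}(\mathbf{x}) \;=\; w_0 \;+\; \sum_{i=1}^{n} w_i x_i
\;+\; \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```

The pairwise interaction weights are factorized as inner products of learned latent vectors, which is what lets a model of this form pool evidence across sparse fact and context features.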
- Identifying Political Sentiment between Nation States with Social Media Nathanael Chambers
This paper describes a new model and application of sentiment analysis for the social sciences. The goal is to model relations between nation states with social media. Many cross-disciplinary applications of NLP involve making predictions (such as predicting political elections), but this paper instead focuses on a model that is applicable to political science analysis. Do citizens express opinions in line with their home country's formal relations? When opinions diverge over time, what is the cause and can social media serve to detect these changes? We propose several learning algorithms to study how the populace of a country discusses foreign nations on Twitter, ranging from bootstrap learning of irrelevant tweets to state-of-the-art contextual sentiment analysis. We evaluate on standard sentiment evaluations, but we also show strong correlations with two public opinion polls and current international alliance relationships. We conclude with some political science use cases.
- Language and Domain Independent Entity Linking with Quantified Collective Validation Han Wang, Jin Guang Zheng, Xiaogang Ma, Peter Fox and Heng Ji
Linking named mentions detected in a source document to an existing knowledge base provides disambiguated entity referents for the mentions. This allows better document analysis, knowledge extraction and knowledge base population. Most of the previous research extensively exploited the linguistic features of the source documents in a supervised or semi-supervised way. These systems therefore cannot be easily applied to a new language or domain. In this paper, we present a novel unsupervised algorithm named Quantified Collective Validation that avoids excessive linguistic analysis on the source documents and fully leverages the knowledge base structure for the entity linking task. We show our approach achieves state-of-the-art English entity linking performance and demonstrate successful deployment in new languages (Chinese) and new domains (Biomedical and Earth Science).
- Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation Michael Pust, Ulf Hermjakob, Kevin Knight, Daniel Marcu and Jonathan May
We present a parser for Abstract Meaning Representation (AMR). We treat English-to-AMR conversion within the framework of string-to-tree, syntax-based machine translation (SBMT). To make this work, we transform the AMR structure into a form suitable for the mechanics of SBMT and useful for modeling. We introduce an AMR-specific language model and add data and features drawn from semantic resources. Our resulting AMR parser significantly improves upon state-of-the-art results.
- A Utility Model of Authors in the Scientific Community Yanchuan Sim, Bryan Routledge and Noah A. Smith
Authoring a scientific paper is a complex process involving many decisions. We introduce a probabilistic model of some of the important aspects of that process: that authors have individual preferences, that writing a paper requires trading off among the preferences of authors as well as extrinsic rewards in the form of community response to their papers, and that preferences (of individuals and the community) and tradeoffs vary over time. Variants of our model lead to improved predictive accuracy of citations given texts and of texts given authors. Further, our model's posterior suggests an interesting relationship between seniority and author choices.
- Improved Arabic Dialect Classification on Social Media Data Fei Huang
Arabic dialect classification has been an important and challenging problem for Arabic language processing, especially for social media text analysis and machine translation. In this paper we propose an approach to improving Arabic dialect classification with semi-supervised learning: multiple classifiers are trained with weakly supervised, strongly supervised, and unsupervised data. Their combination yields significant and consistent improvement on two different test sets. The dialect classification accuracy is improved by 5% over the strongly supervised classifier and 20% over the weakly supervised classifier. Furthermore, when applying the improved dialect classifier to build a Modern Standard Arabic (MSA) language model (LM), the new model size is reduced by 70% while the English-Arabic translation quality is improved by 0.6 BLEU points.
- Modeling Relation Paths for Representation Learning of Knowledge Bases Yankai Lin, Zhiyuan Liu and Maosong Sun
Representation learning of knowledge bases (KBs) aims to embed both entities and relations into a low-dimensional space. Most existing methods only consider direct relations in representation learning. We argue that multiple-step relation paths also contain rich inference patterns between entities, and propose a path-based representation learning model. This model considers relation paths as translations between entities for representation learning, and addresses two key challenges: (1) Since not all relation paths are reliable, we design a path-constraint resource allocation algorithm to measure the reliability of relation paths. (2) We represent relation paths via semantic composition of relation embeddings. Experimental results on real-world datasets show that, as compared with baselines, our model achieves significant and consistent improvements on knowledge base completion and relation extraction from text.
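The translation intuition behind such path-based models can be shown in a few lines (our toy random embeddings, not trained ones): a path is composed by adding relation vectors, and a plausible triple satisfies h + p ≈ t.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
emb = {name: rng.normal(size=dim) for name in
       ["paris", "france", "europe", "capital_of", "located_in"]}

def path_score(head, relations, tail):
    # Compose the path by vector addition, then measure translation error.
    p = sum(emb[r] for r in relations)
    return -np.linalg.norm(emb[head] + p - emb[tail])

# Higher (less negative) = more plausible under a trained model of this form.
print(path_score("paris", ["capital_of", "located_in"], "europe"))
```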
- Phrase-based Compressive Cross-Language Summarization Jin-ge Yao, Xiaojun Wan and Jianguo Xiao
The task of cross-language document summarization is to create a summary in a target language from documents in a different source language. Previous methods only involve direct extraction of automatically translated sentences from the original documents. In this work we propose a phrase-based model to simultaneously perform sentence scoring, extraction and compression. We design a greedy algorithm to approximately optimize the score function. Experimental results show that our methods outperform the state-of-the-art extractive systems while maintaining similar grammatical quality.
- An Empirical Comparison Between N-gram and Syntactic Language Models for Word Ordering Jiangming Liu and Yue Zhang
Syntactic language models and N-gram language models have both been used for word ordering. In this paper, we give an empirical comparison between N-gram and syntactic language models on the word ordering task. Our results show that the quality of automatically parsed training data has a relatively small impact on syntactic models. Both syntactic and N-gram models can benefit from large-scale raw text. Compared with N-gram models, syntactic models give overall better performance, but they require much more training time. In addition, the two models lead to different error distributions in word ordering. A combination of the two models integrates the advantages of each, achieving the best result on a standard benchmark.
- Multilingual discriminative lexicalized phrase structure parsing Benoit Crabbé
We provide a generalization of discriminative lexicalized shift-reduce parsing techniques for phrase structure grammar to a wide range of morphologically rich languages. The model is efficient and outperforms recent strong baselines on almost all languages considered. It takes advantage of a dependency-based modelling of morphology and a shallow modelling of constituency boundaries.
- All the Right Reasons: Semi-supervised Argumentation Mining in User-generated Web Discourse Ivan Habernal and Iryna Gurevych
Analyzing arguments in user-generated Web discourse has recently gained attention in argumentation mining, an evolving field of NLP. Current approaches, which employ fully-supervised machine learning, are usually domain dependent and suffer from the lack of large and diverse annotated corpora. However, annotating arguments in discourse is costly, error-prone, and highly context-dependent. We ask whether leveraging unlabeled data in a semi-supervised manner can boost the performance of argument component identification, and to what extent the approach is independent of domain and register. We propose novel features that exploit clustering of unlabeled data from debate portals based on a word-embedding representation. Using these features, we significantly outperform several strong baselines in the cross-validation, cross-domain, and cross-register evaluation scenarios.
- Learning Semantic Representations for Nonterminals in Hierarchical Phrase-Based Translation Xing Wang and Deyi Xiong
In hierarchical phrase-based translation, coarse-grained nonterminal Xs may generate inappropriate translations due to the lack of sufficient information for phrasal substitution. In this paper we propose a framework to refine nonterminals in hierarchical translation rules with real-valued semantic representations. The semantic representations are learned via a weighted mean value and a minimum distance method using phrase vector representations obtained from a large-scale monolingual corpus. Based on the learned semantic vectors, we build a semantic nonterminal refinement model to measure semantic similarities between phrasal substitutions and nonterminal Xs in translation rules. Experiment results on Chinese-English translation show that the proposed model significantly improves translation quality on NIST test sets.
- Topic Identification and Discovery on Text and Speech Chandler May, Francis Ferraro, Alan McCree, Jonathan Wintrode, Daniel Garcia-Romero and Benjamin Van Durme
We compare the multinomial i-vector framework from the speech community with LDA, SAGE, and LSA as feature learners for topic ID on multinomial text and speech data. We also compare the learned representations in their ability to discover topics, quantified by distributional similarity to gold-standard topics and by human interpretability. We find that topic ID and topic discovery are competing objectives. We argue that LSA and i-vectors should be more widely considered by the text processing community as pre-processing steps for downstream tasks, and also speculate about speech processing tasks that could benefit from more interpretable representations like SAGE.
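As a reminder of what one of the compared feature learners does, here is LSA as a truncated SVD over a bag-of-words matrix, in scikit-learn (our toy corpus; dimensions are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["stocks fell sharply today", "the team won the match",
        "markets rallied on earnings", "the coach praised the players"]
X = CountVectorizer().fit_transform(docs)          # documents x vocabulary
Z = TruncatedSVD(n_components=2).fit_transform(X)  # low-dimensional doc features
print(Z.shape)  # (4, 2): inputs for a downstream topic-ID classifier
```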
- Sentiment Flow - A General Model of Web Review Argumentation Henning Wachsmuth, Johannes Kiesel and Benno Stein
Web reviews have been intensively studied in argumentation-related tasks such as sentiment analysis. However, due to their focus on content-based features, many sentiment analysis approaches are effective only for reviews from those domains they have been specifically modeled for. This paper puts its focus on domain independence and asks whether a general model can be found for how people argue in web reviews. Our hypothesis is that people express their global sentiment on a topic with similar sequences of local sentiment independent of the domain. We model such sentiment flow robustly under uncertainty through abstraction. To test our hypothesis, we predict global sentiment based on sentiment flow. In systematic experiments, we improve over the domain independence of strong baselines. Our findings suggest that sentiment flow qualifies as a general model of web review argumentation.
- Better Document-level Sentiment Analysis from RST Discourse Parsing Parminder Bhatia, Yangfeng Ji and Jacob Eisenstein
Discourse structure has long been thought to have the potential to improve the prediction of document-level labels, such as sentiment polarity. We present successful applications of Rhetorical Structure Theory (RST) to document-level sentiment analysis, via composition of local information up the discourse tree. First, we show that RST offers substantial improvements in lexicon-based sentiment analysis, via a reweighting of discourse units according to their position in a dependency representation of the rhetorical structure. Next, we present a recursive neural network over the RST structure, which offers significant improvements over classification-based sentiment polarity analysis.
- Language Understanding for Text-based Games Using Deep Reinforcement Learning Karthik Narasimhan, Tejas Kulkarni and Regina Barzilay
In this paper, we consider the task of learning control policies for text-based games. In these games, all interactions in the virtual world are through text and the underlying state is not observed. The resulting language barrier makes such environments challenging for automatic game players. We employ a deep reinforcement learning framework to jointly learn state representations and action policies using game rewards as feedback. This framework enables us to map text descriptions into vector representations that capture the semantics of the game states. We evaluate our approach on two game worlds, comparing against a baseline with a bag-of-words state representation. Our algorithm outperforms the baseline on quest completion by 54% on a newly created world and by 14% on a pre-existing fantasy game.
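A minimal sketch of the framing (toy sizes; not the paper's implementation): an LSTM embeds the textual state description, and a linear head outputs one Q-value per available action, trained from game rewards.

```python
import torch
import torch.nn as nn

class TextQNetwork(nn.Module):
    """Hypothetical text-in, Q-values-out network for a text-based game."""
    def __init__(self, vocab, emb=32, hid=64, n_actions=4):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.q = nn.Linear(hid, n_actions)

    def forward(self, text_ids):                  # (batch, words)
        _, (h, _) = self.lstm(self.emb(text_ids))
        return self.q(h[-1])                      # one Q-value per action

net = TextQNetwork(vocab=5_000)
state = torch.randint(0, 5_000, (1, 20))          # tokenized room description
action = net(state).argmax(-1)                    # greedy (or epsilon-greedy) act
```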
- Open Extraction of Fine-Grained Political Opinion David Bamman and Noah A. Smith
Text data has recently been used as evidence in estimating the political ideologies of individuals, including political elites and social media users. While inferences about people are often the intrinsic quantity of interest, we draw inspiration from open information extraction to identify a new task: inferring the political import of propositions like "Obama is a Socialist." We present several models that exploit the structure that exists between people and the assertions they make to learn latent positions of people and propositions at the same time, and we evaluate them on a novel dataset of propositions judged on a political spectrum.
- A Transition-based Model for Joint Segmentation, POS-tagging and Normalization Tao Qian, Yue Zhang, Meishan Zhang and Donghong JI
Two central challenges of text normalization on Chinese Microtext are the error propagation from word segmentation and the lack of annotated corpora. Inspired by the joint model of word segmentation and POS tagging, we propose a transition-based joint model of word segmentation, POS tagging and text normalization. The model can be trained on standard text corpora, overcoming the lack of annotated Microtext corpora. To evaluate our model, we develop an annotated corpus based on Microtext. Experimental results show that our joint model can help improve the performance of word segmentation on Microtext, giving an error reduction in segmentation accuracy of 22.49%, compared to the traditional approach.
- Feature-Rich Two-Stage Logistic Regression for Monolingual Alignment Md Arafat Sultan, Steven Bethard and Tamara Sumner
Monolingual alignment is the task of pairing semantically similar units from two pieces of text. We report a top-performing supervised aligner that operates on short text snippets. We employ a large feature set to (1) encode similarities among semantic units (words and named entities) in context, and (2) address cooperation and competition for alignment among units in the same snippet. These features are deployed in a two-stage logistic regression framework for alignment. On two benchmark data sets, our aligner achieves F1 scores of 92.1% and 88.5%, with statistically significant error reductions of 4.8% and 7.3% over the previous best aligner. It produces top results in extrinsic evaluation as well.
- Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model Hidetaka Kamigaito, Taro Watanabe, Hiroya Takamura, Manabu Okumura and Eiichiro Sumita
In hierarchical phrase-based machine translation, a rule table is automatically learned by heuristically extracting synchronous rules from a parallel corpus. As a result, spuriously many rules are extracted, many of which may be incorrect. The larger rule table incurs more run time for decoding and may result in lower translation quality. To resolve these problems, we propose a hierarchical back-off model for Hiero grammar, an instance of a synchronous context-free grammar (SCFG), on the basis of the hierarchical Pitman-Yor process. The model can extract a compact rule and phrase table without resorting to any heuristics, by hierarchically backing off to smaller phrases under the SCFG. Inference is efficiently carried out using the two-step synchronous parsing of Xiao et al. (2012) combined with slice sampling. In our experiments, the proposed model achieved higher translation quality, measured in BLEU, than a previous Bayesian model on various language pairs: German/French/Spanish/Japanese-to-English.
- Input Method Logs as Natural Annotations for Word Segmentation Fumihiko Takahasi and Shinsuke Mori
In this paper we propose a framework to improve word segmentation accuracy using input method logs. An input method is software used to type sentences in languages which have far more characters than the number of keys on a keyboard. The main contributions of this paper are: 1) an input method server that proposes word candidates which are not included in the vocabulary, 2) a publicly usable input method that logs user behavior (like typing and selection of word candidates), and 3) a method for improving word segmentation by using these logs. We conducted word segmentation experiments on tweets from Twitter, and showed that our method improves accuracy in this domain. Our method itself is domain-independent and only needs logs from the target domain.
- A Neural Attention Model for Abstractive Sentence Summarization Sumit Chopra, Jason Weston and Alexander M. Rush
Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build. In this work, we propose a fully data-driven approach to abstractive sentence summarization. Our method is based on a local attention-based model that generates each word of the summary conditioned on the input sentence. Unlike many abstractive approaches it does not rely on any text preprocessing steps. While the model is structurally simple, it can easily be trained end-to-end and scales to a large amount of training data. The model shows significant performance gains on the DUC-2004 shared task compared with several strong baselines.
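The core step of such an attention-based generator, as a toy computation (our sizes, not the authors' code): each output word is conditioned on a softmax-weighted sum of input-token representations.

```python
import torch
import torch.nn.functional as F

enc = torch.randn(12, 64)   # one vector per input token
dec = torch.randn(64)       # decoder state before emitting the next word

weights = F.softmax(enc @ dec, dim=0)   # attention over input positions
context = weights @ enc                 # (64,) conditioning vector for the word
```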
- Stochastic Top-k ListNet Tianyi Luo, Dong Wang, Rong Liu and Yiqiao Pan
ListNet is a well-known listwise learning to rank model and has gained much attention in recent years. A particular problem of ListNet, however, is the high computation complexity in model training, mainly due to the large number of object permutations involved in computing the gradients. This paper proposes a stochastic ListNet approach which computes the gradient within a bounded permutation subset. It significantly reduces the computation complexity of model training and allows extension to Top-k models, which is impossible with the conventional implementation based on full-set permutations. Meanwhile, the new approach utilizes partial ranking information of human labels, which helps improve model quality. Our experiments demonstrated that the stochastic ListNet method indeed leads to better ranking performance and speeds up the model training remarkably.
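For orientation, the widely used top-one ListNet surrogate already sidesteps full-permutation sums; the paper's stochastic subset sampling generalizes this idea to Top-k. A toy PyTorch sketch (made-up scores):

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([3.0, 1.0, 2.0])                       # human relevance judgments
scores = torch.tensor([2.5, 0.3, 1.9], requires_grad=True)   # ranker outputs

p_true = F.softmax(labels, dim=0)                       # top-one probabilities
loss = -(p_true * F.log_softmax(scores, dim=0)).sum()   # cross-entropy between them
loss.backward()                                         # gradients for the ranker
```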
- Joint Mention Extraction and Classification with Mention Hypergraphs Wei Lu and Dan Roth
We present a novel model for the task of joint mention extraction and classification. Unlike existing approaches, our model is able to effectively capture overlapping mentions whose lengths are unbounded. Our model is highly scalable, with a time complexity that is linear in the number of words in the input sentence and linear in the number of possible mention classes. The model can be extended to additionally capture mention heads explicitly in a joint manner under the same time complexity. We demonstrate the effectiveness of our model through extensive experiments on standard datasets.
- Intra-sentential Zero Anaphora Resolution using Subject Sharing Recognition Ryu Iida, Kentaro Torisawa, Chikara Hashimoto, Jong-Hoon Oh and Julien Kloetzer
In this work, we improve the performance of intra-sentential zero anaphora resolution in Japanese using a novel method of recognizing subject sharing relations. In Japanese, a large portion of intra-sentential zero anaphora can be regarded as subject sharing relations between predicates, that is, the subject of some predicate is also the unrealized subject of other predicates. We develop a highly accurate recognizer of subject sharing relations for pairs of predicates in a single sentence, and then construct a subject shared predicate network, which is a set of predicates that are linked by the subject sharing relations recognized by our recognizer. We finally combine our zero anaphora resolution method exploiting the subject shared predicate network and a state-of-the-art ILP-based zero anaphora resolution method. Our combined method achieved significantly better F-score than the ILP-based method alone on intra-sentential zero anaphora resolution in Japanese. To the best of our knowledge, this is the first work to explicitly use an independent subject sharing recognizer in zero anaphora resolution.
- Graph-Based Collective Lexical Selection for Statistical Machine Translation Jinsong Su, Deyi Xiong, Xianpei Han and Junfeng Yao
Lexical selection is of great importance to statistical machine translation. In this paper, we propose a graph-based framework for collective lexical selection. The framework is established on a translation graph that captures not only local associations between source-side content words and their target translations but also target-side global dependencies in terms of relatedness among target items. We also introduce a random-walk-style algorithm to collectively identify translations of source-side content words that are strongly related in the translation graph. We validate the effectiveness of our lexical selection framework on Chinese-English translation. Experiment results with large-scale training data show that our approach significantly improves lexical selection.
- Corpus-level Fine-grained Entity Typing using Contextual Information Yadollah Yaghoobzadeh and Hinrich Schütze
We address the problem of fine-grained corpus-level entity typing, i.e., inferring from a large corpus that an entity is a member of a class such as "food" or "artist". In contrast to prior work that has focused on clean data and occurrences of entities in a limited set of contexts, we develop FIGMENT, an embedding-based entity typer that works well on noisy text and considers all contexts of the entity. We compare a global model that does typing based on aggregate corpus information and a context model that analyzes contexts individually, and find that their combination gives the best results.
- Hierarchical Low-Rank Tensors for Multilingual Transfer Parsing Yuan Zhang and Regina Barzilay
Accurate multilingual transfer parsing typically relies on careful feature engineering. In this paper, we propose a hierarchical tensor-based approach for this task. This approach induces a compact feature representation by combining atomic features. However, unlike traditional tensor models, it enables us to incorporate prior knowledge about desired feature interactions, eliminating spurious feature combinations. To this end, we use a hierarchical structure with intermediate embeddings to capture desired feature combinations. From the algebraic view, this hierarchical tensor is equivalent to the sum of traditional tensors with shared components, and thus can be effectively trained with standard online algorithms. In both unsupervised and semi-supervised transfer scenarios, our hierarchical tensor consistently improves UAS and LAS over state-of-the-art multilingual transfer parsers and the traditional tensor model across 10 different languages.
- Discourse parsing for multi-party chat dialogues Eric Kow, Stergos Afantenos, Nicholas Asher and Jérémy Perret
In this paper we present the first, to the best of our knowledge, discourse parser for multi-party chat dialogues. Discourse in multi-party dialogues differs dramatically from monologues, since threaded conversations are commonplace, making prediction of the discourse structure compelling. Moreover, the fact that our data come from chats renders the use of syntactic and lexical information useless, since people take great liberties in expressing themselves lexically and syntactically. We use the dependency parsing paradigm, as has been done in the past (Muller et al., 2012; Li et al., 2014): we learn local probability distributions and then use MST for decoding. We achieve 0.680 F1 on unlabelled structures and 0.516 F1 on fully labeled structures, which is better than many state-of-the-art systems for monologues, despite the inherent difficulties of multi-party chat dialogues.
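The decode step in miniature (our toy edge scores; networkx's Chu-Liu/Edmonds routine stands in for whichever MST solver is actually used):

```python
import networkx as nx

G = nx.DiGraph()
scores = {(0, 1): 0.9, (0, 2): 0.3, (1, 2): 0.7, (2, 1): 0.2}  # local attachment scores
for (head, dep), p in scores.items():
    G.add_edge(head, dep, weight=p)

tree = nx.maximum_spanning_arborescence(G)   # decoded dependency-style structure
print(sorted(tree.edges()))                  # [(0, 1), (1, 2)]
```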
- A Comparison between Count and Neural Network Models Based on Joint Translation and Reordering Sequences Andreas Guta, Tamer Alkhouli, Jan-Thorsten Peter, Joern Wuebker and Hermann Ney
We propose a conversion of bilingual sentence pairs and the corresponding word alignments into novel linear sequences. These joint translation and reordering (JTR) sequences are uniquely defined and combine interdependent lexical and alignment dependencies at the word level in a single framework. They are constructed in a simple manner while capturing multiple alignments and empty words. JTR sequences can be used to train a variety of models. We investigate the performance of n-gram models with modified Kneser-Ney smoothing, feed-forward and recurrent neural network architectures when estimated on JTR sequences, and compare them to the operation sequence model (Durrani et al., 2013). Evaluations on the IWSLT German-English, WMT German-English and BOLT Chinese-English tasks show that JTR models improve state-of-the-art phrase-based systems by up to +2.2 BLEU.
- Diversity in Spectral Learning for Natural Language Parsing Shashi Narayan and Shay B. Cohen
We describe an approach to incorporate diversity into spectral learning of latent-variable PCFGs (L-PCFGs). Our approach works by creating multiple spectral models, where noise is added to the underlying features in the training set before the estimation of each model. We describe three ways to decode with multiple models. In addition, we describe a simple variant of the spectral algorithm for L-PCFGs that is fast and leads to compact models. Our experiments on natural language parsing, for English and German, show that we get a significant improvement over baselines comparable to the state of the art. For English, we achieve an F1 score of 90.18, and for German an F1 score of 83.38.
- Dependency Graph-to-String Translation Liangyou Li, Andy Way and Qun Liu
Compared to trees, graphs are more powerful for representing natural language, and the corresponding graph grammars have stronger generative capacity over structures than tree grammars. Based on edge replacement grammar, in this paper we propose a synchronous graph-to-string grammar for statistical machine translation. The graph we use is directly converted from a dependency tree. We build our translation model in the log-linear framework with 9 standard features. Large-scale experiments on Chinese-English and German-English tasks show that our model is significantly better than the state-of-the-art hierarchical phrase-based (HPB) model and a recent dependency tree-to-string model on BLEU, METEOR and TER scores. Experiments also suggest that our model has a better ability for long-distance reordering and is more suitable for translating long sentences.
- Knowledge Base Unification via Sense-based Embeddings and Disambiguation Claudio Delli Bovi, Luis Espinosa Anke and Roberto Navigli
We present a novel approach for integrating the output of many different Open Information Extraction systems into a single unified and fully disambiguated knowledge repository. Our approach consists of three main steps: (1) disambiguation of relation argument pairs via a semantically-enhanced vector space model and a large unified sense inventory; (2) ranking of semantic relations according to their degree of specificity; (3) cross-resource relation alignment and merging based on the semantic similarity of relation domains and ranges. We tested our approach on a set of four heterogeneous knowledge bases, obtaining high-quality results.
- Semantic Role Labeling with Neural Network Factors Nicholas FitzGerald, Oscar Täckström, Kuzman Ganchev and Dipanjan Das
We present a new method for semantic role labeling in which arguments and semantic roles are jointly embedded in a shared vector space for a given predicate. These embeddings belong to a neural network, whose output represents the potential functions of a graphical model designed for the SRL task. We consider both local and structured learning methods and obtain state-of-the-art results on standard PropBank and FrameNet corpora with a straightforward product-of-experts model. We further show how the model can learn jointly from PropBank and FrameNet annotations to obtain additional improvements on the smaller FrameNet dataset.
- Hierarchical Recurrent Neural Network for Document Modeling Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou and Sheng Li
This paper proposes a novel hierarchical recurrent neural network language model (HRNNLM) for document modeling. After establishing an RNN to capture the coherence between sentences in a document, HRNNLM integrates it as sentence history information into the word-level RNN to predict the word sequence with cross-sentence contextual information. A two-step training approach is designed, in which sentence-level and word-level language models are approximated for convergence in a pipeline style. Examined in the standard sentence ordering scenario, HRNNLM is shown to model sentence coherence more accurately. At the word level, experimental results also indicate significantly lower model perplexity, followed by better translation results in practice when applied to a Chinese-English document translation reranking task.
- Evaluation methods for unsupervised word embeddings Igor Labutov, David Mimno and Thorsten Joachims
We present a comprehensive study of evaluation methods for unsupervised embedding techniques that obtain meaningful representations of words from text. Different evaluations result in different orderings of embedding methods, calling into question the common assumption that there is one single optimal vector representation. We present new evaluation techniques that directly compare embeddings with respect to specific queries. These methods reduce bias, provide greater insight, and allow us to solicit data-driven relevance judgments rapidly and accurately through crowdsourcing.
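One standard intrinsic evaluation in this space, as a toy sketch (vectors and judgments are made up): rank word pairs by cosine similarity and correlate with human ratings.

```python
import numpy as np
from scipy.stats import spearmanr

vecs = {"cat": np.array([0.9, 0.1]), "dog": np.array([0.8, 0.2]),
        "car": np.array([0.1, 0.9])}
pairs = [("cat", "dog", 9.0), ("cat", "car", 2.0), ("dog", "car", 2.5)]

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

model = [cos(vecs[w1], vecs[w2]) for w1, w2, _ in pairs]
human = [score for _, _, score in pairs]
print(spearmanr(model, human).correlation)   # rank agreement with humans
```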
- Confounds and Consequences in Geotagged Twitter Data Umashanthi Pavalanathan and Jacob Eisenstein
Twitter is often used in quantitative studies that identify geographically-preferred topics, writing styles, and entities. These studies rely on either GPS coordinates attached to individual messages, or on the user-supplied location field in each profile. In this paper, we compare these data acquisition techniques and quantify the biases that they introduce; we also measure their effects on linguistic analysis and text-based geolocation. GPS-tagging and self-reported locations yield measurably different corpora, and these linguistic differences are partially attributable to differences in dataset composition by age and gender. Using a latent variable model to induce age and gender, we show how these demographic variables interact with geography to affect language use. We also show that the accuracy of text-based geolocation varies with population demographics, giving the best results for men above the age of 40.
- Joint A* CCG Parsing and Semantic Role Labelling Mike Lewis, Luheng He and Luke Zettlemoyer
Joint models of syntactic and semantic parsing have the potential to improve performance on both tasks---but to date, the best results have been achieved with pipelines. We introduce a joint model using CCG, which is motivated by the close link between CCG syntax and semantics. Semantic roles are recovered by labelling the deep dependency structures produced by the grammar. Furthermore, because CCG is lexicalized, we show it is possible to factor the parsing model over words and introduce a new A* parsing algorithm---which we demonstrate is faster and more accurate than adaptive supertagging. Our joint model is the first to substantially improve both syntactic and semantic accuracy over a comparable pipeline, and also achieves state-of-the-art results for a non-ensemble semantic role labelling model.
- Long Short-Term Memory Neural Networks for Chinese Word Segmentation Xinchi Chen, Xipeng Qiu and Xuanjing Huang
Currently, most state-of-the-art methods for Chinese word segmentation are based on supervised learning, with features mostly extracted from a local context. These methods cannot utilize long-distance information, which is also crucial for word segmentation. In this paper, we propose a novel neural network model for Chinese word segmentation, which adopts the long short-term memory (LSTM) neural network to keep important previous information in memory cells, avoiding the window-size limit of the local context. Experiments on the PKU, MSRA and CTB6 benchmark datasets show that our model outperforms previous neural network models and state-of-the-art methods.
- Transition-based Dependency Parsing Using Two Heterogeneous Gated Recursive Neural Networks Xinchi Chen, Xipeng Qiu and Xuanjing Huang
Recently, neural-network-based dependency parsing has attracted much interest, as it can effectively alleviate the problems of data sparsity and feature engineering by using dense features. However, it remains a challenging problem to sufficiently model the complicated syntactic and semantic compositions of dense features in neural-network-based methods. In this paper, we propose two heterogeneous gated recursive neural networks: a tree-structured gated recursive neural network (Tree-GRNN) and a directed-acyclic-graph-structured gated recursive neural network (DAG-GRNN). We then integrate them to automatically learn compositions of dense features for transition-based dependency parsing. Specifically, Tree-GRNN models the feature combinations for the trees in the stack, which already have partial dependency structures. DAG-GRNN models the feature combinations of the nodes whose dependency relations have not yet been built. Experiment results on two prevalent benchmark datasets (PTB3 and CTB5) show the effectiveness of our proposed model.
- Efficient Algorithm for Incorporating Knowledge into Topic Models Yi Yang, Doug Downey and Jordan Boyd-Graber
Latent Dirichlet allocation (LDA) is a popular topic modeling technique for exploring hidden topics in text corpora. Increasingly, topic modeling is trying to scale to larger topic spaces, and utilize richer forms of prior knowledge, such as word correlations or document labels. However, inference is cumbersome for LDA models with prior knowledge. As a result, LDA models that use prior knowledge only work in small-scale scenarios. In this work, we propose a factor graph framework, Sparse Constrained LDA (SC-LDA), for efficiently incorporating prior knowledge into LDA. In experiments, we evaluate SC-LDA's ability to incorporate word correlation knowledge and document label knowledge on three benchmark datasets. Compared to several baseline methods, SC-LDA achieves comparable performance but runs significantly faster.
- Open-Domain Name Error Detection using a Multi-Task RNN Hao Cheng, Hao Fang and Mari Ostendorf
Out-of-vocabulary name errors in speech recognition create significant problems for downstream language processing, but the fact that they are rare poses challenges for automatic detection, particularly in an open-domain scenario. To address this problem, a multi-task recurrent neural network language model for sentence-level name detection is proposed for use in combination with out-of-vocabulary word detection. The sentence-level model is also effective for leveraging external text data. Experiments show a 26% improvement in name-error detection F-score.
- Detecting Information-Heavy Sentences: A Cross-Language Case Study Junyi Jessy Li and Ani Nenkova
Some sentences, even if they are grammatical, contain too much information, and the content they convey would be more accessible to a reader if expressed in multiple sentences. We call such sentences information heavy. In this paper we introduce the task of detecting information-heavy sentences in a cross-lingual context. Specifically, we develop methods to identify sentences in Chinese for which English speakers would prefer translations consisting of more than one sentence. We base our analysis and definitions on evidence from multiple human translations and reader preferences on flow and understandability. We show that machine translation quality when translating information-heavy sentences is markedly worse than overall quality, and that such sentences are fairly common in Chinese news. We demonstrate that sentence length and punctuation usage in Chinese are not sufficient clues for accurately detecting heavy sentences, and present a richer classification model that accurately identifies these sentences.
- Distributional vectors encode referential attributes Abhijeet Gupta, Gemma Boleda, Marco Baroni and Sebastian Padó
Distributional methods have proven to excel at capturing fuzzy, graded aspects of meaning (Italy is more similar to Spain than to Germany). In contrast, it is difficult to extract the values of more specific attributes of word referents from distributional representations, attributes of the kind typically found in structured knowledge bases (Italy has 60 million inhabitants). In this paper, we pursue the hypothesis that distributional vectors also implicitly encode referential attributes. We show that a standard supervised regression model is in fact sufficient to retrieve such attributes to a reasonable degree of accuracy: When evaluated on the prediction of both categorical and numeric attributes of countries and cities as stored in a structured knowledge base, the model consistently reduces baseline error by 30%, and is not far from the upper bound. Further analysis provides qualitative insight into the task, such as which types of attributes are harder to learn from distributional information.
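The setup in miniature (random stand-in vectors; real inputs would be pretrained embeddings paired with knowledge-base values): fit a standard supervised regressor from word vectors to a numeric attribute.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 300))   # embeddings of entities with known values
y_train = rng.uniform(6, 9, size=100)   # e.g., log10(population) from a KB

model = Ridge(alpha=1.0).fit(X_train, y_train)
x_new = rng.normal(size=(1, 300))       # embedding of a held-out entity
print(model.predict(x_new))             # predicted attribute value
```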
- Modeling Reportable Events as Turning Points in Narrative Jessica Ouyang and Kathy McKeown
We present novel experiments in modeling the rise and fall of story characteristics within narrative, leading up to the Most Reportable Event (MRE), the compelling event that is the nucleus of the story. We construct a corpus of personal narratives from the bulletin board website Reddit, using the organization of Reddit content into topic-specific communities to automatically identify narratives. Leveraging the structure of Reddit comment threads, we automatically label a large dataset of narratives. We present a change-based model of narrative that tracks changes in formality, affect, and other characteristics over the course of a story, and we use this model in distant supervision and self-training experiments that achieve significant improvements over the baselines at the task of identifying MREs.
- A Computational Cognitive Model of Novel Word Generalization Aida Nematzadeh, Erin Grant and Suzanne Stevenson
A key challenge in vocabulary acquisition is learning which of the many possible meanings is appropriate for a word. The word generalization problem refers to how children associate a word such as dog with a meaning at the appropriate category level in the taxonomy of objects, such as Dalmatians, dogs, or animals. We present the first computational study of word generalization integrated within a word learning model. The model simulates child and adult patterns of word generalization in a word-learning task. These patterns arise due to the interaction of type and token frequencies in the input data, an influence often observed in people's generalization of linguistic categories.
- Learning Semantic Composition to Detect Non-compositionality of Multiword Expressions Majid Yazdani, Meghdad Farahmand and James Henderson
Non-compositionality of multiword expressions is an intriguing problem that can be a source of error in a variety of NLP tasks such as language generation, machine translation and word sense disambiguation. In this work we present a method of detecting non-compositional English noun compounds by learning a composition function. We explore a range of possible models for semantic composition, empirically evaluate these models, and propose an improvement method over the most accurate ones. We show that a complex function such as polynomial projection can learn semantic composition and identify non-compositionality in an unsupervised way, beating all other baselines, from simple to complex. We show further improvements by also training a decomposition function, and with a form of EM algorithm over latent compositionality annotations.
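The detection idea in miniature (toy random vectors; the paper learns far richer composition functions than the additive stand-in here): compose the constituents, then flag compounds whose observed vector sits far from the composed one.

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(1)
v_first, v_second = rng.normal(size=50), rng.normal(size=50)  # constituent vectors
v_compound = rng.normal(size=50)                              # observed compound

W = np.eye(50)                            # stand-in for learned composition weights
composed = W @ v_first + W @ v_second     # simple additive composition
score = 1 - cos(composed, v_compound)     # high = likely non-compositional
print(round(float(score), 3))
```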
- Do You See What I Mean? Visual Resolution of Linguistic Ambiguities Yevgeni Berzak, Andrei Barbu, Daniel Harari, Boris Katz and Shimon Ullman
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, making it possible to disambiguate sentences in a unified fashion across the different ambiguity types. Potential applications of this task include video retrieval, where capturing different meanings of a sentential query can be vital for obtaining good results.
- Reordering Context-Free Grammar Induction Miloš Stanojević and Khalil Sima'an
We present a novel approach for unsupervised induction of a Reordering Grammar using a modified form of permutation trees (Zhang and Gildea, 2007), which we apply to preordering in phrase-based machine translation. Unlike previous approaches, we induce in one step both the hierarchical structure and the transduction function over it from word-aligned parallel corpora. Furthermore, our model (1) handles non-ITG reordering patterns (up to 5-ary branching), (2) is learned from all derivations by treating not only labeling but also bracketing as latent variables, (3) is entirely unlexicalized at the level of reordering rules, and (4) requires no linguistic annotation. Our model is evaluated both for accuracy in predicting target order, and for its impact on translation quality. We report significant performance gains over phrase reordering, and over two known preordering baselines for English-Japanese.
- Building a shared world: mapping distributional to model-theoretic semantic spaces Aurélie Herbelot and Eva Maria Vecchi
In this paper, we introduce an approach to automatically map a standard distributional semantic space onto a set-theoretic model. We predict that there is a functional relationship between distributional information and vectorial concept representations in which dimensions are predicates and weights are generalised quantifiers. In order to test our prediction, we learn a model of such relationship over a publicly available dataset of feature norms annotated with natural language quantifiers. Our initial experimental results show that, at least for domain-specific data, we can indeed map between formalisms, and generate high-quality vector representations which correspond to generalised quantifiers in a set-theoretic model. We further investigate the generation of natural language quantifiers from such vectors.
- Conversation Trees: A Grammar Model for Topic Structure in Forums Annie Louis and Shay B. Cohen
Online forum discussions proceed differently from face-to-face conversations and any single thread on a forum contains posts on different subtopics. This work aims to characterize the content of a forum thread as a 'conversation tree' of topics. We present models that jointly perform two tasks: segment a thread into sub-parts, and assign a topic to each part. The core idea of our work is a definition of topic structure using probabilistic grammars. By leveraging the flexibility of two grammar formalisms, Context-Free Grammars and Linear Context-Free Rewriting Systems, our models create desirable structures for forum threads: our topic segmentation is hierarchical, links non-adjacent segments on the same topic, and jointly labels the topic during segmentation. We show that our models outperform three tree generation baselines.
- RELLY: Inferring Hypernym Relationships Between Relational Phrases Adam Grycner, Gerhard Weikum, Jay Pujara, James Foulds and Lise Getoor
Relational phrases (e.g., "got married to") and their hypernyms (e.g., "is a relative of") are central for many tasks including question answering, open information extraction, paraphrasing, and entailment detection. This has motivated the development of linguistic resources such as DIRT (Lin and Pantel, 2001), PATTY (Nakashole et al., 2012), and WiseNet (Moro and Navigli, 2012), which systematically collect and organize relational phrases. These resources have demonstrable practical benefits, but are each limited due to noise, sparsity, or size. We present a new general-purpose method, RELLY, for constructing a large hypernymy graph of relational phrases with high-quality subsumptions. Our graph induction approach integrates small high-precision knowledge bases together with larger automatically curated resources, and reasons collectively to combine these resources into a consistent graph, using a recently developed probabilistic programming language called probabilistic soft logic (PSL) (Bach et al., 2015). We use RELLY to construct a hypernymy graph consisting of 20K relational phrases with 35K hypernymy links. We extensively evaluate our hypernymy graph both intrinsically and extrinsically. Our evaluation indicates a hypernymy link precision of 78%, and demonstrates the value of this resource for a document-relevance ranking task.
- Extracting Relations between Non-Standard Entities using Distant Supervision and Imitation Learning Isabelle Augenstein, Andreas Vlachos and Diana Maynard
Distantly supervised approaches have become popular in recent years as they allow training relation extractors without text-bound annotation, using instead known relations from a knowledge base and a large textual corpus from an appropriate domain. While state-of-the-art distant supervision approaches use off-the-shelf named entity recognition (NER) systems to identify relation arguments, discrepancies in domain or genre between the data used for NER training and the intended domain for the relation extractor can lead to low performance. This is particularly problematic for "non-standard" named entities such as album, which would fall into the MISC category. We propose to ameliorate this issue by jointly training the named entity classifier and the relation extractor using imitation learning, which reduces structured prediction learning to classification learning. We further experiment with different features and compare against a baseline using an off-the-shelf supervised NER system. Experiments show that our approach improves on the baseline for both "standard" and "non-standard" named entities by 19 points in average precision. Furthermore, we show that Web features such as links and lists increase average precision by 7 points.
- Learning natural language inference from a large annotated corpus Samuel R. Bowman, Gabor Angeli, Christopher Potts and Christopher D. Manning
Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce a new freely available corpus of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. We find that this increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and that it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
- Using Personal Traits For Brand Preference Prediction Chao Yang, Shimei Pan, Jalal U. Mahmud, Huahai Yang and Padmini Srinivasan
In this paper, we present the first comprehensive study of the relationship between a person's traits and his/her brand preferences. In our analysis, we included a large number of character traits such as personality, personal values and individual needs. These trait features were obtained from both a psychometric survey and automated social media analytics. We also included an extensive set of brand names from diverse product categories. From this analysis, we want to shed some light on (1) whether it is possible to use personal traits to infer an individual's brand preferences, and (2) whether the trait features automatically inferred from social media are good proxies for the ground-truth character traits in brand preference prediction.
- Auto-Sizing Neural Networks: With Applications to n-gram Language Models Kenton Murray and David Chiang
Neural networks have been shown to improve performance across a range of natural-language tasks while addressing some issues with traditional models such as size. However, designing and training them can be complicated. Frequently, researchers resort to repeated experimentation across a range of parameters to pick optimal settings. In this paper, we address the issue of choosing the correct number of units in the hidden layers. We introduce a method for automatically adjusting network size by pruning out hidden units through $\ell_{\infty,1}$ and $\ell_{2,1}$ regularization. We apply this method to language modeling and demonstrate its ability to correctly choose the number of hidden units while maintaining perplexity. We also include these models in a machine translation decoder and show that these smaller neural models maintain the significant improvements of their unpruned versions.
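To make the regularizer concrete, here is a minimal sketch (not the authors' code) of an $\ell_{2,1}$ group penalty over hidden units and its proximal operator, treating each row of a toy weight matrix as one unit's group; the function names and data are invented for illustration.

```python
import numpy as np

def l21_penalty(W):
    """l_{2,1} norm of W: sum of l2 norms of rows (one group per hidden unit)."""
    return np.sum(np.linalg.norm(W, axis=1))

def l21_prox(W, step):
    """Proximal operator of step * l_{2,1}: group soft-thresholding of rows.
    Rows whose norm falls below `step` are zeroed out, pruning the unit."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - step / np.maximum(norms, 1e-12), 0.0)
    return W * scale

# Toy example: a hidden layer with 5 units and 3 inputs (values assumed).
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
W_pruned = l21_prox(W, step=1.5)
alive = np.linalg.norm(W_pruned, axis=1) > 0
print("surviving hidden units:", int(alive.sum()), "of", W.shape[0])
```

Applied after each gradient step, this group shrinkage is what lets whole hidden units drop out of the model rather than individual weights.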
- Improved Relation Extraction with Feature-Rich Compositional Embedding Models Matthew R. Gormley, Mo Yu and Mark Dredze
Compositional embedding models build a representation (or embedding) for a linguistic structure based on its component word embeddings. We propose a Feature-rich Compositional Embedding Model (FCM) for relation extraction that is expressive, generalizes to new domains, and is easy to implement. The key idea is to combine (unlexicalized) hand-crafted features with learned word embeddings. The model is able to directly tackle the difficulties met by traditional compositional embedding models, such as handling arbitrary types of sentence annotations and utilizing global information for composition. We test the proposed model on two relation extraction tasks, and demonstrate that our model outperforms both previous compositional models and traditional feature-rich models on the ACE 2005 relation extraction task and the SemEval 2010 relation classification task. The combination of our model and a log-linear classifier with hand-crafted features gives state-of-the-art results.
- Joint prediction in MST-style discourse parsing for argumentation mining Andreas Peldszus and Manfred Stede
We introduce a new approach to argumentation mining that we applied to a parallel German/English corpus of short texts annotated with argumentation structure. We focus on structure prediction, which we break into a number of subtasks: relation identification, central claim identification, role classification, and function classification. Our new model jointly predicts different aspects of the structure by combining the different subtask predictions in the edge weights of an evidence graph; we then apply a standard MST decoding algorithm. This model not only outperforms two reasonable baselines and two data-driven models of global argument structure for the difficult subtask of relation identification, but also improves the results for central claim identification and function classification and it compares favorably to a complex mstparser pipeline.
- Molding CNNs for text: non-linear, non-consecutive convolutions Tao Lei, Regina Barzilay and Tommi Jaakkola
The success of deep learning often derives from well-chosen operational building blocks. In this work, we revise the temporal convolution operation in CNNs to better adapt it to text processing. Instead of concatenating word representations, we appeal to tensor algebra and use low-rank n-gram tensors to directly exploit interactions between words already at the convolution stage. Moreover, we extend the n-gram convolution to non-consecutive words to recognize patterns with intervening words. Through a combination of low-rank tensors and pattern weighting, we can efficiently evaluate the resulting convolution operation via dynamic programming. We test the resulting architecture on standard sentiment classification and news categorization tasks. Our model achieves state-of-the-art performance both in terms of accuracy and training speed among a variety of (neural network) models.
- A Dynamic Programming Algorithm for Computing N-gram Posteriors from Lattices Dogan Can and Shrikanth Narayanan
Efficient computation of n-gram posterior probabilities from lattices has applications in lattice-based minimum Bayes-risk decoding in statistical machine translation and the estimation of expected document frequencies from spoken corpora. In this paper, we present an algorithm for computing the posterior probabilities of all n-grams in a lattice and constructing a minimal deterministic weighted finite-state automaton associating each n-gram with its posterior for efficient storage and retrieval. Our algorithm builds upon the best known algorithm in the literature for computing n-gram posteriors from lattices and leverages the following observations to significantly improve the time and space requirements: i) the n-grams for which the posteriors will be computed typically comprise all n-grams in the lattice up to a certain length, ii) the posterior is equivalent to the expected count for an n-gram that does not repeat on any path, iii) there are efficient algorithms for computing n-gram expected counts from lattices. We present experimental results comparing our algorithm with the best known algorithm in the literature as well as a baseline algorithm based on weighted finite-state automata operations.
- Exploring Markov Logic Networks for Question Answering Tushar Khot, Niranjan Balasubramanian, Eric Gribkoff, Ashish Sabharwal, Peter Clark and Oren Etzioni
Our goal is to answer elementary-level science questions using knowledge extracted automatically from textbooks, expressed in a subset of first-order logic. Such knowledge is incomplete and noisy. Markov Logic Networks (MLNs) seem a natural model for expressing such knowledge, but the exact way of leveraging MLNs is by no means obvious. We investigate three ways of applying MLNs to our task. First, we simply use the extracted science rules directly as MLN clauses and exploit the structure present in hard constraints to improve tractability. Second, we interpret science rules as describing prototypical entities, resulting in a drastically simplified but brittle network. Our third approach, called Praline, uses MLNs to align lexical elements as well as define and control how inference should be performed in this task. Praline demonstrates a 15% accuracy boost and a 10x reduction in runtime as compared to other MLN-based methods, and comparable accuracy to word-based baseline approaches.
- Estimation of Discourse Segmentation Labels from Crowd Data Ziheng Huang, Jialu Zhong and Rebecca J. Passonneau
For annotation tasks involving independent items, probabilistic models have been used to infer ground truth labels from crowdsourcing, where many annotators independently label the same data. Such models have been shown to produce results superior to taking the majority vote as the ground truth. This paper presents a new dataset and new methods for sequential data where the labels are not independent. The data consists of crowd labels for annotation of discourse segment boundaries assigned to fifty recorded telephone conversations. To estimate ground truth labels, two approaches are presented that extend Hidden Markov Models to relax the independence assumption on observed data, based on the observation that segments tend to be several utterances long. Results of the models are checked using metrics that test whether the same annotators maintain the same relative performance across different conversations.
- Semantic Framework for Comparison Structures in Natural Language Omid Bakhshandeh and James Allen
Comparison is one of the most important phenomena in language for expressing objective and subjective facts about various entities. Systems that can understand and reason over comparatives can play a major role in the applications which require deeper understanding of language. In this paper we present a novel semantic framework for representing the meaning of comparison structures in natural language, which models comparisons as predicate-argument pairs inter-connected with semantic roles. Our framework supports not only adjectives, but also adverbial, nominal, and verbal comparatives. With this paper, we release a novel dataset of gold-standard comparison structures annotated according to our semantic framework.
- Towards the Extraction of Customer to Customer Suggestions in Reviews Sapna Negi and Paul Buitelaar
In this work, we target the automatic detection of suggestion-expressing sentences in customer reviews. Such sentences mainly comprise advice, recommendations and tips for fellow customers, and sometimes suggestions for improvements to the manufacturers and providers as well. The scope of this work is limited to the former. Since this is a young problem, prior to the development of a solution, there is a need for a well-formed problem definition and benchmark datasets. This work provides a three-fold contribution: a problem definition, a benchmark dataset, and an approach for detecting suggestions to customers. We identify two forms of suggestion expressions in reviews: implicit and explicit. We limit the scope of this work to the explicit ones. The problem is framed as a sentence classification problem, and a set of linguistically motivated features is proposed in order to classify sentences as suggestion and non-suggestion sentences. Some interesting observations and analysis are also reported.
- Neural Networks for Open Domain Targeted Sentiment Meishan Zhang, Yue Zhang and Duy Tin Vo
Open domain targeted sentiment is the joint information extraction task that finds target mentions together with the sentiment towards each mention from a text corpus. The task is typically modeled as a sequence labeling problem, and solved using state-of-the-art labelers such as CRF. We empirically study the effect of word embeddings and automatic feature combinations on the task by extending a CRF baseline using neural networks, which have demonstrated large potential for sentiment analysis. Results show that the neural model can give better results by significantly increasing the recall. In addition, we propose a novel integration of neural and discrete features, which combines their relative advantages, leading to significantly higher results compared to both baselines.
- Sarcastic or Not: Word-Embeddings to Predict the Literal or Sarcastic Meaning of Words Debanjan Ghosh, Weiwei Guo and Smaranda Muresan
Sarcasm is generally characterized as a figure of speech that involves the substitution of a literal by a figurative meaning, which is usually the opposite of the original literal meaning. We re-frame the sarcasm detection task as a word-sense disambiguation problem, where the sense of a word is either the literal or the sarcastic sense. We call this the Literal/Sarcastic Sense Disambiguation (LSSD) task. We address two issues: 1) collection of a set of target words that can have either literal or sarcastic meanings depending on context; and 2) given an utterance and a target word, automatically detect whether the target word is used in the literal or the sarcastic sense. For the latter, we investigate several word-sense disambiguation methods and show that a Support Vector Machines (SVM) classifier with a modified kernel using word embeddings achieves a 7-10% F1 improvement over a strong lexical baseline.
- An Alignment-Based Model for Compositional Semantics and Sequential Reasoning Jacob Andreas and Dan Klein
This paper describes an alignment-based model for interpreting natural language instructions in context. We approach instruction following as a sequence prediction problem, scoring sequences of actions conditioned on structured observations of text and the environment. Our model explicitly represents both the low-level compositional structure of individual actions and observations, and the high-level search problem that gives rise to full plans. To demonstrate the model's flexibility, we apply it to a diverse set of benchmark tasks. On every task, we outperform strong task-specific baselines, including several new state-of-the-art results.
- Joint Prediction for Entity/Event-Level Sentiment Analysis using Probabilistic Soft Logic Models Lingjia Deng and Janyce Wiebe
In this work, we build an entity/event-level sentiment analysis system, which is able to recognize and infer both explicit and implicit sentiments among entities and events in the text. We design Probabilistic Soft Logic models to integrate explicit sentiments, inference rules, and +/-effect event information (events that positively or negatively affect entities) together. The experiments show that the method is able to greatly improve over baseline accuracies in recognizing entity/event-level sentiments.
- Using Content-level Structures for Summarizing Microblog Repost Trees Jing Li, Wei Gao, Zhongyu Wei, Baolin Peng and Kam-Fai Wong
A microblog repost tree provides strong clues on how an event described therein develops. To help social media users capture the main clues of an event on microblogging sites, we propose a novel repost tree summarization framework by effectively differentiating two kinds of messages on repost trees called leaders and followers, which are derived from content-level structure information, i.e., microblog contents and the reposting relations. To this end, a Conditional Random Fields (CRF) model is used to detect leaders across repost tree paths. We then present a variant of a random-walk-based summarization model to rank and select salient messages based on the result of leader detection. To reduce the error propagation cascaded from leader detection, we improve the framework by enhancing the random walk with adjustment steps for sampling from leader probabilities given all the reposting messages. For evaluation, we construct two annotated corpora, one for leader detection, and the other for repost tree summarization. Experimental results confirm the effectiveness of our method.
- Traversing Knowledge Graphs in Vector Space Kelvin Guu, John Miller and Percy Liang
Path queries on a knowledge graph can be used to answer compositional questions such as "What languages are spoken by people living in Lisbon?". However, knowledge graphs often have missing facts (edges) which disrupts path queries. Recent models for knowledge base completion impute missing facts by embedding knowledge graphs in vector spaces. We show that these models can be recursively applied to answer path queries, but that they suffer from cascading errors. This motivates a new "compositional" training objective, which dramatically improves all models' ability to answer path queries, in some cases more than doubling accuracy. On a standard knowledge base completion task, we also demonstrate that compositional training acts as a novel form of structural regularization, reliably improving performance across all base models (reducing errors by up to 43%) and achieving new state-of-the-art results.
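As a toy illustration of compositional path queries in vector space (not the paper's trained models or data), the sketch below composes TransE-style additive relation vectors and ranks entities by distance to the composed query vector; all embeddings, entity names, and relation names here are invented.

```python
import numpy as np

# Toy embeddings (assumed): entities and relations share one vector space.
entities = {"lisbon":     np.array([1.0, 0.0]),
            "ana":        np.array([1.0, 1.0]),
            "portuguese": np.array([2.0, 2.0])}
relations = {"lives_in^-1": np.array([0.0, 1.0]),   # "people living in x"
             "speaks":      np.array([1.0, 1.0])}

def path_query(source, path):
    """Traverse the path compositionally by adding relation vectors,
    then rank candidate entities by negative distance to the query."""
    q = entities[source].copy()
    for r in path:
        q = q + relations[r]
    scores = {e: -np.linalg.norm(q - v) for e, v in entities.items()}
    return max(scores, key=scores.get)

print(path_query("lisbon", ["lives_in^-1", "speaks"]))  # -> 'portuguese'
```

The compositional training objective described in the abstract trains the model so that such multi-step traversals, not just single edges, score correctly.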
- Improving Semantic Parsing with Enriched Synchronous Context-Free Grammar Junhui Li, Muhua Zhu, Wei Lu and Guodong Zhou
Semantic parsing maps a sentence in natural language into a structured meaning representation. Previous studies show that semantic parsing with synchronous context-free grammars (SCFGs) achieves favorable performance over most other alternatives. Motivated by the observation that the performance of semantic parsing with SCFGs is closely tied to the translation rules, this paper explores extending translation rules for higher quality and increased coverage in three ways. First, we introduce structure-informed non-terminals, better guiding the parsing in favor of well-formed structures, instead of using a uniform non-terminal in SCFGs. Second, we examine the difference between word alignments for semantic parsing and statistical machine translation (SMT) to better adapt word alignment in SMT to semantic parsing. Finally, we address the unknown word translation issue via synthetic translation rules. Evaluation on the standard GeoQuery benchmark dataset shows that our approach outperforms the state-of-the-art across various languages, including English, German and Greek.
- Learning to Recognize Affective Polarity in Similes Ashequl Qadir, Ellen Riloff and Marilyn Walker
A simile is a comparison between two essentially unlike things, such as "Jane swims like a dolphin". Similes often express a positive or negative sentiment toward something, but recognizing the polarity of a simile can depend heavily on world knowledge. For example, "memory like an elephant" is positive, but "memory like a sieve" is negative. Our research explores methods to recognize the polarity of similes on Twitter. We train classifiers using lexical, semantic, and sentiment features, and experiment with both manually and automatically generated training data. Our approach yields good performance at identifying positive and negative similes, and substantially outperforms existing sentiment resources.
- Incorporating Trustiness and Collective Synonym/Contrastive Evidence into Taxonomy Construction Tuan Luu Anh, Jung-jae Kim and See Kiong Ng
Taxonomy plays an important role in many applications by organizing domain knowledge into a hierarchy of is-a relations between terms. Previous work on taxonomic relation identification from text corpora is lacking in two aspects: 1) It does not consider the trustiness of individual source texts, which is important to filter out incorrect relations from unreliable sources. 2) It also does not consider collective evidence from synonyms and contrastive terms, where synonyms may provide additional support for taxonomic relations, while contrastive terms may contradict them. In this paper, we present a method of taxonomic relation identification that incorporates the trustiness of source texts measured with such techniques as PageRank and knowledge-based trust, and the collective evidence of synonyms and contrastive terms identified by linguistic pattern matching and machine learning. The experimental results show that the proposed features can consistently improve performance by 4%-10% in F-measure.
- Broad-coverage CCG Semantic Parsing with AMR Yoav Artzi, Kenton Lee and Luke Zettlemoyer
We propose a grammar induction technique for AMR semantic parsing. While previous grammar induction techniques were designed to re-learn a new parser for each target application, the recently annotated AMR bank provides a unique opportunity to induce a single model for understanding broad-coverage newswire text and support a wide range of applications. We present a new model that combines CCG parsing to recover compositional aspects of meaning and a factor graph to model non-compositional phenomena, such as anaphoric dependencies. Our approach achieves 66.2 Smatch F1 score on the AMR bank, significantly outperforming the previous state of the art.
- Effective Approaches to Attention-based Neural Machine Translation Thang Luong, Hieu Pham and Christopher D. Manning
The attentional mechanism has been used in neural machine translation (NMT) to selectively focus on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This work examines two simple and effective classes of the attentional mechanism: a global approach which always attends to all source words, and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT'14 translation tasks between English and German in both directions. Our attentional NMTs provide a boost of up to 2.8 BLEU over non-attentional systems. Furthermore, by feeding the attentional vector as an additional input to the next time step, we achieve a further gain of up to 1.9 BLEU.
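The following is a minimal sketch of the two attention classes described above, using a simple dot-product score on toy vectors; it omits details such as the paper's predicted alignment position and Gaussian window weighting, and all names, shapes, and data are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(h_t, src_states):
    """Global attention (dot score): attend to all source states."""
    scores = src_states @ h_t          # one score per source position
    a = softmax(scores)                # alignment weights
    return a @ src_states              # context vector c_t

def local_attention(h_t, src_states, p_t, window=2):
    """Local attention: attend only to a window around position p_t."""
    lo, hi = max(0, p_t - window), min(len(src_states), p_t + window + 1)
    sub = src_states[lo:hi]
    a = softmax(sub @ h_t)
    return a @ sub

# Toy: 5 source states and one decoder state, dimension 4 (values assumed).
rng = np.random.default_rng(1)
S = rng.normal(size=(5, 4))
h = rng.normal(size=4)
print(global_attention(h, S))
print(local_attention(h, S, p_t=2))
```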
- Representing Text for Joint Embedding of Text and Knowledge Bases Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury and Michael Gamon
Models that learn to represent textual and knowledge base relations in the same continuous latent space are able to perform joint inferences between the two kinds of relations and obtain high accuracy on knowledge base completion (Riedel et al., 2013). In this paper we propose a model that captures the compositional structure of textual relations, and jointly optimizes entity, knowledge base, and text relation representations. The proposed model significantly improves performance over a model that does not share parameters among textual relations with common sub-structure.
- Dual Decomposition Inference for Graphical Models over Strings Nanyun Peng, Ryan Cotterell and Jason Eisner
We investigate dual decomposition for joint MAP inference of many strings. Given an arbitrary graphical model, we decompose it into small acyclic sub-models, whose MAP configurations can be found by finite-state composition and dynamic programming. We force the solutions of these subproblems to agree on overlapping variables, by tuning Lagrange multipliers for an adaptively expanding set of variable-length n-gram count features. This is the first inference method for arbitrary graphical models over strings that does not require approximations such as random sampling, message simplification, or a bound on string length. Provided that the inference method terminates, it gives a certificate of global optimality (though MAP inference in our setting is undecidable in general). On our global phonological inference problems, it does indeed terminate, and achieves more accurate results than max-product and sum-product loopy belief propagation.
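As a toy picture of the agreement mechanism (not the authors' finite-state machinery), the sketch below runs dual decomposition over a tiny candidate set of strings, tuning Lagrange multipliers on character-unigram count features until two subproblems agree; the candidates and scores are invented, and the paper's adaptively expanding variable-length n-gram features are restricted here to unigrams.

```python
from collections import Counter, defaultdict

# Invented candidate strings and per-subproblem scores.
cands = ["kat", "cat", "kit"]
s1 = {"kat": 2.0, "cat": 1.5, "kit": 0.0}
s2 = {"kat": 0.0, "cat": 1.0, "kit": 2.0}
phi = {x: Counter(x) for x in cands}      # character-unigram count features

u = defaultdict(float)                    # Lagrange multipliers on features
for step in range(1, 100):
    # Each subproblem maximizes its score plus/minus the multiplier term.
    x1 = max(cands, key=lambda x: s1[x] + sum(u[g] * c for g, c in phi[x].items()))
    x2 = max(cands, key=lambda x: s2[x] - sum(u[g] * c for g, c in phi[x].items()))
    if phi[x1] == phi[x2]:                # feature agreement reached
        print("agreed:", x1, x2)          # certificate of optimality
        break
    rate = 1.0 / step                     # decaying subgradient step size
    for g in set(phi[x1]) | set(phi[x2]):
        u[g] -= rate * (phi[x1][g] - phi[x2][g])
```

On this toy instance the subproblems initially disagree ("kat" vs. "kit") and converge after a few multiplier updates to the jointly best string "cat".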
- Comparing Word Representations for Implicit Discourse Relation Classification Chloé Braud and Pascal Denis
This paper presents a detailed comparative framework for assessing the usefulness of unsupervised word representations for identifying so-called implicit discourse relations. Specifically, we compare standard one-hot word pair representations against low-dimensional representations based on Brown clusters and word embeddings. We also consider various word vector combination schemes for deriving discourse segment representations from word vectors, and compare representations based either on all words or limited to head words. Our main finding is that denser representations systematically outperform sparser ones and give state-of-the-art performance or above without the need for additional hand-crafted features, thus alleviating the need for traditional external resources.
- Chinese Word Segmentation Leveraging Bilingual Unlabeled Data Wei Chen and Bo Xu
This paper presents a bilingual semi-supervised Chinese word segmentation (CWS) method that leverages the natural segmenting information of English sentences. The proposed method involves learning three levels of features, namely character-level, phrase-level and sentence-level features, provided by multiple sub-models. We use a conditional random fields (CRF) sub-model to learn monolingual grammars, a sub-model based on character-based alignment to obtain explicit segmenting knowledge, and another sub-model based on transliteration similarity to detect out-of-vocabulary (OOV) words. Moreover, we propose a sub-model leveraging neural networks to ensure the proper treatment of the semantic gap, and a phrase-based translation sub-model to score the translation probability of a Chinese segmentation against its corresponding English sentence. A cascaded log-linear model is employed to combine these features to segment bilingual unlabeled data, the results of which are used to adjust the original supervised CWS model. The evaluation shows that our method outperforms the state-of-the-art monolingual and bilingual semi-supervised models reported in the literature.
- Posterior calibration and exploratory analysis for natural language processing models Khanh Nguyen and Brendan O'Connor
Many models in natural language processing define probability distributions over linguistic structures. We argue that (1) the quality of a model's posterior distribution can and should be directly evaluated, testing whether its probabilities correspond to empirical frequencies; and (2) NLP uncertainty can be projected not only to pipeline components, but also to exploratory data analysis, telling a user when to, and when not to, trust the NLP analysis. We present methods of analyzing calibration, and compare several commonly used models. We also contribute a coreference sampling algorithm that can create confidence intervals for a political event extraction task.
- Part-of-speech Taggers for Low-resource Languages using CCA Features Young-Bum Kim, Benjamin Snyder and Ruhi Sarikaya
In this paper, we address the challenge of creating accurate and robust part-of-speech taggers for low-resource languages. We propose a method that leverages existing parallel data between the target language and a large set of resource-rich languages without ancillary resources such as tag dictionaries. Crucially, we use CCA to induce latent word representations that incorporate cross-genre distributional cues, as well as projected tags from a full array of resource-rich languages. We develop a probability-based confidence model to identify words with highly likely tag projections and use these words to train a multi-class SVM using the CCA features. Our method yields average performance of 85% accuracy for languages with almost no resources, outperforming a state-of-the-art partially-observed CRF model.
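A minimal sketch of the CCA step, assuming two toy feature views aligned by word type: one view of target-language distributional features and one of features projected from resource-rich languages. It uses scikit-learn's CCA on invented data and stands in for whichever CCA formulation the paper actually uses.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Toy stand-ins (assumed shapes): rows are word types aligned across views.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 50))   # distributional features, target language
Y = rng.normal(size=(200, 30))   # projected-tag features from rich languages

cca = CCA(n_components=10)
X_c, Y_c = cca.fit_transform(X, Y)   # latent word representations
print(X_c.shape)                     # (200, 10) -> features for the SVM tagger
```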
- Semantic Annotation for Microblog Topics Using Wikipedia Temporal Information Tuan Tran, Nam Khanh Tran, Asmelash Teka Hadgu and Robert Jäschke
In this paper we study the problem of semantic annotation for a trending hashtag, a crucial yet largely unexplored step towards analyzing user behavior in social media. We tackle the problem via linking to entities from Wikipedia. We incorporate the social aspects of trending hashtags by identifying prominent entities for the annotation so as to maximize the information spreading in entity networks. We exploit the temporal dynamics of entities in Wikipedia, namely Wikipedia edits and page views, to improve the annotation quality. Our experiments show that we significantly outperform the established methods in tweet annotation.
- Extracting Condition-Opinion Relations Toward Fine-grained Opinion Mining Yuki Nakayama and Atsushi Fujii
A fundamental issue in opinion mining is to search a corpus for opinion units, each of which typically comprises the evaluation by an author of a target object from an aspect, such as "This hotel is in a good location". However, few attempts have been made to address cases where the validity of an evaluation is restricted by a condition in the source text, such as "for traveling with small kids". In this paper, we propose a method to extract condition-opinion relations from online reviews, which enables fine-grained analysis of the utility of target objects depending on the user's attributes, purpose, and situation. Our method uses supervised machine learning to identify sequences of words or phrases that comprise conditions for opinions. We propose several features associated with lexical and syntactic information, and show their effectiveness experimentally.
- Joint Entity Recognition and Disambiguation Gang Luo
Extracting named entities in text and linking extracted names to a given knowledge base are fundamental tasks in applications of text understanding. Existing systems typically run a Named Entity Recognition (NER) model to extract entity names first, then run an Entity Linking model to link extracted names to a knowledge base. NER and Linking models are usually trained separately, and the mutual dependency between the two tasks is ignored. We propose JERL, Joint Entity Recognition and Linking, to jointly model the NER and Linking tasks and capture the mutual dependency between them. It allows the information from each task to improve the performance of the other. To the best of our knowledge, JERL is the first model to jointly optimize NER and Linking together. In experiments on the CoNLL'03/AIDA data set, JERL outperforms state-of-the-art NER and Linking systems on both tasks, with improvements of 0.4% absolute F1 for NER on CoNLL'03, and 0.36% absolute precision@1.0 for Linking on AIDA. Since NER is a widely studied problem, we believe this improvement is significant.
- Sieve-Based Spatial Relation Extraction with Expanding Parse Trees Jennifer D'Souza and Vincent Ng
Spatial relation extraction is the under-investigated task of identifying relations among spatial elements. A key challenge introduced by the recent SpaceEval shared task on spatial relation extraction is the identification of MOVELINKs, a type of spatial relation in which up to eight spatial elements can participate. To handle the complexity of extracting MOVELINKs, we combine two ideas that have been successfully applied to information extraction tasks, namely tree kernels and multi-pass sieves, proposing the use of an expanding parse tree as a novel structured feature for training MOVELINK classifiers. Our approach yields state-of-the-art results on two key subtasks in SpaceEval.
- A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution Shaohua Li, Jun Zhu and Chunyan Miao
Most existing word embedding methods can be categorized into Neural Embedding Models and Matrix Factorization (MF)-based methods. However, some models are opaque to probabilistic interpretation, and MF-based methods, typically solved using Singular Value Decomposition (SVD), may incur loss of corpus information. In addition, it is desirable to incorporate global latent factors, such as topics, sentiments or writing styles, into the word embedding model. Since generative models provide a principled way to incorporate latent factors, we propose a generative word embedding model, which is easy to interpret and can serve as a basis for more sophisticated latent factor models. The model inference reduces to a low-rank weighted positive semidefinite approximation problem. Its optimization is approached by eigendecomposition on a submatrix, followed by online blockwise regression, which is scalable and avoids the information loss in SVD. In experiments on 7 common benchmark datasets, our vectors are competitive with word2vec, and better than other MF-based methods.
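To illustrate the eigendecomposition core (ignoring the paper's weighting and online blockwise regression), here is a sketch of a low-rank positive semidefinite approximation of a symmetric matrix; the function name and toy data are ours, not the paper's.

```python
import numpy as np

def low_rank_psd(M, rank):
    """Rank-constrained PSD approximation of a symmetric matrix M via
    eigendecomposition: clip negative eigenvalues, keep the top `rank`,
    and return a factor G with M ~= G @ G.T."""
    M = 0.5 * (M + M.T)                      # symmetrize
    vals, vecs = np.linalg.eigh(M)           # ascending eigenvalues
    vals = np.clip(vals, 0.0, None)[::-1]    # drop negative part, sort desc
    vecs = vecs[:, ::-1]
    G = vecs[:, :rank] * np.sqrt(vals[:rank])
    return G                                 # rows can serve as embeddings

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 6))
G = low_rank_psd(A @ A.T - 0.5, rank=2)
print(G.shape)  # (6, 2)
```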
- Density-Driven Cross-Lingual Transfer of Dependency Parsers Mohammad Sadegh Rasooli and Michael Collins
We present a novel method for the cross-lingual transfer of dependency parsers. Our goal is to induce a dependency parser in a target language of interest without any direct supervision: instead we assume access to parallel translations between the target and one or more source languages, and to supervised parsers in the source language(s). Our key contributions are to show the utility of dense projected structures when training the target language parser, and to introduce a novel learning algorithm that makes use of dense structures. Results on several languages show an absolute improvement of 5.51% in average dependency accuracy over the state-of-the-art method of (Ma and Xia, 2014). Our average dependency accuracy of 82.18% compares favourably to the accuracy of fully supervised methods.
- Name List Only? Target Entity Disambiguation in Short Texts Yixin Cao, Juanzi Li, Xiaofei Guo, Shuanhu Bai, Heng Ji and Jie Tang
Target entity disambiguation (TED), the task of identifying target entities of the same domain, has been recognized as a critical step in various important applications. In this paper, we propose a graph-based model called TremenRank to collectively identify target entities in short texts given a name list only. TremenRank propagates trust within the graph, allowing for an arbitrary number of target entities and texts using inverted index technology. Furthermore, we design a multi-layer directed graph to assign different trust levels to short texts for better performance. The experimental results demonstrate that our model outperforms state-of-the-art methods with an average gain of 24.8% in accuracy and 15.2% in the F1-measure on three datasets in different domains.
- C3EL: A Joint Model for Cross-Document Co-Reference Resolution and Entity Linking Sourav Dutta and Gerhard Weikum
Cross-document co-reference resolution (CCR) computes equivalence classes over textual mentions denoting the same entity in a document corpus. Named-entity linking (NEL) disambiguates mentions onto entities present in a knowledge base (KB) or maps them to null if not present in the KB. Traditionally, CCR and NEL have been addressed separately. However, such approaches miss out on the mutual synergies if CCR and NEL were performed jointly. This paper proposes C3EL, an unsupervised framework combining CCR and NEL for jointly tackling both problems. C3EL incorporates results from the CCR stage into NEL, and vice versa: additional global context obtained from CCR improves the feature space and performance of NEL, while NEL in turn provides distant KB features for already disambiguated mentions to improve CCR. The CCR and NEL steps are interleaved in an iterative algorithm that focuses on the highest-confidence still unresolved mentions in each iteration. Experimental results on two different corpora, news-centric and web-centric, demonstrate significant gains over state-of-the-art baselines for both CCR and NEL.
- A Single Word is not Enough: Ranking Multiword Expressions Using Distributional Semantics Martin Riedl and Chris Biemann
We present a new unsupervised mechanism, which ranks word n-grams according to their multiwordness. It heavily relies on a new uniqueness measure that computes, based on a distributional thesaurus, how often an n-gram could be replaced in context by a single-worded term. Combined with a penalty mechanism for incomplete terms, this forms a new measure called DRUID. Results show large improvements on two small test sets over competitive baselines. We demonstrate the scalability of the method to large corpora, and the measure's independence from shallow syntactic filtering.
- Syntactic Dependencies and Distributed Word Representations for Analogy Detection and Mining Likun Qiu, Yue Zhang and Yanan Lu
Distributed word representations capture relational similarities by means of vector arithmetic, giving high accuracies on analogy detection. We empirically investigate the use of syntactic dependencies for improving analogy detection based on distributed word representations, showing that dependency-based embeddings do not perform better than n-gram-based embeddings, but that dependency structures can be used to improve analogy detection by filtering candidates. In addition, we show that distributed representations of dependency structure can be used for measuring relational similarities, thereby helping analogy mining.
- Leave-one-out Word Alignment without Garbage Collector Effects Xiaolin Wang, Masao Utiyama, Andrew Finch and Eiichiro Sumita
Expectation-maximization algorithms, such as those implemented in GIZA++, pervade the field of unsupervised word alignment. However, these algorithms suffer from over-fitting, leading to ``garbage collector effects,'' where rare words tend to be erroneously aligned to untranslated words. This paper proposes a leave-one-out expectation-maximization algorithm for unsupervised word alignment to address this problem. The proposed method excludes information derived from the alignment of a sentence pair from the alignment models used to align it. This prevents erroneous alignments within a sentence pair from supporting themselves. Experimental results on Chinese-English and Japanese-English corpora show that the F$_1$, precision and recall of alignment were consistently increased by 5.0% -- 17.2%, and BLEU scores of end-to-end translation were raised by 0.03 -- 1.30.
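The following toy sketch (not the authors' implementation) shows the leave-one-out idea for an IBM Model 1 style E-step: a sentence pair's own expected counts are removed from the model before the pair is realigned, so its alignments cannot support themselves. The corpus, the smoothing constants, and all names are invented.

```python
from collections import defaultdict

# Toy parallel corpus and uniform initialization (all data invented).
corpus = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"])]

cfe = defaultdict(float)                 # global expected counts c(f, e)
ce = defaultdict(float)                  # marginal counts c(e)
contrib = [defaultdict(float) for _ in corpus]   # each pair's own counts

for k, (fs, es) in enumerate(corpus):
    for f in fs:
        for e in es:
            w = 1.0 / len(es)
            contrib[k][(f, e)] += w
            cfe[(f, e)] += w
            ce[e] += w

LAM, F_VOCAB = 0.1, 3                    # crude add-lambda smoothing

def loo_estep(k):
    """Realign pair k under a model with its own counts held out."""
    fs, es = corpus[k]
    for (f, e), w in contrib[k].items():     # leave this pair out
        cfe[(f, e)] -= w
        ce[e] -= w
    contrib[k].clear()
    for f in fs:                             # posteriors, held-out model
        probs = [(cfe[(f, e)] + LAM) / (ce[e] + LAM * F_VOCAB) for e in es]
        z = sum(probs)
        for e, p in zip(es, probs):          # add fresh counts back in
            w = p / z
            contrib[k][(f, e)] += w
            cfe[(f, e)] += w
            ce[e] += w

for k in range(len(corpus)):
    loo_estep(k)
print(sorted(cfe.items()))
```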
- Efficient and Expressive Knowledge Base Completion Using Subgraph Feature Extraction Matt Gardner and Tom Mitchell
We explore some of the practicalities of using random walk inference methods, such as the Path Ranking Algorithm (PRA), for knowledge base completion. We show that the random walk probabilities computed (at great expense) by PRA provide no discernible benefit to performance, and so they can safely be dropped. This allows us to define a simpler algorithm for generating feature matrices from graphs, which we call subgraph feature extraction (SFE). In addition to being conceptually simpler than PRA, SFE is much more efficient, reducing computation by an order of magnitude, and more expressive, allowing for much richer features than just paths between two nodes in a graph. We show experimentally that this technique gives substantially better performance than PRA and its variants, improving mean average precision from .432 to .528 on a knowledge base completion task using the NELL knowledge base.
- Verbal and Nonverbal Clues for Real-life Deception Detection Veronica Perez-Rosas, Mohamed Abouelenien, Rada Mihalcea, Yao Xiao, CJ Linton and Mihai Burzo
Deception detection has been receiving an increasing amount of attention from the computational linguistics, speech, and multimodal processing communities. One of the major challenges encountered in this task is the availability of data, and most of the research work to date has been conducted on acted or artificially collected data. The generated deception models are thus lacking real-world evidence. In this paper, we explore the use of multimodal real-life data for the task of deception detection. We develop a new deception dataset consisting of videos from real-life scenarios, and build deception tools relying on verbal and nonverbal features. We achieve classification accuracies in the range of 77-82% when using a model that extracts and fuses features from the linguistic and visual modalities. We show that these results outperform the human capability of identifying deceit.
- A Neural Network Model for Low-Resource Universal Dependency Parsing Long Duong, Trevor Cohn, Steven Bird and Paul Cook
Accurate dependency parsing requires large treebanks, which are only available for a few languages. We propose a method that takes advantage of shared structure across languages to build a mature parser using less training data. We propose a model for learning a shared "universal" parser that operates over an inter-lingual continuous representation of language, along with language-specific mapping components. Compared with supervised learning, our methods give a consistent 8-10% improvement across several treebanks in low-resource simulations.
- Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks Daojian Zeng, Kang Liu and Jun Zhao
Two problems arise when using distant supervision for relation extraction. First, in this method, an already existing knowledge base is heuristically aligned to texts, and the alignment results are treated as labeled data. However, the heuristic alignment can fail, resulting in a wrong-label problem. In addition, in previous approaches, statistical models have typically been applied to ad hoc features. The noise that originates from the feature extraction process can cause poor performance. In this paper, we propose a novel model dubbed Piecewise Convolutional Neural Networks (PCNNs) with multi-instance learning to address these two problems. To solve the first problem, distantly supervised relation extraction is treated as a multi-instance problem in which the uncertainty of instance labels is taken into account. To address the latter problem, we avoid feature engineering and instead adopt a convolutional architecture with piecewise max pooling to automatically learn relevant features. Experiments show that our method is effective and outperforms several competitive baseline methods.
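A minimal sketch of the piecewise max pooling step described above: the convolution output is split at the two entity positions and each segment is max-pooled separately, so the pooled vector retains coarse positional structure. Shapes and data are toy assumptions, and the entities are assumed to lie strictly inside the sentence.

```python
import numpy as np

def piecewise_max_pool(conv_out, e1, e2):
    """Split conv_out (positions x filters) at entity positions e1 < e2
    and max-pool each of the three segments separately."""
    segments = [conv_out[:e1 + 1], conv_out[e1 + 1:e2 + 1], conv_out[e2 + 1:]]
    return np.concatenate([seg.max(axis=0) for seg in segments])

# Toy: 9 positions, 4 convolution filters; entities at positions 2 and 6.
rng = np.random.default_rng(4)
C = rng.normal(size=(9, 4))
print(piecewise_max_pool(C, e1=2, e2=6).shape)  # (12,) = 3 segments x 4
```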
- Navigating the Semantic Horizon using Relative Neighborhood Graphs Magnus Sahlgren and Amaru Cuba Gyllensten
This paper introduces a novel way to navigate neighborhoods in distributional semantic models. The approach is based on relative neighborhood graphs, which uncover the topological structure of local neighborhoods in semantic space. This has the potential to overcome both the problem with selecting a proper k in k-NN search, and the problem that a ranked list of neighbors may conflate several different senses. We provide both qualitative and quantitative results that support the viability of the proposed method.
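As a concrete rendering of the underlying graph construction (with toy 2-D points standing in for word vectors), the sketch below builds a relative neighborhood graph: two points are connected iff no third point is strictly closer to both of them than they are to each other.

```python
import numpy as np
from itertools import combinations

def relative_neighborhood_graph(X):
    """Edges of the relative neighborhood graph over the rows of X:
    (u, v) is an edge iff there is no w with max(d(u,w), d(v,w)) < d(u,v)."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    edges = []
    for u, v in combinations(range(n), 2):
        if not any(max(D[u, w], D[v, w]) < D[u, v]
                   for w in range(n) if w not in (u, v)):
            edges.append((u, v))
    return edges

# Toy 2-D "semantic space" (real models would use word vectors).
pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.1], [0.5, 2.0]])
print(relative_neighborhood_graph(pts))
```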
- Multi- and Cross-Modal Semantics Beyond Vision: Grounding in Auditory Perception Douwe Kiela and Stephen Clark
Multi-modal semantics has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics, including measuring conceptual similarity and relatedness. We also evaluate cross-modal mappings, through a zero-shot learning task mapping between linguistic and auditory modalities. In addition, we evaluate multi-modal representations on an unsupervised musical instrument clustering task. Finally, we compare auditory with visual multi-modal representations. To our knowledge, this is the first work to combine linguistic and auditory information into multi-modal representations.
- Automatic recognition of habituals: a three-way classification of clausal aspect Annemarie Friedrich and Manfred Pinkal
This paper provides the first fully automatic approach for classifying clauses with respect to their aspectual properties as habitual, episodic or static. We bring together two strands of previous work, which address only the related tasks of the episodic-habitual and stative-dynamic distinctions, respectively. Our method combines different sources of information found to be useful for these tasks. We are the first to exhaustively classify ALL clauses of a text, achieving up to 80% accuracy (baseline 58%) for the three-way classification task, and up to 85% accuracy for related subtasks (baselines 50% and 60%), outperforming previous work. In addition, we provide a new large corpus of Wikipedia texts labeled according to our linguistically motivated guidelines.
- Improving Statistical Machine Translation with a Multilingual Paraphrase Database Ramtin Mehdizadeh Seraj, Maryam Siahbani and Anoop Sarkar
The multilingual Paraphrase Database (PPDB) is a freely available, automatically created resource of paraphrases in multiple languages. In statistical machine translation, paraphrases can be used to provide translations for out-of-vocabulary (OOV) phrases. In this paper, we show that a graph propagation approach that uses PPDB paraphrases can be used to improve overall translation quality. We provide an extensive comparison with previous work and show that our PPDB-based method improves the BLEU score by up to 1.79 percentage points. We show that our approach improves on the state of the art in three different settings: when faced with a limited amount of parallel training data; under a domain shift between training and test data; and when handling a morphologically complex source language. Our PPDB-based method outperforms the use of distributional profiles from monolingual source data.
- Distributed Representations for Unsupervised Semantic Role Labeling Kristian Woodsend and Mirella Lapata
We present a new approach for unsupervised semantic role labeling that leverages distributed representations. We induce embeddings to represent a predicate, its arguments and their complex interdependence. Argument embeddings are learned from surrounding contexts involving the predicate and neighboring arguments, while predicate embeddings are learned from argument contexts. The induced representations are clustered into roles using a linear programming formulation of hierarchical clustering, where we can model task-specific knowledge. Experiments show improved performance over both previous unsupervised semantic role labeling approaches and other distributed word representation models.
- Transferring Features from a Convolutional Neural Network to Perform Bilingual Lexicon Induction Douwe Kiela, Ivan Vulić and Stephen Clark
This paper is concerned with the task of bilingual lexicon induction using image-based features. By applying features from a convolutional neural network (CNN), we obtain state-of-the-art performance on a standard dataset, obtaining a 79% relative improvement over previous work which uses bags of visual words based on SIFT features. The CNN image-based approach is also compared with state-of-the-art linguistic approaches to bilingual lexicon induction, even outperforming these for one of three language pairs on another standard dataset. Furthermore, we shed new light on the type of visual similarity metric to use for genuine similarity versus relatedness tasks, and experiment with using multiple layers from the same network in an attempt to improve performance.
- A Graph-based Readability Assessment Method using Word Coupling Zhiwei Jiang, Gang Sun, Qing Gu and Daoxu Chen
This paper proposes a graph-based readability assessment method using word coupling. Compared to state-of-the-art methods such as readability formulae and word-based and feature-based methods, our method develops a coupled bag-of-words model which combines the merits of word frequencies and text features. Unlike the general bag-of-words model, which assumes words are independent, our model correlates the words based on their similarities in readability. By applying TF-IDF (Term Frequency and Inverse Document Frequency), the coupled TF-IDF matrix is built and used in the graph-based classification framework, which involves graph building, merging and label propagation. Experiments are conducted on both English and Chinese datasets. The results demonstrate both the effectiveness and the potential of the method.
- Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation Wang Ling, Chris Dyer, Alan W Black, Isabel Trancoso, Ramon Fermandez, Silvio Amir, Luis Marujo and Tiago Luis
We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form--function relationship in language, our "composed" word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).
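A minimal PyTorch sketch of the compositional idea: a word vector built from character embeddings with a bidirectional LSTM, concatenating the final forward and backward hidden states. The dimensions, the byte-based character ids, and the class name are our assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CharToWord(nn.Module):
    """Compose a word vector from its characters with a bidirectional LSTM,
    concatenating the final forward and backward hidden states."""
    def __init__(self, n_chars, char_dim=16, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, hidden, bidirectional=True,
                            batch_first=True)

    def forward(self, char_ids):                     # (batch, word_length)
        _, (h_n, _) = self.lstm(self.emb(char_ids))  # h_n: (2, batch, hidden)
        return torch.cat([h_n[0], h_n[1]], dim=-1)   # (batch, 2 * hidden)

# Toy: encode the word "cat" using byte values as character ids.
model = CharToWord(n_chars=256)
ids = torch.tensor([[ord(c) for c in "cat"]])
print(model(ids).shape)   # torch.Size([1, 64])
```

Because parameters are shared across all words, the model needs only one vector per character type, which is what makes it compact and open-vocabulary.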
- Do Multi-Sense Embeddings Improve Natural Language Understanding? Jiwei Li and Dan Jurafsky
Learning a distinct representation for each sense of an ambiguous word could lead to more powerful and fine-grained models of vector-space representations. Yet while 'multi-sense' methods have been proposed and tested on artificial word-similarity tasks, we don't know if they improve real natural language understanding tasks. In this paper we introduce a pipelined architecture for incorporating multi-sense embeddings into language understanding, and test the performance of a state-of-the-art multi-sense embedding model (based on Chinese Restaurant Processes). We apply the model to part-of-speech tagging, named entity recognition, sentiment analysis, semantic relation identification and semantic relatedness. We find that, if we carefully control for the number of dimensions, sense-specific embeddings, whether used alone or concatenated with standard (one vector for all senses) embeddings, introduce a slight performance boost in semantics-related tasks, but are of little help in others, such as sentiment analysis. We conclude that the most straightforward way to yield better performance on these tasks is simply to increase embedding dimensionality.
- Improving Evaluation of Automatic Summarization Metrics Yvette Graham
Despite automatic summarization metrics having originated in machine translation, methodologies applied to their evaluation have diverged considerably from those still applied to the evaluation of machine translation metrics today. In this paper, we provide an analysis of current evaluation methodologies applied to summarization metrics and identify the following areas of concern: (1) movement away from evaluation by correlation with human assessment; (2) omission of important components of human assessment from evaluations, in addition to large numbers of metric variants; (3) absence of methods of significance testing improvements over a baseline. We outline an evaluation methodology that overcomes all such challenges, providing the first method of significance testing appropriate for evaluation of summarization metrics. Our evaluation reveals for the first time which metric variants significantly outperform others, identifies optimal metric variants distinct from the current recommended best variants, and shows the machine translation metric BLEU to perform on a par with ROUGE for the purpose of evaluating summarization systems. We subsequently replicate a recent large-scale evaluation that relied on, what we now know to be, suboptimal ROUGE variants, revealing distinct conclusions about the relative performance of state-of-the-art summarization systems.
- A Tableau Prover for Natural Logic and Language Lasha Abzianidze
Modeling the entailment relation over sentences is one of the generic problems of natural language understanding. To address this problem, we design a theorem prover for a version of Natural Logic, a logic whose terms resemble natural language expressions. The prover is based on an analytic tableau method and employs syntactically and semantically motivated schematic rules. The formulas of the logic are simply-typed lambda terms; they can be obtained by modifying the CCG derivations of wide-coverage CCG parsers. Pairing the prover with a preprocessor that generates the formulas for wide-coverage linguistic expressions results in a proof system for natural language. It is shown that the system obtains comparable accuracy (80%) on the unseen SICK data while achieving state-of-the-art precision (98%).
- Humor Recognition and Humor Anchor Extraction Diyi Yang, Alon Lavie, Chris Dyer and Eduard Hovy
Humor is an essential component in personal communication. How to create computational models to discover the structure behind humor, recognize humor and even extract humor anchors remains a challenge. In this work, we first identify several semantic structures behind humor and design sets of features for each theory, and next employ a computational approach to recognize humor. Furthermore, we develop a simple and effective method to extract anchors that enable humor in a sentence. Experiments conducted on two datasets demonstrate that our humor recognizer is effective in automatically distinguishing between humorous and non-humorous texts and our extracted humor anchors correlate quite well with human annotations.
- Discriminative Neural Sentence Modeling by Tree-Based Convolution Lili Mou, Hao Peng, Ge Li, Yan Xu, Lu Zhang and Zhi Jin
This paper proposes a tree-based convolutional neural network (TBCNN) for discriminative sentence modeling. Our model leverages either constituency trees or dependency trees of sentences. The tree-based convolution process extracts sentences' structural features, which are then aggregated by max pooling. Such architecture allows short propagation paths between the output layer and underlying feature detectors, enabling effective structural feature learning and extraction. We evaluate our models on two tasks: sentiment analysis and question classification. In both experiments, TBCNN outperforms previous state-of-the-art results, including existing neural networks and dedicated feature/rule engineering. We also make efforts to visualize the tree-based convolution process, shedding light on how our models work.
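A hedged sketch of the core operation, assuming toy dimensions and a single shared weight matrix for all children (the paper's feature detectors are richer and position-aware): each convolution window is a node plus its children, and max pooling over all nodes yields a fixed-size sentence vector.

```python
import torch
import torch.nn as nn

class TreeConv(nn.Module):
    def __init__(self, in_dim=32, out_dim=64):
        super().__init__()
        self.w_node = nn.Linear(in_dim, out_dim)
        self.w_child = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, node_vecs, children):
        # node_vecs: (n_nodes, in_dim); children[i] lists node i's child indices.
        feats = []
        for i, kids in enumerate(children):
            h = self.w_node(node_vecs[i])
            if kids:  # simplification: average the children under one matrix
                h = h + self.w_child(node_vecs[kids].mean(dim=0))
            feats.append(torch.tanh(h))
        # Max pooling aggregates structural features from every tree node.
        return torch.stack(feats).max(dim=0).values

# Toy tree: node 0 is the root with leaf children 1 and 2.
vecs = torch.randn(3, 32)
print(TreeConv()(vecs, [[1, 2], [], []]).shape)  # torch.Size([64])
```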
- Solving Geometry Problems: Combining Text and Diagram Interpretation Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi, Oren Etzioni and Clint Malcolm
This paper introduces GEOS, the first end-to-end system that solves SAT geometry questions. We cast the problem of question interpretation as a sub-modular optimization over learned text and diagram understanding modules. We show that combining text and diagram understanding strongly outperforms either alone, realizing 68% accuracy on SAT geometry questions. Furthermore, we demonstrate that our use of diagram interpretation in GEOS boosts the accuracy of dependency parsing.
- Consistency-Aware Search for Word Alignment Shiqi Shen, Yang Liu and Maosong Sun
As conventional word alignment search algorithms usually ignore the consistency constraint in translation rule extraction, improving alignment accuracy does not necessarily increase translation quality. We propose to use coverage, which reflects how well extracted phrases can recover the training data, to enable word alignment to model consistency and correlate better with machine translation. This can be done by introducing an objective that maximizes both alignment model score and coverage. We introduce an efficient algorithm to calculate coverage on the fly during search. Experiments show that our consistency-aware search algorithm significantly outperforms both generative and discriminative alignment approaches across various languages and translation models.
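The sketch below illustrates the kind of combined objective described here, with coverage simplified to the fraction of source words that fall inside at least one alignment-consistent phrase pair; the consistency test is the standard one from phrase extraction, and the paper computes coverage far more efficiently during search.

```python
def consistent(alignment, i1, i2, j1, j2):
    """Standard phrase-pair consistency: the box ([i1,i2], [j1,j2]) contains
    at least one link, and no link crosses the box boundary."""
    inside = [(i, j) for (i, j) in alignment if i1 <= i <= i2 and j1 <= j <= j2]
    crossing = [(i, j) for (i, j) in alignment
                if (i1 <= i <= i2) != (j1 <= j <= j2)]
    return bool(inside) and not crossing

def coverage(alignment, src_len, tgt_len, max_phrase=4):
    covered = set()
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_phrase, src_len)):
            for j1 in range(tgt_len):
                for j2 in range(j1, min(j1 + max_phrase, tgt_len)):
                    if consistent(alignment, i1, i2, j1, j2):
                        covered.update(range(i1, i2 + 1))
    return len(covered) / src_len

def search_score(model_score, alignment, src_len, tgt_len, lam=1.0):
    # The objective: alignment model score plus a weighted coverage term.
    return model_score + lam * coverage(alignment, src_len, tgt_len)

print(search_score(-2.3, {(0, 0), (1, 2), (2, 1)}, 3, 3))
```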
- Compact, Efficient and Unlimited Capacity: Language Modeling with Compressed Suffix Trees Ehsan Shareghi, Matthias Petri, Gholamreza Haffari and Trevor Cohn
Efficient methods for storing and querying language models are critical for scaling to large corpora and high Markov orders. In this paper we propose methods for modeling extremely large corpora without imposing a Markov condition. At its core, our approach uses a succinct index, a compressed suffix tree, which provides near-optimal compression while supporting efficient search. We present algorithms for on-the-fly computation of probabilities under a Kneser-Ney language model. Our technique is exact and, although slower than leading LM toolkits, it shows promising scaling properties, which we demonstrate through infinite-order modeling over the full Wikipedia collection.
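To make the unbounded-context idea concrete, here is a toy count-based sketch that backs off from the longest observed context; it uses a plain dictionary and raw relative frequencies, where the paper uses a compressed suffix tree and Kneser-Ney smoothing at Wikipedia scale.

```python
from collections import defaultdict

def build_counts(tokens):
    """Count every n-gram of every order (a stand-in for a suffix index)."""
    counts = defaultdict(int)
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            counts[tuple(tokens[i:j])] += 1
    return counts

def longest_match_prob(counts, context, word):
    # Back off from the longest context suffix seen in training: no fixed order.
    for k in range(len(context), -1, -1):
        ctx = tuple(context[-k:]) if k else tuple()
        denom = counts.get(ctx, 0) if k else sum(
            c for ng, c in counts.items() if len(ng) == 1)
        num = counts.get(ctx + (word,), 0)
        if denom and num:
            return num / denom
    return 0.0

toks = "the cat sat on the mat and the cat ran".split()
print(longest_match_prob(build_counts(toks), ["the"], "cat"))  # 2/3
```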
- Bilingual Correspondence Recursive Autoencoder for Statistical Machine Translation Jinsong Su, Deyi Xiong, Biao Zhang, Yang Liu and Junfeng Yao
Learning semantic representations and tree structures of bilingual phrases is beneficial for statistical machine translation. In this paper, we propose a new neural network model called the Bilingual Correspondence Recursive Autoencoder (BCorrRAE) to model bilingual phrases in translation. We incorporate word alignment information into BCorrRAE to allow it to freely access bilingual constraints at different levels. BCorrRAE minimizes a joint error function over recursive autoencoder reconstruction, structural alignment consistency and cross-lingual reconstruction, so as to not only generate alignment-consistent phrase structures but also capture different levels of semantic relations within bilingual phrases. In order to examine the effectiveness of BCorrRAE, we incorporate both semantic and structural similarity features, built on the bilingual phrase representations and tree structures learned by BCorrRAE, into a state-of-the-art SMT system. Experiments on NIST Chinese-English test sets show that our model achieves a substantial improvement of up to 1.81 BLEU points over the baseline.
- How to Avoid Unwanted Pregnancies: Domain Adaptation using Neural Network Models Shafiq Joty, Hassan Sajjad, Nadir Durrani, Kamla Al-Mannai, Ahmed Abdelali and Stephan Vogel
We present novel models for domain adaptation based on the neural network joint model (NNJM). Our models maximize the cross entropy by regularizing the loss function with respect to the in-domain model. Domain adaptation is carried out by assigning higher weight to out-of-domain sequences that are similar to the in-domain data. In our alternative model we take a more restrictive approach by additionally penalizing sequences similar to the out-of-domain data. Our models achieve better perplexities than the baseline NNJM models and give improvements of up to 0.5 and 0.6 BLEU points on Arabic-to-English and English-to-German language pairs, on a standard task of translating TED talks.
- Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models of Meaning Jianpeng Cheng and Dimitri Kartsaklis
Deep compositional models of meaning, which act on distributional representations of words to produce vectors for larger text constituents, are evolving into a popular area of NLP research. We detail a compositional distributional framework based on a rich form of word embeddings that aims at facilitating the interactions between words in the context of a sentence. Embeddings and composition layers are jointly learned against a generic objective that enhances the vectors with syntactic information from the surrounding context. Furthermore, each word is associated with a number of senses, the most plausible of which is selected dynamically during the composition process. We evaluate the produced vectors qualitatively and quantitatively with positive results. At the sentence level, the effectiveness of the framework is demonstrated on the MSRPar task, for which we report results within the state-of-the-art range.
- More Features Are Not Always Better: Evaluating Generalizing Models in Incident Type Classification of Tweets Axel Schulz, Christian Guckelsberger and Benedikt Schmidt
Social media is a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity to process this information further. This learning problem is often concerned with regionally restricted datasets such as data from only one city. Because social media data such as tweets varies considerably across different cities, the training of efficient models requires labeling data from each city of interest, which is costly and time consuming. In this paper, we investigate which features are most suitable for training generalizable models, i.e., models that show good performance across different datasets. We re-implemented the most popular features from the state of the art in addition to other novel approaches, and evaluated them on data from ten different cities. We show that many sophisticated features are not necessarily valuable for training a generalized model and are outperformed by classic features such as plain word-n-grams and character-n-grams.
- Flexible Domain Adaptation for Automated Essay Scoring Using Correlated Linear Regression Peter Phandi, Kian Ming A. Chai and Hwee Tou Ng
Most of the current automated essay scoring (AES) systems are trained using manually graded essays from a specific prompt. These systems experience a drop in accuracy when used to grade an essay from a different prompt. Obtaining a large number of manually graded essays each time a new prompt is introduced is costly and not viable. We propose domain adaptation as a solution to adapt an AES system from an initial prompt to a new prompt. We also propose a novel domain adaptation technique which uses Bayesian linear ridge regression. We evaluate our domain adaptation technique on the publicly available Automated Student Assessment Prize (ASAP) dataset and show that our proposed technique is a competitive default domain adaptation algorithm for the AES task.
- Social Media Text Classification under Negative Covariate Shift Geli Fei and Bing Liu
In a typical social media content analysis task, the user is interested in analyzing posts on a particular topic. Identifying such posts is often formulated as a classification problem. However, this problem is challenging. One key issue is covariate shift: the training data is not fully representative of the test data. We observed that the covariate shift mainly occurs in the negative data, because topics discussed in social media are highly diverse and numerous, while the user-labeled negative training data may cover only a small number of topics. This paper proposes a novel technique to solve the problem. The key novelty of the technique is the transformation of document representation from the traditional n-gram feature space to a center-based similarity (CBS) space. In the CBS space, the covariate shift problem is significantly mitigated, which enables us to build much better classifiers. Experimental results show that the proposed approach markedly improves classification.
- Show Me Your Evidence - an Automatic Method for Context Dependent Evidence Detection Ruty Rinott, Lena Dankin, Carlos Alzate Perez, Mitesh M. Khapra, Ehud Aharoni and Noam Slonim
Engaging in a debate with oneself or others to make decisions is an integral part of our day-to-day life. A debate on a topic (say, the use of performance-enhancing drugs) typically proceeds by one party making an assertion/claim (say, PEDs are bad for health) and then providing evidence to support the claim (say, a 2006 study shows that PEDs have psychiatric side effects). In this work, we propose the task of automatically detecting such evidence in unstructured text that supports a given claim. This task has many practical applications in decision support and persuasion enhancement in a wide range of domains. We first introduce an extensive benchmark data set tailored for this task, which allows training statistical models and assessing their performance. Finally, we suggest a system architecture, based on supervised learning, to address this task, and report its promising results.
- Spelling Correction of User Search Queries through Statistical Machine Translation Saša Hasan, Carmen Heger and Saab Mansour
We use character-based statistical machine translation in order to correct user search queries in the e-commerce domain. The training data is automatically extracted from event logs where users re-issue their search queries with potentially corrected spelling within the same session. We show results on a test set which was annotated by humans and compare against online autocorrection capabilities of three additional web sites. Overall, the methods presented in this paper outperform fully productized spellchecking and autocorrection services in terms of accuracy and F1 score. We also propose novel evaluation steps based on retrieved search results of the corrected queries in terms of quantity and relevance.
- Fast, Flexible Models for Discovering Topic Correlation across Weakly-Related Collections Jingwei Zhang, Aaron Gerow, Jaan Altosaar, James Evans and Richard Jean So
Weak topic correlation across document collections with different numbers of topics in individual collections presents challenges for existing cross-collection topic models. This paper introduces two probabilistic topic models, Correlated LDA (C-LDA) and Correlated HDP (C-HDP), which address problems that can arise when analyzing large, asymmetric, and potentially weakly-related collections. Topic correlations in weakly-related collections typically lie in the tail of the topic distribution, where they would be overlooked by models unable to fit large numbers of topics. To efficiently model this long tail for large-scale analysis, our models implement a parallel sampling algorithm based on the Metropolis-Hastings and alias methods (Yuan et al., 2014). The models are first evaluated on synthetic data, generated to simulate various collection-level asymmetries. We then present a case study of modeling over 300k documents in collections of sciences and humanities research from JSTOR.
- Improved Transition-based Parsing by Modeling Characters instead of Words with LSTMs Miguel Ballesteros, Chris Dyer and Noah A. Smith
We present extensions to a continuous-state dependency parsing method that make it applicable to morphologically rich languages. Starting with a high-performance transition-based parser that uses long short-term memory (LSTM) recurrent neural networks to learn representations of the parser state, we replace lookup-based word representations with representations constructed from the orthographic representations of the words, also using LSTMs. This allows statistical sharing across word forms that are similar on the surface. Results on morphologically rich languages show that the parsing model benefits from incorporating the character-based encodings of words.
- Personality Profiling of Fictional Characters using Sense-Level Links between Lexical Resources Lucie Flekova and Iryna Gurevych
Automated personality profiling of fictional characters, based on rigorous models from personality psychology, has the potential to impact numerous domains: readers and movie fans could receive better recommendations based on the personality types they like (or that are similar to theirs), writers could improve the complexity and coherence of their heroes, literary scientists could conduct new large-scale studies, and personality psychologists could easily look up exemplary cases. This study focuses on personality prediction of heroes in novels based on the Five-Factor Model of personality. We present a novel collaboratively built dataset of hero personality and cast our task as a text classification problem. We incorporate features of both lexical and vectorial semantics, including WordNet and VerbNet sense-level information and vectorial word representations. We evaluate three machine learning models based on the speech, actions and predicatives of the heroes, and show that especially the lexical-semantic features significantly outperform the baselines. Qualitative analysis reveals that the most predictive features correspond to reported findings in personality psychology and NLP experiments on human personality.
- Human Evaluation of Grammatical Error Correction Systems Roman Grundkiewicz, Marcin Junczys-Dowmunt and Edward Gillian
The paper presents the results of the first large-scale human evaluation of automatic grammatical error correction (GEC) systems. Twelve participating systems and the unchanged input of the CoNLL-2014 shared task have been reassessed in a WMT-inspired human evaluation procedure. Methods introduced for the Workshop on Machine Translation evaluation campaigns have been adapted to GEC and extended where necessary. The produced rankings are used to evaluate standard metrics for grammatical error correction in terms of correlation with human judgment.
- Mise en Place: Unsupervised Interpretation of Instructional Recipes Chloé Kiddon, Ganesa Thandavam Ponnuraj, Luke Zettlemoyer and Yejin Choi
We consider the problem of automatically mapping instructional recipes to action graphs, which define what actions should be performed on which objects and in what order. Recovering such structures can be very challenging, due to specialized language use where, for example, verbal arguments are commonly elided when they can be inferred from context. We present an unsupervised hard EM approach for learning probabilistic models that segment instructions into a set of actions and identify the most likely connections among those actions. Our model incorporates different aspects of instructional semantics, such as likely locations and selectional preferences for different actions. Experiments on a corpus of cooking recipes demonstrate the ability to recover high quality action graphs, outperforming a strong sequential baseline by up to 7.5 points in F1, while also automatically discovering general-purpose knowledge about cooking.
- Forest Convolutional Network Phong Le and Willem Zuidema
According to the principle of compositionality, the meaning of a sentence should be computed from the meaning of its parts and the way they are syntactically combined. In practice, however, the syntactic structure is computed by automatic parsers which are far from perfect and not tuned to the specifics of the task. Current recursive neural network (RNN) approaches to computing sentence meaning therefore run into a number of practical difficulties, including the need to carefully select a parser appropriate for the task, to decide how and to what extent syntactic context modifies the semantic composition function, and to transform parse trees to conform to the branching settings (typically, binary branching) of the RNN. This paper introduces a new model, the Forest Convolutional Network, that avoids all of these challenges by taking a parse forest, rather than a single tree, as input and by allowing arbitrary branching factors. We report improvements over the state-of-the-art in sentiment analysis and question classification.
- Generalized Agreement for Bidirectional Word Alignment Chunyang Liu, Yang Liu and Maosong Sun
While agreement-based joint training has proven to deliver state-of-the-art alignment accuracy, the produced word alignments are usually restricted to one-to-one mappings because of the hard constraint on agreement. We propose a general framework to allow for arbitrary loss functions that measure the disagreement between asymmetric alignments. The loss functions can not only be defined between asymmetric alignments but also between alignments and other latent structures such as phrase segmentation. We use a Viterbi EM algorithm to train the joint model since the inference is intractable. Experiments on Chinese-English translation show that joint training with generalized agreement achieves significant improvements over two state-of-the-art alignment methods.
- Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng and Zhi Jin
Relation classification is an important research arena in the field of natural language processing (NLP). In this paper, we present SDP-LSTM, a novel neural network to classify the relation of two entities in a sentence. The neural architecture leverages the shortest dependency path (SDP) between two entities; multichannel recurrent neural networks, with long short-term memory (LSTM) units, pick up heterogeneous information along the SDP. Our proposed model has several distinct features: (1) The shortest dependency paths retain the most relevant information for relation classification, while eliminating irrelevant words in the sentence. (2) The multichannel LSTM networks allow effective information integration from heterogeneous sources over the dependency paths. (3) A customized dropout strategy regularizes the neural network to alleviate overfitting. We test our model on the SemEval 2010 relation classification task, and achieve an F1-score of 83.7%, higher than competing methods in the literature.
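A minimal sketch of the path-extraction step the model consumes, assuming a toy hand-built dependency parse and given entity positions (the SDP-LSTM itself then runs multichannel LSTMs along this path):

```python
import networkx as nx

# Toy dependency parse of "The burst has been caused by pressure":
# (head, dependent) edges over token indices.
tokens = ["The", "burst", "has", "been", "caused", "by", "pressure"]
edges = [(1, 0), (4, 1), (4, 2), (4, 3), (4, 5), (5, 6)]

graph = nx.Graph(edges)  # undirected, so the path can climb to a common head
path = nx.shortest_path(graph, source=1, target=6)  # entities: burst, pressure
print([tokens[i] for i in path])  # ['burst', 'caused', 'by', 'pressure']
```

Words off the path ("The", "has", "been") are discarded, which is exactly the noise-filtering property claimed in point (1).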
- Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents Pengfei Liu, Xipeng Qiu and Xuanjing Huang
Neural network based methods have made great progress on a variety of natural language processing tasks. However, it is still a challenging task to model long texts, such as sentences and documents. In this paper, we propose a multi-timescale long short-term memory (MT-LSTM) neural network to model long texts. MT-LSTM partitions the hidden states of the standard LSTM into several groups. Each group is activated at a different time period. Thus, MT-LSTM can model very long documents as well as short sentences. Experiments on four benchmark datasets show that our model outperforms the other neural models in the text classification task.
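A rough reading of the multi-timescale idea, simplified here to a plain tanh-RNN update with assumed powers-of-two update periods (the actual model uses LSTM units): slow groups skip most updates and therefore retain information over long spans.

```python
import numpy as np

def multi_timescale_rnn(inputs, n_groups=3, hidden_per_group=8, seed=0):
    rng = np.random.default_rng(seed)
    dim = inputs.shape[1]
    W = rng.normal(size=(n_groups, hidden_per_group, dim)) * 0.1
    U = rng.normal(size=(n_groups, hidden_per_group, hidden_per_group)) * 0.1
    h = np.zeros((n_groups, hidden_per_group))
    for t, x in enumerate(inputs):
        for g in range(n_groups):
            if t % (2 ** g) == 0:       # group g is only active every 2**g steps
                h[g] = np.tanh(W[g] @ x + U[g] @ h[g])
            # otherwise group g's state is carried over unchanged
    return h.reshape(-1)                # concatenated multi-timescale state

state = multi_timescale_rnn(np.random.default_rng(1).normal(size=(16, 5)))
print(state.shape)  # (24,)
```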
- FINET: Context-Aware Fine-Grained Named Entity Typing Luciano Del Corro, Abdalghani Abujabal, Rainer Gemulla and Gerhard Weikum
We propose FINET, a system for detecting the types of named entities that occur in short inputs such as sentences or tweets with respect to WordNet's super fine-grained type system. FINET generates candidate types using a sequence of multiple extractors, ranging from explicitly mentioned types to implicit types, and subsequently selects the most appropriate type using ideas from word-sense disambiguation. FINET combats the data scarcity and noise problems that plague existing systems for named entity typing: FINET does not rely on supervision in many of its extractors and it generates training data for type selection directly from WordNet and other resources. FINET supports the most fine-grained type system so far, including types for which no training data is provided. Our experiments indicate that FINET outperforms state-of-the-art methods in terms of recall, precision, and granularity of extracted types.
- Indicative Tweet Generation: An Extractive Summarization Problem? Priya Sidhaye and Jackie Chi Kit Cheung
Social media such as Twitter have become an important method of communication, with potential opportunities for NLG to facilitate the generation of social media content. We focus on the generation of indicative tweets that contain a link to an external web page. While it is natural and tempting to view the linked web page as the source text from which the tweet is generated in an extractive summarization setting, it is unclear to what extent actual indicative tweets behave like extractive summaries. We collect a corpus of indicative tweets with their associated articles and investigate whether they can actually be derived from the articles using extractive methods. We also consider the impact of the formality and genre of the article. Our results demonstrate the limits of viewing indicative tweet generation as extractive summarization, and point to the need for the development of a methodology for tweet generation that is sensitive to genre-specific issues.
- Learning a Deep Hybrid Model for Semi-Supervised Text Categorization Alexander Ororbia II, C. Lee Giles and David Reitter
We present a novel deep hybrid architecture to perform text classification with an incremental, semi-supervised model. A fine-tuning algorithm is described for incorporating a top-down mechanism for jointly tuning model parameters during each increment of the online learning process. The model is shown to outperform support vector machines trained on the supervised-only portion of the data, across a wide range of splits between supervised and unsupervised training data.
- Joint Embedding of Query and Ad by Leveraging Implicit Feedback Sungjin Lee and Yifan Hu
Sponsored search is at the center of a multibillion dollar market established by search technology. Accurate ad click prediction is a key component for this market to function since the pricing mechanism heavily relies on the estimation of click probabilities. Lexical features derived from the text of both the query and ads play a significant role, complementing features based on historical click information. The purpose of this paper is to explore the use of word embedding techniques to generate effective text features that can capture not only lexical similarity between query and ads but also the latent user intents. We identify several potential weaknesses of the plain application of conventional word embedding methodologies for ad click prediction. These observations motivated us to propose a set of novel joint word embedding methods by leveraging implicit click feedback. We verify the effectiveness of these new word embedding models by adding features derived from the new models to the click prediction system of a commercial search engine. Our evaluation results clearly demonstrate the effectiveness of the proposed methods. To the best of our knowledge this work is the first successful application of word embedding techniques for the sponsored search task.
- Scientific Article Summarization Using Citation-Context and Article's Discourse Structure Arman Cohan and Nazli Goharian
We propose a summarization approach for scientific articles which takes advantage of citation-context and the document discourse model. While citations have been previously used in generating scientific summaries, they lack the related context from the referenced article and therefore do not accurately reflect the article's content. Our method overcomes the problem of inconsistency between the citation summary and the article's content by providing context for each citation. We also leverage the scientific article's inherent discourse structure to produce better summaries. We show that our proposed method effectively improves over existing summarization approaches (greater than 30% improvement over the best performing baseline) in terms of ROUGE scores on the TAC2014 scientific summarization dataset. While the dataset we use for evaluation is in the biomedical domain, most of our approaches are general and therefore adaptable to other domains.
- Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks Hua He, Kevin Gimpel and Jimmy Lin
Modeling sentence similarity is complicated by the ambiguity and variability of linguistic expression. To cope with these challenges, we propose a model for comparing sentences that uses a multiplicity of perspectives. We first model each sentence using a convolutional neural network that extracts features at multiple levels of granularity and uses multiple types of pooling. We then compare our sentence representations at several granularities using multiple similarity metrics. We apply our model to three tasks, including the Microsoft Research paraphrase identification task and two SemEval semantic textual similarity tasks. We obtain strong performance on all tasks, rivaling or exceeding the state-of-the-art without using external resources like WordNet or parsers.
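An illustrative simplification of the multi-perspective comparison (pooling directly over word vectors instead of the CNN feature maps the paper uses, with an assumed set of three poolings and two similarity metrics):

```python
import numpy as np

def perspectives(word_vecs):
    # Several pooled views of the same sentence matrix.
    return {"max": word_vecs.max(axis=0),
            "min": word_vecs.min(axis=0),
            "mean": word_vecs.mean(axis=0)}

def compare(s1, s2):
    p1, p2 = perspectives(s1), perspectives(s2)
    feats = []
    for pool in ("max", "min", "mean"):
        a, b = p1[pool], p2[pool]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        l2 = np.linalg.norm(a - b)
        feats += [cos, l2]   # features for a downstream similarity scorer
    return np.array(feats)

rng = np.random.default_rng(0)
print(compare(rng.normal(size=(7, 50)), rng.normal(size=(5, 50))))
```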
- Syntax-based Rewriting for Simultaneous Machine Translation He He, Alvin Grissom II, John Morgan, Jordan Boyd-Graber and Hal Daumé III
Divergent word order between languages causes delay in simultaneous machine translation. We present a sentence rewriting method that generates interpretation-like translations for a better speed-accuracy tradeoff. We design grammaticality- and meaning-preserving syntactic transformation rules operating on a constituent parse tree. We apply the rules to the reference translation, such that its word order is closer to the source language word order. On Japanese-English translation (two languages with substantially different structure), we show that incorporating the rewritten, more monotonic reference translation into the learning of a phrase-based machine translation system enables it to produce better translations faster than the baseline system that uses the gold reference translation only.
- Question-Answer Driven Semantic Role Labeling: Using Natural Language to Annotate Natural Language Luheng He, Mike Lewis and Luke Zettlemoyer
This paper introduces the task of question-answer driven semantic role labeling (QA-SRL), where question-answer pairs are used to represent predicate-argument structure. For example, the verb "introduce" in the previous sentence would be labeled with the questions "What is introduced?" and "What introduces something?", each paired with the phrase from the sentence that gives the correct answer. Posing the problem this way allows the questions themselves to define the set of possible roles, without the need for predefined frame or thematic role ontologies. It also allows for scalable data collection by annotators with very little training and no linguistic expertise. We gather data in two domains, newswire text and Wikipedia articles, and introduce simple classifier-based models for predicting which questions to ask and what their answers should be. Our results show that non-expert annotators can produce high quality QA-SRL data, and also establish baseline performance levels for future work on this task.
- Bilingual Structured Language Models for Statistical Machine Translation Ekaterina Garmash and Christof Monz
In this paper we describe a novel syntactic target-side language model for phrase-based statistical machine translation. Our approach represents a new way to adapt structured language models (Chelba and Jelinek, 2000) to statistical machine translation, and a first attempt to adapt them to phrase-based statistical machine translation. The integration of the bilingual structured model requires minimal changes in the phrase-based SMT pipeline. We propose a number of variations of the model and evaluate them in a series of rescoring experiments. Rescoring of 1000-best translation lists produces statistically significant improvements of up to 0.7 BLEU over a strong baseline for Chinese-English.
- Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings Pengfei Liu, Shafiq Joty and Helen Meng
The tasks in fine-grained opinion mining can be regarded as either a token-level sequence labeling problem or as a semantic compositional task. We propose a general class of discriminative models based on recurrent neural networks (RNNs) and word embeddings, that can be successfully applied to such tasks without any task-specific feature engineering effort. Our experimental results on the task of opinion target identification show that RNNs, without using any hand-crafted features, outperform feature-rich CRF-based models. The RNNs based on the long short-term memory (LSTM) architecture deliver the best results outperforming previous methods including top performing systems in the SemEval'14 evaluation campaign.
- Learning to Automatically Solve Logic Grid Puzzles Arindam Mitra and Chitta Baral
Logic grid puzzles are a genre of logic puzzles in which we are given (in natural language) a scenario, the object to be deduced and certain clues. The reader has to figure out the solution using the clues provided and some generic domain constraints. In this paper, we present a system, Logicia, that takes a logic grid puzzle and the set of elements in the puzzle and tries to solve it by translating it to the knowledge representation and reasoning language of Answer Set Programming (ASP) and then using an ASP solver. The translation to ASP involves extraction of entities and their relations from the clues. For that we use a novel learning-based approach which uses varied supervision, including the entities present in a clue and the expected representation of a clue in ASP. Our system, Logicia, learns to automatically translate a clue with 81.11% accuracy and is able to solve 71% of the problems of a corpus. This is the first learning system that can solve logic grid puzzles described in natural language in a fully automated manner.
- Automatically Solving Number Word Problems by Semantic Parsing and Reasoning Shuming Shi, Yuehui Wang, Chin-Yew Lin, Xiaojiang Liu and Yong Rui
This paper presents a semantic parsing approach to automatically solving math word problems. A new meaning representation language is designed to bridge natural language text and math formulas. A CFG parser is implemented based on 9,600 semi-automatically created grammar rules. We conduct experiments on a test set of over 1,000 number word problems and achieve 96% precision and 62.5% recall.
- Co-Training For Topic Classification in Scholarly Data Florin Bulgarov, Cornelia Caragea and Rada Mihalcea
With the exponential growth of scholarly data during the past few years, effective methods for topic classification are greatly needed. Current approaches usually require large amounts of expensive labeled data in order to make accurate predictions. In this paper, we posit that, in addition to a research article's textual content, its citation network also contains valuable information. We describe a co-training approach that uses the text and citation information of a research article as two different views to predict the topic of the article. We show that this method improves significantly over the individual classifiers, while also bringing a substantial reduction in the amount of labeled data required for training accurate classifiers.
- Search-Aware Tuning for Hierarchical Phrase-based Decoding Feifei Zhai, Liang Huang and Kai Zhao
Parameter tuning is a key problem in statistical machine translation. Most conventional parameter tuning algorithms are agnostic of the decoding algorithm, so the tuned parameters are not specially optimized to handle search errors in decoding. Recent research on "search-aware tuning" (Liu & Huang, 2014) addresses this problem so that promising partial translations are more likely to survive the inexact search beam. We extend their approach from phrase-based translation to syntax-based translation by generalizing the translation quality metrics for partial translations to handle tree-structured derivations, in a way inspired by the inside-outside algorithm. Our approach is simple to use and can be applied to all conventional parameter tuning methods as a plugin. Extensive experiments on Chinese-to-English translation show significant BLEU improvements with MERT and MIRA.
- Solving General Arithmetic Word Problems Subhro Roy and Dan Roth
This paper presents a novel approach to automatically solving arithmetic word problems. This is the first algorithmic approach that can handle arithmetic problems with multiple steps and operations, without depending on additional annotations or predefined templates. We develop a theory for expression trees that can be used to represent and evaluate the target arithmetic expressions; we use it to uniquely decompose the target arithmetic problem into multiple classification problems; we then compose an equation tree, combining these with world knowledge through a constrained inference framework. Our classifiers gain from the use of quantity schemas that support better extraction of features. Experimental results show that our method outperforms existing systems, achieving state-of-the-art performance on benchmark datasets of arithmetic word problems.
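A small sketch of the expression-tree representation assumed above (hand-built toy tree; the paper's contribution is learning to compose such trees from text): leaves hold quantities from the problem, internal nodes hold operations, and evaluating the tree yields the answer.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(node):
    if isinstance(node, (int, float)):   # leaf: a quantity from the text
        return node
    op, left, right = node               # internal node: an operation
    return OPS[op](evaluate(left), evaluate(right))

# "Ann had 5 apples, bought 3 bags of 4, and ate 2. How many remain?"
tree = ("-", ("+", 5, ("*", 3, 4)), 2)
print(evaluate(tree))  # 15
```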
- Reading Documents for Bayesian Online Change Point Detection Jaesik Choi
Modeling non-stationary time-series data for making predictions is a challenging but important task. One of the key issues is to identify long-term changes accurately in time-varying data. Bayesian Online Change Point Detection (BO-CPD) algorithms efficiently detect long-term changes without assuming the Markov property, which is vulnerable to local signal noise. We propose the Document-based BO-CPD (DBO-CPD) model, which automatically detects long-term temporal changes of continuous variables based on a novel dynamic Bayesian analysis that combines a non-parametric regression, the Gaussian Process (GP), with generative models of texts such as news articles and posts on social networks. Since texts often include important clues about signal changes, DBO-CPD enables accurate prediction of long-term changes. We show that our algorithm outperforms existing BO-CPDs on two real-world datasets: stock prices and movie revenues.
Short papers
- Closing the Gap: Domain Adaptation from Explicit to Implicit Discourse Relations Yangfeng Ji, Gongbo Zhang and Jacob Eisenstein
Many discourse relations are explicitly marked with discourse connectives, and these examples could potentially serve as a plentiful source of training data for recognizing implicit discourse relations. However, there are important linguistic differences between explicit and implicit discourse relations, which limit the accuracy of such an approach. We account for these differences by applying techniques from domain adaptation, treating implicitly and explicitly-marked discourse relations as separate domains. The distribution of surface features varies across these two domains, so we apply a marginalized denoising autoencoder to induce a dense, domain-general representation. The label distribution is also domain-specific, so we apply a resampling technique that is similar to instance weighting. In combination with a set of automatically-labeled data, these improvements eliminate more than 80% of the transfer loss incurred by training an implicit discourse relation classifier on explicitly-marked discourse relations.
- A Joint Dependency Model of Morphological and Syntactic Structure for Statistical Machine Translation Rico Sennrich and Barry Haddow
When translating between two languages that differ in their degree of morphological synthesis, syntactic structures in one language may be realized as morphological structures in the other, and SMT models need a mechanism to learn from such translations. Prior work has used morpheme splitting with flat representations that do not encode the hierarchical structure between morphemes, but this structure is relevant for learning morphosyntactic constraints and selectional preferences. We propose to model syntactic and morphological structure jointly in a dependency translation model, allowing the system to generalize to the level of morphemes. We present a dependency representation of German compounds and particle verbs that results in improvements in translation quality of 1.4–1.8 BLEU in the WMT English–German translation task.
- EMNLP versus ACL: Analyzing NLP research over time Sujatha Das Gollapalli and Xiaoli Li
The conferences ACL (Association for Computational Linguistics) and EMNLP (Empirical Methods in Natural Language Processing) rank among the premier venues that track the research developments in Natural Language Processing and Computational Linguistics. In this paper, we present a study on the research papers of approximately two decades from these two NLP conferences. We apply keyphrase extraction and corpus analysis tools to the proceedings from these venues and propose probabilistic and vector-based representations to represent the topics published in a venue for a given year. Next, similarity metrics are studied over pairs of venue representations to capture the progress of the two venues with respect to each other and over time.
- JEAM: A Novel Model for Cross-Domain Sentiment Classification Based on Emotion Analysis Kun-Hu Luo, Zhi-Hong Deng, Hongliang Yu and Liang-Chen Wei
Cross-domain sentiment classification (CSC) aims at learning a sentiment classifier for unlabeled data in the target domain based on labeled data from a different source domain. Due to the differences in data distribution between the two domains in terms of raw features, the CSC problem is difficult and challenging. Previous research mainly focused on concept mining by clustering words across data domains, which ignored the importance of the authors' emotions contained in the data and the different representations of emotion across domains. In this paper, we propose a novel framework to solve the CSC problem by modelling emotion across domains. We first develop a probabilistic model named JEAM to model the author's emotional state when writing. Then, an EM algorithm is introduced to solve the likelihood maximization problem and to obtain the latent emotion distribution of the author. Finally, a supervised learning method is utilized to assign the sentiment polarity to a given review. Extensive experiments show that our approach is effective and outperforms state-of-the-art approaches.
- TSDPMM: Incorporating Prior Topic Knowledge into Dirichlet Process Mixture Models for Text Clustering Linmei Hu, Juanzi Li, Xiaoli Li, Chao Shao and Xuzhong Wang
The Dirichlet process mixture model (DPMM) has great potential for detecting the underlying structure of data. Extensive studies have applied it for text clustering in terms of topics. However, due to its unsupervised nature, the topic clusters are often less than satisfactory. Considering that people often have prior knowledge about which potential topics should exist in given data, we aim to incorporate such knowledge into the DPMM to improve text clustering. We propose a novel model, TSDPMM, based on a new seeded Pólya urn scheme. Experimental results on document clustering across three datasets demonstrate that our proposed TSDPMM significantly outperforms the state-of-the-art DPMM model and can be applied in a lifelong learning framework.
- Multi-label Text Categorization with Joint Learning Predictions-as-Features Method Li Li
Multi-label text categorization is a type of text categorization where each document is assigned to one or more categories. Recently, a series of methods have been developed that train a classifier for each label, organize the classifiers in a partially ordered structure, and take the predictions produced by earlier classifiers as features for later classifiers. These predictions-as-features methods model high-order label dependencies and obtain high performance. Nevertheless, the predictions-as-features methods suffer from a drawback: when training a classifier for one label, they can model dependencies between earlier labels and the current label, but they cannot model dependencies between the current label and later labels. To address this problem, we propose a novel joint learning algorithm that allows feedback to be propagated from the classifiers for later labels to the classifier for the current label. We conduct experiments using real-world textual data sets, and these experiments illustrate that the predictions-as-features models trained by our algorithm outperform the original models.
- A Framework for Comparing Groups of Documents Arun Maiya
We present a general framework for comparing multiple groups of documents. A bipartite graph model is proposed where document groups are represented as one node set and the comparison criteria are represented as the other node set. Using this model, we present basic algorithms to extract insights into similarities and differences among the document groups. Finally, we demonstrate the versatility of our framework through an analysis of NSF funding programs for basic research.
- Convolutional Sentence Kernel from Word Embeddings for Text Categorization Jonghoon Kim, Francois Rousseau and Michalis Vazirgiannis
This paper introduces a convolutional sentence kernel based on word embeddings. Our kernel overcomes the sparsity issue that arises when classifying short documents or when little training data is available. Experiments on six sentence datasets showed statistically significant gains in accuracy over the standard linear kernel with n-gram features and other proposed models.
- Specializing Word Embeddings for Similarity or Relatedness Douwe Kiela, Felix Hill and Stephen Clark
We demonstrate the advantage of specializing semantic word embeddings for either similarity or relatedness. We compare two variants of retrofitting and a joint-learning approach, and find that all three yield specialized semantic spaces that capture human intuitions regarding similarity and relatedness better than unspecialized spaces. We also show that using specialized spaces in NLP tasks and applications leads to clear improvements, for document classification and synonym selection, which rely on either similarity or relatedness but not both.
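A hedged sketch of one retrofitting variant in this spirit (toy lexicon and assumed uniform weights): each vector is pulled toward its lexicon neighbours, so a synonym lexicon specializes the space for similarity while an association lexicon specializes it for relatedness.

```python
import numpy as np

def retrofit(vectors, lexicon, alpha=1.0, beta=1.0, iters=10):
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iters):
        for word, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if not nbrs:
                continue
            # Weighted average of the original vector and current neighbours.
            num = alpha * vectors[word] + beta * sum(new[n] for n in nbrs)
            new[word] = num / (alpha + beta * len(nbrs))
    return new

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=4) for w in ("car", "automobile", "road")}
out = retrofit(vecs, {"car": ["automobile"], "automobile": ["car"]})
print(np.round(out["car"], 2))  # now pulled toward out["automobile"]
```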
- An Improved Tag Dictionary for Faster Part-of-Speech Tagging Robert Moore
At least since Ratnaparkhi (1996), tag dictionaries have been used to speed up part-of-speech tagging by limiting the set of possible tags for each word. While Ratnaparkhi's tag dictionary makes tagging faster but less accurate, Moore's (2014) alternative tag dictionary makes tagging as fast as Ratnaparkhi's, but with no decrease in accuracy. In this paper, we show that a very simple semi-supervised variant of Ratnaparkhi's method results in a much tighter tag dictionary than either Ratnaparkhi's or Moore's, with accuracy as high as Moore's but much faster tagging: more than 100,000 tokens per second in Perl.
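A sketch of the general tag-dictionary idea (an illustrative relative-frequency threshold, not Moore's exact construction): restricting each known word to the tags it was seen with lets the tagger score far fewer candidates per token.

```python
from collections import Counter, defaultdict

def build_tag_dict(tagged_corpus, min_rel_freq=0.01):
    counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    tag_dict = {}
    for word, tags in counts.items():
        total = sum(tags.values())
        # Keep only tags accounting for at least min_rel_freq of the tokens.
        tag_dict[word] = {t for t, c in tags.items() if c / total >= min_rel_freq}
    return tag_dict

corpus = [[("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
          [("the", "DT"), ("runs", "NNS")]]
print(build_tag_dict(corpus)["runs"])  # both VBZ and NNS survive here
```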
- PhraseRNN: Phrase Recursive Neural Network for Aspect-based Sentiment Analysis Thien Hai Nguyen and Kiyoaki Shirai
This paper presents a new method to identify sentiment of an aspect of an entity. It is an extension of RNN (Recursive Neural Network) that takes both dependency and constituent trees of a sentence into account. Results of an experiment show that our method significantly outperforms previous methods.
- Wikification of Concept Mentions within Spoken Dialogues Using Domain Constraints from Wikipedia Seokhwan Kim, Rafael E. Banchs and Haizhou Li
While most previous work on Wikification has focused on written texts, this paper presents a Wikification approach for spoken dialogues. A set of analyzers is proposed to learn dialogue-specific properties along with domain knowledge of conversations from Wikipedia. Then, the analyzed properties are used as constraints for generating candidates, and the candidates are ranked to find the appropriate links. The experimental results show that our proposed approach can significantly improve performance on the task in human-human dialogues.
- Joint Lemmatization and Morphological Tagging with Lemming Thomas Müller, Ryan Cotterell, Alexander Fraser and Hinrich Schütze
We present Lemming, a modular log-linear model that jointly models lemmatization and tagging and supports the integration of arbitrary global features. Lemming sets the new state-of-the-art in token-based statistical lemmatization on six languages; e.g., for Czech lemmatization, we reduce the error by 60%, from 4.05 to 1.58.
- Transducer Disambiguation with Sparse Topological Features Gonzalo Iglesias, Adrià de Gispert and Bill Byrne
We describe a simple and efficient algorithm to disambiguate non-functional weighted finite state transducers (WFSTs), i.e. to generate a new WFST that contains a unique, best-scoring path for each hypothesis in the input labels, along with the best output labels. The algorithm uses topological features encoded with a novel tropical sparse tuple vector semiring. We empirically show that our algorithm is more efficient than previous work on a PoS-tagging disambiguation task. We also use our method to rescore very large translation lattices with a bilingual neural network language model, obtaining gains in line with the literature.
- Automatic Extraction of Time Expressions Across Domains in French Narratives Mike Donald Tapi Nzali, Xavier Tannier and Aurelie Neveol
The prevalence of temporal references across all types of natural language utterances makes temporal analysis a key issue in Natural Language Processing. This work addresses three research questions: 1/ is temporal expression recognition specific to a particular domain? 2/ if so, can we characterize domain specificity? and 3/ how can domain specificity be integrated into a single tool for unified temporal expression extraction? Herein, we assess temporal expression recognition from documents written in French covering three domains. We present a new corpus of clinical narratives annotated for temporal expressions, and also use existing corpora in the newswire and historical domains. We show that temporal expressions can be extracted with high performance across domains (best F-measure 0.96, obtained with a CRF model on clinical narratives). We argue that domain adaptation for the extraction of temporal expressions can be done with limited effort and should cover pre-processing as well as temporal-specific tasks.
- ASTD: Arabic Sentiment Tweets Dataset Mahmoud Nabil, Mohamed Aly and Amir Atiya
This paper introduces ASTD, an Arabic social sentiment analysis dataset gathered from Twitter. It consists of about 10,000 tweets which are classified as objective, subjective positive, subjective negative, or subjective mixed. We present the properties and statistics of the dataset, and run experiments using a standard partitioning of the dataset. Our experiments provide benchmark results for 4-way sentiment classification on the dataset.
- Knowledge Base Inference using Bridging Entities Bhushan Kotnis, Pradeep Bansal and Partha P. Talukdar
Large scale Knowledge Bases (such as NELL, Yago, Freebase, etc.) are often sparse, i.e., a large number of valid relations between existing entities are missing. Recent research has addressed this problem by augmenting the KB graph with additional edges mined from a large text corpus while keeping the set of nodes fixed, and then using the Path Ranking Algorithm (PRA) to perform KB inference over this augmented graph. In this paper, we extend this line of work by augmenting the KB graph not only with edges, but also with bridging entities, where both the edges and bridging entities are mined from a 500 million web text corpus. Through experiments on real-world datasets, we demonstrate the value of bridging entities in improving the performance and running times of PRA in the KB inference task. We plan to make our code and datasets publicly available upon publication of this paper.
- A quantitative analysis of gender differences in movies using psycholinguistic normatives Anil Ramakrishna, Nikolaos Malandrakis, Elizabeth Staruk and Shrikanth Narayanan
Direct content analysis reveals important details about movies, including gender representations and potential biases. We investigate the differences between male and female character depictions in movies, based on patterns of language use. Specifically, we use an automatically generated lexicon of linguistic norms characterizing gender ladenness. We use multivariate analysis to investigate gender depictions and correlate them with elements of movie production. The proposed metric differentiates between male and female utterances and exhibits some interesting interactions with movie genres and the gender of the screenplay writer.
- Combining Discrete and Continuous Features for Deterministic Transition-based Dependency Parsing Meishan Zhang and Yue Zhang
We combine a traditional linear sparse feature model and a multi-layer neural network (NN) model for deterministic transition-based dependency parsing, by integrating the sparse features into the NN model. Correlations are drawn between the hybrid model and previous work on integrating word embedding features into a discrete linear model. By analyzing the results of various parsers on web-domain parsing, we show that the integrated model is a better way to combine traditional and embedding features compared with previous methods.
- Predicting the Structure of Cooking Recipes Jermsak Jermsurawong and Nizar Habash
Cooking recipes exist in abundance, but due to their unstructured text format, they are hard to study quantitatively beyond treating them as simple bags of words. In this paper, we propose an ingredient-instruction dependency tree data structure to represent recipes. The proposed representation allows for more refined comparison of recipes and recipe-parts, and is a step towards semantic representation of recipes. Furthermore, we build a parser that maps recipes into the proposed representation. The parser's edge prediction accuracy of 93.5% improves over a strong baseline of 85.7% (54.5% error reduction).
- Answering Elementary Science Questions by Constructing Coherent Scenes using Background Knowledge Yang Li and Peter Clark
Much of what we understand from text is not explicitly stated. Rather, the reader uses his/her knowledge to fill in gaps and create a coherent mental picture or "scene" depicting what the text appears to convey. The scene constitutes an understanding of the text, and can be used to answer questions that go beyond the text. Our goal is to answer elementary science questions, where this requirement is pervasive: a question will often give a partial description of a scene and ask the student about implicit information. We show that by using a simple "knowledge graph" representation of the question, we can leverage several large-scale linguistic resources to provide missing background knowledge, somewhat alleviating the knowledge bottleneck in previous approaches. The coherence of the best resulting scene, built from a question/answer-candidate pair, reflects the confidence that the answer candidate is correct, and thus can be used to answer multiple-choice questions. Our experiments show that this approach significantly outperforms competitive algorithms on several datasets tested. The significance of this work is thus to show that a simple "knowledge graph" representation allows a version of "interpretation as scene construction" to be made viable.
- Evaluation of Word Vector Representations by Subspace Alignment Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Guillaume Lample and Chris Dyer
Word vectors learned in an unsupervised manner have proven to provide exceptionally effective features in many NLP tasks. However, there is currently no effective intrinsic evaluation that directly measures the semantic content of induced word vectors; instead, a collection of ad-hoc evaluations is used. We present QVEC, an intrinsic evaluation measure of the quality of word vector representations. The proposed measure obtains strong correlation with a battery of standard semantic evaluation tasks.
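A simplified sketch of a QVEC-style score (an approximation in which each supervised linguistic feature column is greedily matched to its best-correlated embedding dimension; the released measure solves the alignment more carefully):

```python
import numpy as np

def qvec_like(embeddings, ling_features):
    # embeddings: (n_words, d); ling_features: (n_words, k) supervised columns.
    score = 0.0
    for j in range(ling_features.shape[1]):
        col = ling_features[:, j]
        corrs = [abs(np.corrcoef(embeddings[:, i], col)[0, 1])
                 for i in range(embeddings.shape[1])]
        score += max(corrs)        # credit the best-matching dimension
    return score

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 8))
ling = np.stack([emb[:, 0] + 0.1 * rng.normal(size=100),  # correlated column
                 rng.normal(size=100)], axis=1)           # pure-noise column
print(round(qvec_like(emb, ling), 3))
```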
- Chinese Semantic Role Labeling with Bidirectional Recurrent Neural Networks Zhen Wang
Traditional approaches to Chinese Semantic Role Labeling (SRL) rely almost exclusively on feature engineering, which means their performance is highly dependent on a large number of handcrafted features. Even worse, long-range dependencies in a sentence can hardly be modeled by these methods. In this paper, we introduce a bidirectional recurrent neural network (RNN) with long short-term memory (LSTM) to capture bidirectional and long-range dependencies in a sentence with minimal feature engineering. Experimental results on the Chinese Proposition Bank (CPB) show a significant improvement over the state-of-the-art methods. Moreover, our model makes it convenient to introduce heterogeneous resources, which yields a further improvement in performance. Although Chinese SRL is a specific case, our approach can be easily generalized to SRL in other languages.
- Personalized Machine Translation: Predicting Translational Preferences Shachar Mirkin and Jean-Luc Meunier
Machine Translation (MT) has advanced in recent years to deliver better translations for clients' specific domains, and to provide sophisticated tools that learn from translators' corrections. We suggest that MT could be further personalized to the end-user level, that is, to the specific reader of the text or its author. As a step in that direction, we propose a method based on a recommender systems approach, where the user's preferred translation system is predicted based on the preferences of similar users. In our experiments, this method outperforms a set of non-personalized methods for multiple language pairs.
- LDTM: A Latent Document Type Model for Cumulative Citation Recommendation Jingang Wang, Dandan Song, Ning Zhang, Zhiwei Zhang, Lejian Liao and Luo Si
This paper studies Cumulative Citation Recommendation (CCR) - given an entity in a Knowledge Base, how to effectively detect its potential citations from high-volume text streams. Most previous approaches treated all features uniformly to build a global relevance model, in which the prior knowledge embedded in documents cannot be exploited adequately. To address this problem, we propose a latent document type discriminative model that introduces a latent layer to capture the correlations between documents and their latent types. The model can better adjust to different types of documents and yields flexible performance when dealing with a broad range of document types. An extensive set of experiments has been conducted on the TREC-KBA-2013 dataset, and the results demonstrate that our model yields a significant gain in recommendation quality compared to the state-of-the-art.
- Arabic Diacritization with Recurrent Neural Networks Yonatan Belinkov and Jim Glass
Arabic, Hebrew, and similar languages are typically written without diacritics, leading to ambiguity and posing a major challenge for core language processing tasks like speech recognition. Previous approaches to automatic diacritization employed a variety of machine learning techniques. However, they typically rely on existing tools like morphological analyzers and therefore cannot be easily extended to new genres and languages. We develop a recurrent neural network with long short-term memory layers for predicting diacritics in Arabic text. Our language-independent approach is trained solely from diacritized text without relying on external tools. We show experimentally that our model can rival state-of-the-art methods that have access to additional resources.
- Efficient Inner-to-outer Greedy Algorithm for Higher-order Labeled Dependency Parsing Xuezhe Ma and Eduard Hovy
Many NLP systems use dependency parsers as critical components. Joint learning parsers usually achieve better parsing accuracy than two-stage methods. However, classical joint parsing algorithms significantly increase computational complexity, which makes joint learning impractical. In this paper, we propose an efficient dependency parsing algorithm that is capable of capturing multiple edge-label features while maintaining low computational complexity. We evaluate our parser on 14 different languages. Our parser consistently obtains more accurate results than three baseline systems and three popular, off-the-shelf parsers.
- Semi-Supervised Bootstrapping of Relationship Extractors with Distributional Semantics David Batista, Bruno Martins and Mário Silva
Semi-supervised bootstrapping techniques for relationship extraction from text iteratively expand a set of initial seed relationships while limiting semantic drift, i.e. the progressive deviation of the semantics of extracted relationships from the semantics of the seed relationships. We investigate bootstrapping for relationship extraction using word embeddings to find similar relationships. The results show that relying on word embeddings achieves better performance on the task of extracting four types of relationships from a collection of documents than a baseline using TF-IDF to find similar relationships.
- Turn-taking phenomena in incremental dialogue systems Hatim Khouzaimi, Romain Laroche and Fabrice Lefevre
In this paper, a taxonomy of turn-taking phenomena is introduced, organised according to the level of information conveyed. It aims to provide a better grasp of the behaviours humans use while talking to each other, so that these behaviours can be methodically replicated in dialogue systems. Five interesting phenomena have been implemented in a simulated environment: system barge-in triggered by an unclear, incoherent, or sufficient user message; system feedback; and user barge-in. The aim of the experiment is to illustrate that some phenomena are worth implementing in some cases and others are not.
- Cross-document Event Coreference Resolution based on Cross-media Features Tongtao Zhang, Hongzhi Li, Heng Ji and Shih-Fu Chang
In this paper we focus on a new problem of event coreference resolution across television news videos. Based on the observation that the contents of multiple data modalities are complementary, we develop a novel approach to jointly encode effective features from both closed captions and video key frames. Experimental results demonstrate that visual features provide a 7.2% absolute F-score gain over state-of-the-art text-based event extraction and coreference resolution.
- Event Detection and Factuality Assessment with Non-Expert Supervision Kenton Lee, Yoav Artzi, Yejin Choi and Luke Zettlemoyer
Events are communicated in natural language with varying degrees of certainty. For example, if you are "hoping for a raise," it may be somewhat less likely than if you are "expecting" one. To study these distinctions, we present scalable, high-quality annotation schemes for event detection and fine-grained factuality assessment. We find that non-experts, with very little training, can reliably provide judgments about what events are mentioned and the extent to which the author thinks they actually happened. We also show how such data enables the development of regression models for fine-grained scalar factuality predictions that outperform strong baselines.
- Online Sentence Novelty Scoring for Topical Document Streams Sungjin Lee
The enormous amount of information on the Internet has raised the challenge of highlighting new information in the context of already viewed content. This type of intelligent interface can save users time and prevent frustration. Our goal is to scale out novelty detection to large web properties like Google and Yahoo News. We present a set of efficient, lightweight features for online novelty scoring and a fast nonlinear feature transformation method using a Deep Neural Network. Our experimental results on the TREC 2004 datasets show that the proposed method is not only efficient but also very powerful, significantly surpassing the best challenge system at TREC 2004.
- That's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets William Yang Wang and Diyi Yang
We propose a novel data augmentation approach to enhance computational behavioral analysis using social media text. In particular, we collect a Twitter corpus of the descriptions of annoying behaviors using the #petpeeve hashtags. In the qualitative analysis, we study the language use in these tweets, with a special focus on the fine-grained categories and the geographic variation of the language. In quantitative analysis, we show that lexical and syntactic features are useful for automatic categorization of annoying behaviors, and frame-semantic features further boost the performance; that leveraging large lexical embedding to create additional training instances significantly improves the lexical model; and incorporating frame-semantic embedding achieves the best overall performance.
- Touch-Based Pre-Post-Editing of Machine Translation Output Benjamin Marie and Aurélien Max
We introduce pre-post-editing, possibly the most basic form of interactive translation, as a touch-based interaction with iteratively improved translation hypotheses prior to classical post-editing. We report simulated experiments that yield very large improvements on classical evaluation metrics (up to 21 BLEU) as well as on a parameterized variant of the TER metric that takes into account the cost of matching/touching tokens, confirming the promising prospects of the novel translation scenarios offered by our approach.
- Going Beyond Lexical Similarity with Word Embeddings For ROUGE Jun-Ping Ng
ROUGE is a widely adopted, automatic evaluation measure for text summarization. While it has been shown to correlate well with human judgements, it is biased towards surface lexical similarities. This makes it unsuitable for the evaluation of abstractive summarization, or summaries with substantial paraphrasing. We study the effectiveness of word embeddings to overcome this disadvantage of ROUGE. Specifically, instead of measuring lexical overlaps, we use word embeddings to compute the semantic similarity of the words used in summaries. Our experimental results show that our proposal achieves better correlations with human judgements when measured with the Spearman and Kendall rank coefficients.
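As a hedged illustration of the idea (our formulation, not necessarily the paper's exact measure), a "soft" unigram recall can credit each reference word with its best cosine match in the candidate summary instead of requiring an exact overlap:

```python
import numpy as np

def soft_unigram_recall(reference, candidate, emb):
    """Embedding-based unigram recall (a sketch). `emb` maps a word to a
    numpy vector; words without an embedding are skipped."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    ref = [w for w in reference if w in emb]
    cand = [w for w in candidate if w in emb]
    if not ref or not cand:
        return 0.0
    # each reference word is credited with its best candidate match
    credit = sum(max(cos(emb[r], emb[c]) for c in cand) for r in ref)
    return credit / len(ref)
```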
- Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin, Tobias Schnabel and Hinrich Schütze
We propose online unsupervised domain adaptation (DA), which is performed incrementally as data comes in and is applicable when batch DA is not possible. In a part-of-speech (POS) tagging evaluation, we find that online unsupervised DA performs as well as batch DA.
- Detection of Steganographic Users on Twitter Phil Blunsom, Andrew Ker and Alex Wilson
We propose a method to detect hidden data in English text. We target a system previously thought secure, which hides messages in tweets. The method brings ideas from image steganalysis into the linguistic domain, including the training of a feature-rich model for detection. To identify Twitter users guilty of steganography, we aggregate evidence, a first in any domain. We test our system on a set of 1M steganographic tweets, and show it to be effective.
- An Automatic Diacritics Restoration for Hungarian Attila Novák and Borbála Siklósi
In this paper, we describe a method based on statistical machine translation (SMT) that is able to restore accents in Hungarian texts with high accuracy. Due to the agglutinating characteristic of Hungarian, there are always wordforms unknown to a system trained on a fixed vocabulary. In order to be able to handle such words, we integrated a morphological analyzer into the system that can suggest accented word candidates for unknown words. We evaluated the system in different setups, achieving an accuracy above 99% at the highest.
- #SupportTheCause: Identifying Motivations to Participate in Online Health Campaigns Dong Nguyen, Tijs A. van den Broek, Claudia Hauff and Djoerd Hiemstra
We automatically identify participants' motivations in the public health campaign Movember and investigate the impact of the different motivations on the amount of campaign donations raised. Our classification scheme is based on the Social Identity Model of Collective Action (van Zomeren et al., 2008). We find that automatic classification based on Movember profiles is fairly accurate, while automatic classification based on tweets is challenging. Using our classifier, we find a strong relation between motivations and donations. Our study is a first step towards scaling-up collective action research.
- Extraction and generalisation of variables from scientific publications Erwin Marsi and Pinar Öztürk
Text mining from Earth science literature is significantly different from biomedical text mining and therefore requires distinct approaches and methods. Our approach aims at automatically locating and extracting variables and their direction of variation: increasing, decreasing or just changing. Variables are initially extracted by matching tree patterns onto the syntax trees of the source texts. Next, variables are generalised in order to enhance their similarity, facilitating hierarchical search and inference. This generalisation is accomplished by progressive pruning of syntax trees using a set of tree transformation operations. Text mining results are presented as a browsable variable hierarchy which allows users to inspect all mentions of a particular variable type in the text as well as any generalisations or specialisations. The approach is demonstrated on a corpus of 10k abstracts of Nature publications. We discuss experiences with this early prototype and outline a number of possible improvements and directions for future research.
- Empty Category Detection using Path Features and Distributed Case Frames Shunsuke Takeno, Masaaki Nagata and Kazuhide Yamamoto
We describe an approach to machine learning-based empty category detection on the phrase structure analysis of Japanese. The problem is formalized as tree node classification, and we find that conjunctions of path features with other node component features, namely head word, child, and empty category features, are highly effective. We also find that a set of dot products between the word embeddings for a verb and those for case particles can be used as a substitute for case frames. Experimental results show that the proposed method improves the previous state of the art from 68.6% to 73.2% F-measure.
- Learning Timeline Difference for Text Categorization Fumiyo Fukumoto and Yoshimi Suzuki
This paper addresses the text categorization problem in which training data may derive from a different time period than the test data. We present a learning framework that extends a boosting technique to learn an accurate model for timeline adaptation. The results show that the method is comparable to the current state-of-the-art biased-SVM method, and is especially effective when the creation time period of the test data differs greatly from that of the training data.
- Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi and Akinori Ito
This paper focuses on language modeling with adequate robustness to support different domain tasks. To this end, we propose a hierarchical latent word language model (h-LWLM). The proposed model can be regarded as a generalized form of the standard LWLMs. The key advance is introducing a multiple latent variable space with hierarchical structure. The structure can flexibly take account of linguistic phenomena not present in the training data. This paper details the definition as well as a training method based on layer-wise inference and a practical usage in natural language processing tasks with an approximation technique. Experiments on speech recognition show the effectiveness of h-LWLM in out-of-domain tasks.
- Adjective Intensity and Sentiment Analysis Raksha Sharma, Astha Agarwal, Mohit Gupta and Pushpak Bhattacharyya
For fine-grained sentiment analysis, we need to go beyond zero-one polarity and find a way to compare adjectives that share a common semantic property. In this paper, we present a semi-supervised approach to assign intensity levels to adjectives, viz. high, medium and low, where adjectives are compared when they belong to the same semantic category. For example, in the semantic category of EXPERTISE, 'expert', 'experienced' and 'familiar' are respectively of level high, medium and low. We obtain an overall accuracy of 77% for intensity assignment. We show the significance of considering intensity information of adjectives in predicting star-rating of reviews. Our intensity based prediction system results in an accuracy of 59% for 5-star rated movie review corpus.
- Non-lexical neural architecture for fine-grained POS Tagging Matthieu Labeau, Alexander Allauzen and Kevin Löser
In this paper we explore neural architectures that can infer word representations from the raw character stream. They rely on two modelling stages that are jointly learnt: a convolutional network that infers a word representation directly from the character stream, followed by a prediction stage. Models are evaluated on a POS and morphological tagging task for German. Experimental results show that the convolutional network can infer meaningful word representations, while for the prediction stage, a well designed and structured strategy allows the model to outperform state-of-the-art results, without any feature engineering.
- Sentence Modeling with Gated Recursive Neural Network Xinchi Chen, Xipeng Qiu and Xuanjing Huang
Recently, neural network based sentence modeling methods have achieved great progress. Among these methods, recursive neural networks (RecNNs) can effectively model the combination of the words in a sentence. However, RecNNs need a given external topological structure, such as a syntactic tree. In this paper, we propose a gated recursive neural network (GRNN) to model sentences, which employs a full binary tree (FBT) structure to control the combinations in the recursive structure. By introducing two kinds of gates, our model can better capture the complicated combinations of features. Experiments on three text classification datasets show the effectiveness of our model.
- The Rating Game: Sentiment Rating Reproducibility from Text Lasse Borgholt, Peter Simonsen and Dirk Hovy
We investigate (i) whether human annotators can infer ratings from IMDb movie reviews, (ii) how human performance compares to a regression model, and (iii) whether model performance is affected by the rating "source" (i.e. author vs. annotator ratings). We collect a data set of IMDb movie reviews with author-provided ratings, and have it re-annotated by crowdsource and expert annotators. Annotators reproduce the original ratings better than a linear regression model, but are off by a large margin in more than 5% of the cases. Models trained on annotator-labeled data outperform those trained on author-labeled data, questioning the usefulness of author-rated reviews serving as labeled data for sentiment analysis.
- Named entity recognition with document-specific KB tag gazetteers Will Radford, Xavier Carreras and James Henderson
We consider a novel setting for Named Entity Recognition (NER) where we have access to document-specific knowledge base tags. These tags consist of a canonical name from a knowledge base (KB) and entity type, but are not aligned to the text. We explore how to use KB tags to create document-specific gazetteers at inference time to improve NER. We find that this kind of supervision helps recognise organisations more than standard wide-coverage gazetteers. Moreover, augmenting document-specific gazetteers with KB information lets users specify fewer tags for the same performance, reducing cost.
- A Multi-lingual Annotated Dataset for Aspect-Oriented Opinion Mining Salud M. Jiménez-Zafra, Giacomo Berardi, Andrea Esuli, Diego Marcheggiani, Maite Martin and Alejandro Moreo Fernández
We present Trip-MAML, a multi-lingual dataset of hotel reviews that have been manually annotated at the sentence level with multi-aspect sentiment labels. This dataset was built by extending a publicly available mono-lingual (English) dataset with documents written in Italian and Spanish. We detail the dataset construction process: the data gathering, selection, and annotation. We present inter-annotator agreement figures and baseline experimental results, comparing the three languages. Trip-MAML is the first multi-lingual dataset for aspect-oriented opinion mining, thus enabling (i) facing the problem in languages other than English and (ii) applying cross-lingual learning methods to the task.
- What Your Username Says About You Aaron Jaech and Mari Ostendorf
Usernames are ubiquitous on the Internet, and they are often suggestive of user demographics. This work looks at the degree to which gender and language can be inferred from a username alone by making use of unsupervised morphology induction to decompose usernames into sub-units. Experimental results on the two tasks demonstrate the effectiveness of the proposed morphological features compared to a character n-gram baseline.
- Summarizing Topical Contents from PubMed Documents Using a Thematic Analysis Sun Kim, Lana Yeganova and W. John Wilbur
Improving the search and browsing experience in PubMed is a key component in helping users detect information of interest. In particular, when exploring a novel field, it is important to provide a comprehensive view of a specific subject. One solution for providing this panoramic picture is to find sub-topics from a set of documents. We propose a method that finds sub-topics that we refer to as themes and computes representative titles based on a set of documents in each theme. The method combines a thematic clustering algorithm and the Pool Adjacent Violators algorithm to induce significant themes. Then, for each theme, a title is computed using PubMed document titles and theme-dependent term scores. We tested our system on five disease sets from OMIM and evaluated the results based on normalized point-wise mutual information and MeSH terms. For both performance measures, the proposed approach outperformed LDA. The quality of the theme titles was also evaluated by comparing them with manually created titles.
- A Model of Zero-Shot Learning of Spoken Language Understanding Majid Yazdani and James Henderson
When building spoken dialogue systems for a new domain, a major bottleneck is developing a spoken language understanding (SLU) module that handles the new domain's terminology and semantic concepts. We propose a statistical SLU model that generalises to both previously unseen input words and previously unseen output classes by leveraging unlabeled data. After mapping the utterance into a vector space, the model exploits the structure of the output labels by mapping each label to a hyperplane that separates utterances with and without that label. Both these mappings are initialised with unsupervised word embeddings, so they can be computed even for words or concepts which were not in the SLU training data.
- Extractive Summarization by Maximizing Semantic Volume Dani Yogatama, Fei Liu and Noah A. Smith
The most successful approaches to extractive text summarization seek to maximize bigram coverage subject to a budget constraint. In this work, we propose instead to maximize semantic volume. We embed each sentence in a semantic space and construct a summary by choosing a subset of sentences whose convex hull maximizes volume in that space. We provide a greedy algorithm based on the Gram-Schmidt process to efficiently perform volume maximization. Our method outperforms the state-of-the-art summarization approaches on benchmark datasets.
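A minimal sketch of the greedy procedure (our variable names; a simplification of the paper's algorithm): repeatedly add the sentence whose Gram-Schmidt residual against the span of the sentences chosen so far is largest, subject to a length budget.

```python
import numpy as np

def greedy_volume_summary(sent_vecs, lengths, budget):
    """Greedy semantic-volume maximization (a sketch). sent_vecs: list of
    sentence embeddings; lengths: word counts; budget: max summary length."""
    chosen, basis, used = [], [], 0
    remaining = set(range(len(sent_vecs)))

    def residual_norm(i):
        r = np.asarray(sent_vecs[i], dtype=float).copy()
        for b in basis:                 # project out the chosen directions
            r -= (r @ b) * b
        return np.linalg.norm(r)

    while remaining:
        best = max(remaining, key=residual_norm)
        remaining.discard(best)
        if used + lengths[best] > budget:
            continue                    # too long; try the next best
        r = np.asarray(sent_vecs[best], dtype=float).copy()
        for b in basis:
            r -= (r @ b) * b
        if np.linalg.norm(r) > 1e-8:    # extend the orthonormal basis
            basis.append(r / np.linalg.norm(r))
        chosen.append(best)
        used += lengths[best]
    return sorted(chosen)
```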
- An Analysis of Domestic Abuse Discourse on Reddit Nicolas Schrading, Cecilia Ovesdotter Alm, Ray Ptucha and Christopher Homan
Domestic abuse affects people of every race, class, age, and nation. There is significant research on the prevalence and effects of domestic abuse; however, such research typically involves large-scale population-based surveys that have high financial costs. This work provides a qualitative analysis of domestic abuse using data collected from the social and news-aggregation website reddit.com. We develop a classifier to detect submissions discussing domestic abuse, achieving accuracies of 94%, a substantial error reduction over its baseline. Analysis of the top features used in detecting abuse discourse provides insight into the dynamics of abusive relationships.
- Twitter-scale New Event Detection via K-term Hashing Dominik Wurzer, Victor Lavrenko and Miles Osborne
First Story Detection is hard because the most accurate systems become progressively slower with each document processed. We present a novel approach to FSD, which operates in constant time/space and scales to very high volume streams. We show that when computing novelty over a large dataset of tweets, our method performs 192 times faster than a state-of-the-art baseline without sacrificing accuracy. Our method is capable of performing FSD on the full Twitter stream on a single core of modest hardware.
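A hedged sketch of how constant time/space novelty scoring can work (our simplification, not the authors' exact method): hash every k-term subset of a tweet into a fixed-size bit set and score novelty as the fraction of previously unseen k-terms.

```python
from itertools import combinations

class KTermNovelty:
    """Constant-space novelty scorer (a sketch). Memory is a fixed bit set,
    so per-document cost does not grow with the number of documents seen."""
    def __init__(self, k=2, n_buckets=2**24):
        self.k, self.n = k, n_buckets
        self.seen = bytearray(n_buckets // 8 + 1)

    def novelty(self, terms):
        kterms = [hash(t) % self.n
                  for t in combinations(sorted(set(terms)), self.k)]
        if not kterms:
            return 0.0
        new = 0
        for h in kterms:
            byte, bit = divmod(h, 8)
            if not (self.seen[byte] >> bit) & 1:
                new += 1
                self.seen[byte] |= 1 << bit   # remember this k-term
        return new / len(kterms)              # 1.0 = entirely new story
```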
- Learn to Solve Algebra Word Problems Using Quadratic Programming Lipu Zhou, Shuaixiang Dai and Liwei Chen
This paper presents a new algorithm to automatically solve algebra word problems. Our algorithm solves a word problem via analyzing a hypothesis space containing all possible equation systems generated by assigning the numbers in the word problem into a set of equation system templates extracted from the training data. To obtain a robust decision surface, we train a log-linear model to make the margin between the correct assignments and the false ones as large as possible. This results in a quadratic programming (QP) problem which can be efficiently solved. Experimental results show that our algorithm achieves 78.38% accuracy, about 10% higher than the state-of-the-art baseline (Kushman et al., 2014).
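In standard large-margin notation (ours, not necessarily the paper's), the training objective described above can be written as the QP

```latex
\min_{w,\,\xi}\; \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\xi_{i}
\quad \text{s.t.}\quad
w^{\top}\phi(x_{i}, y_{i}^{*}) - w^{\top}\phi(x_{i}, y) \;\ge\; 1 - \xi_{i}
\quad \forall\, y \in \mathcal{Y}(x_{i})\setminus\{y_{i}^{*}\},
\qquad \xi_{i} \ge 0,
```

where x_i is a word problem, Y(x_i) its hypothesis space of number-to-template assignments, y_i* the correct assignment, and phi the log-linear feature map.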
- Discourse Planning with an N-gram Model of Relations Or Biran and Kathleen McKeown
While it has been established that transitions between discourse relations are important for coherence, such information has not so far been used to aid in language generation. We introduce an approach to discourse planning for concept-to-text generation systems which simultaneously determines the order of messages and the discourse relations between them. This approach makes it straightforward to use statistical transition models, such as n-gram models of discourse relations learned from an annotated corpus. We show that using such a model significantly improves the quality of the generated text as judged by humans.
- "A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce" : Learning State Changing Verbs from Wikipedia Revision History Derry Tanti Wijaya, Ndapandula Nakashole and Tom Mitchell
Learning to determine when the facts of a Knowledge Base (KB) have to be updated is a challenging task. We propose to learn state-changing verbs from Wikipedia edit history. When a state-changing event, such as a marriage or death, happens to an entity, the infobox on the entity's Wikipedia page usually gets updated. At the same time, the article text may be updated with verbs either being added or deleted to reflect the changes made to the infobox. We use Wikipedia edit history to distantly supervise a method for automatically learning verbs and state changes. Additionally, our method uses constraints to effectively map verbs to infobox changes. We observe in our experiments that when state-changing verbs are added or deleted from an entity's Wikipedia page text, we can predict the entity's infobox updates with 88% precision and 76% recall. One compelling application of our verbs is to incorporate them as triggers in methods for updating existing KBs, which are currently mostly static.
- Measuring Prerequisite Relations Among Concepts Chen Liang, Zhaohui Wu, Wenyi Huang and C. Lee Giles
A prerequisite relation describes a basic relation among concepts in cognition, education and other areas. However, as a semantic relation, it has not been well studied in computational linguistics. We study the problem of measuring prerequisite relations among concepts and propose a simple metric that effectively models the relation by measuring how much different two concepts refer to each other. Evaluations on two datasets that include seven domains show that our single metric based method outperforms existing supervised learning based methods.
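One way to make "how much two concepts refer to each other" concrete (our notation; a sketch of the kind of single metric the abstract describes) is a reference-asymmetry score:

```latex
\mathrm{RefD}(A, B) \;=\;
\frac{\sum_{c} r(c, B)\, w(c, A)}{\sum_{c} w(c, A)}
\;-\;
\frac{\sum_{c} r(c, A)\, w(c, B)}{\sum_{c} w(c, B)}
```

where c ranges over related concepts, w(c, A) weights how strongly c is associated with A (e.g. a link weight), and r(c, B) indicates whether c refers to B. A clearly positive score suggests that B is a prerequisite of A, since A's related concepts refer to B more than the reverse.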
- Classifying Tweet Level Judgements of Rumours in Social Media Michal Lukasik, Trevor Cohn and Kalina Bontcheva
Social media is a rich source of rumours and corresponding community reactions. Rumours reflect different characteristics, some shared and some individual. We formulate the problem of classifying the tweet level judgements of rumours in social media as supervised learning using annotated examples. Both supervised and unsupervised domain adaptation are considered, in which tweets from a rumour are classified on the basis of other annotated rumours. We demonstrate how multi-task learning helps achieve good results on rumours from the 2011 England riots.
- Improving Distant Supervision for Information Extraction Using Label Propagation Through Lists Lidong Bing, Sneha Chaudhari, Richard Wang and William Cohen
Because of polysemy, distant labeling for information extraction leads to noisy training data. We describe a procedure for reducing this noise by using label propagation on a graph in which the nodes are entity mentions, and mentions are coupled when they occur in coordinate list structures. We show that this labeling approach leads to good performance even when off-the-shelf classifiers are used on the distantly-labeled data.
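A generic label-propagation step of the kind described above might look as follows (a sketch under our assumptions, not the paper's exact procedure): mentions coupled by coordinate-list structure repeatedly average their neighbors' label distributions, anchored to the initial distant labels.

```python
import numpy as np

def propagate_labels(W, Y0, n_iter=30, alpha=0.8):
    """Label propagation on a mention graph (a sketch).
    W:  (n, n) row-normalized adjacency; mentions are linked, e.g., when
        they occur in the same coordinate list structure.
    Y0: (n, n_labels) initial distant labels as one-hot rows (all zeros
        for unlabeled mentions)."""
    Y = Y0.astype(float).copy()
    for _ in range(n_iter):
        # mix neighbor labels with the original seed labels
        Y = alpha * (W @ Y) + (1 - alpha) * Y0
    return Y.argmax(axis=1)
```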
- An Unsupervised Method for Discovering Lexical Variations in Roman Urdu Informal Text Abdul Rafae, Abdul Qayyum, Muhammad Moeenuddin, Asim Karim, Hassan Sajjad and Faisal Kamiran
We present an unsupervised method to find lexical variations in Roman Urdu informal text. Our method includes a phonetic algorithm, UrduPhone, a feature-based similarity function, and a clustering algorithm, Lex-C. UrduPhone encodes Roman Urdu strings into their phonetic equivalent representations. This produces an initial grouping of different spelling variations of a word. The similarity function incorporates word features and their context. Lex-C is a variant of the k-medoids clustering algorithm that groups lexical variations. It incorporates a similarity threshold to balance the number of clusters and their maximum similarity. We test our system on two datasets of SMS and blogs and show an f-measure gain of up to 12% over baseline systems. With this paper, we make the datasets and the code available to the research community for further work on Roman Urdu understanding.
- Summarization Based on Embedding Distributions Hayato Kobayashi, Masaki Noguchi and Taichi Yatsuka
In this study, we consider a summarization method using document-level similarity based on embeddings, or distributed representations of words, where we assume that an embedding of each word can represent its "meaning." We formalize our task as the problem of maximizing a submodular function defined by the negative summation of the nearest neighbors' distances on embedding distributions, each of which represents a set of word embeddings in a document. We proved the submodularity of our objective function and that our problem is asymptotically related to the KL-divergence between the probability density functions that correspond to a document and its summary in a continuous space. An experiment using a real dataset demonstrated that our method performed better than the existing method based on sentence-level similarity.
- A Coarse-Grained Model for Optimal Coupling of ASR and SMT Systems for Speech Translation Gaurav Kumar, Graeme Blackwood, Jan Trmal, Daniel Povey and Sanjeev Khudanpur
Speech translation is conventionally carried out by cascading an automatic speech recognition (ASR) and a statistical machine translation (SMT) system. The hypotheses chosen for translation are based on the ASR system's acoustic and language model scores, and typically optimized for word error rate, ignoring the intended downstream use: automatic translation. In this paper, we present a coarse-to-fine model that uses features from the ASR and SMT systems to optimize this coupling. We demonstrate that several standard features utilized by ASR and SMT systems can be used in such a model at the speech-translation interface, and we provide empirical results on the Fisher Spanish-English speech translation corpus.
- Any-language frame semantic parsing Anders Johannsen, Héctor Martínez Alonso and Anders Søgaard
We present a multilingual corpus of Wikipedia and Twitter texts annotated with FRAMENET 1.5 semantic frames in 9 different languages, as well as a novel technique for weakly supervised crosslingual frame semantic parsing. Our approach only assumes the existence of linked, comparable source and target language corpora (e.g., Wikipedia) and a bilingual dictionary (e.g., Wiktionary or BABELNET). Our approach uses a truly interlingual representation enabling us to use the same model across all 9 languages. We present average error reductions over running a state-of-the-art parser on word-to-word translations of 46% for target identification, 37% for frame identification, and 14% for argument identification.
- Variable-Length Word Encodings for Neural Translation Models Rohan Chitnis and John DeNero
Recent work in neural machine translation has shown promising performance, but the most effective architectures do not scale naturally to large vocabulary sizes. We propose and compare three variable-length encoding schemes that represent a large vocabulary corpus using a small vocabulary with no loss in information. Common words are unaffected by our encoding, but rare words are encoded using a sequence of two pseudo-words. Our method is simple and effective: it requires no complete dictionaries, learning procedures, increased training time, changes to the model, or new parameters. Compared to a baseline that replaces all rare words with an unknown word symbol, our best variable-length encoding strategy improves WMT English-French translation performance by 1.7 BLEU.
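A minimal sketch of one such scheme (names and the exact code assignment are our assumptions): keep the most frequent words as-is and deterministically map every rarer word to a pair of pseudo-words, so n pseudo-symbols can address up to n^2 rare types without information loss.

```python
from collections import Counter

def build_encoder(corpus_tokens, keep=30000, n_pseudo=1000):
    """Variable-length word encoding (a sketch). Assumes the number of
    rare types is at most n_pseudo**2 so every rare word gets a code."""
    freq = Counter(corpus_tokens)
    common = {w for w, _ in freq.most_common(keep)}
    rare = sorted(w for w in freq if w not in common)
    codes = {}
    for idx, w in enumerate(rare):
        hi, lo = divmod(idx, n_pseudo)
        codes[w] = (f"<P{hi}>", f"<P{lo}>")   # two pseudo-words per rare word

    def encode(tokens):
        out = []
        for w in tokens:
            out.extend([w] if w in common else codes.get(w, ["<unk>"]))
        return out

    return encode
```

Since common words map to themselves and every rare word expands to exactly two pseudo-words, the encoding is invertible given the code table.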
- Modeling Tweet Arrival Times using Log-Gaussian Cox Processes Michal Lukasik, Srijith Prabhakaran Nair Kusumam, Trevor Cohn and Kalina Bontcheva
Research on modeling time series text corpora has typically focused on predicting what text will come next, but less well studied is predicting when the next text event will occur. In this paper we address the latter case, framed as modeling continuous inter-arrival times under a log-Gaussian Cox process, a form of inhomogeneous Poisson process which captures the varying rate at which the tweets arrive over time. In an application to rumour modeling of tweets surrounding the 2014 Ferguson riots, we show how inter-arrival times between tweets can be accurately predicted, and that incorporating textual features further improves predictions.
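For concreteness, in standard notation (ours), a log-Gaussian Cox process puts a Gaussian-process prior on the log intensity, and the observed arrival times then follow an inhomogeneous Poisson process:

```latex
f \sim \mathcal{GP}\big(m(t),\, k(t, t')\big), \qquad
\lambda(t) = \exp\big(f(t)\big), \qquad
p(t_{1}, \dots, t_{n} \mid \lambda) =
\exp\!\Big(-\int_{0}^{T}\lambda(t)\,dt\Big)\,\prod_{i=1}^{n}\lambda(t_{i}),
```

so the exponential link keeps the rate positive while letting it vary smoothly over time; textual features can enter through the mean or covariance function.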
- Foreebank: Syntactic Analysis of Customer Support Forums Rasoul Kaljahi, Jennifer Foster, Johann Roturier, Corentin Ribeyre, Teresa Lynn and Joseph Le Roux
We present a new treebank of English and French technical forum content which has been annotated for grammatical errors and phrase structure. This double annotation allows us to empirically measure the effect of errors on parsing performance. While it is slightly easier to parse the corrected versions of the forum sentences, it is not the main factor in making this kind of text hard to parse.
- Recognizing Biographical Sections in Wikipedia Alessio Palmero Aprosio and Sara Tonelli
Wikipedia is the largest collection of encyclopedic data ever written in the history of humanity. Thanks to its coverage and its availability in machine-readable format, it has become a primary resource for large-scale research in historical and cultural studies. In this work, we focus on the subset of pages describing persons, and we investigate the task of recognizing biographical sections from them: given a person's page, we identify the list of sections where information about her/his life is present. We model this as a sequence classification problem, and propose a supervised setting, in which the training data are acquired automatically. Besides, we show that six simple features extracted only from the section titles are very informative and yield good results well above a strong baseline.
- Pre-Computable Multi-Layer Neural Network Language Models Jacob Devlin
In the last several years, neural network models have significantly improved accuracy in a number of NLP tasks. However, one serious drawback that has impeded their adoption in production systems is the slow runtime speed of neural network models compared to alternate models, such as maximum entropy classifiers. In Devlin et al. (2014), the authors presented a simple technique for speeding up feed-forward embedding-based neural network models, where the dot product between each word embedding and part of the first hidden layer is pre-computed offline. However, this technique cannot be used for hidden layers beyond the first. In this paper, we explore a neural network architecture where the embedding layer feeds into multiple hidden layers that are placed "next to" one another so that each can be pre-computed independently. On a large scale language modeling task, this lateral architecture achieves a 10x speedup at runtime and a significant reduction in perplexity when compared to a standard multi-layer network.
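To illustrate the pre-computation trick the abstract builds on (variable names and sizes are ours), note that in a feed-forward LM the first hidden layer is a sum of position-specific transforms of the context word embeddings; each term depends only on (position, word) and can therefore be tabulated offline:

```python
import numpy as np

V, D, H, CTX = 50000, 128, 512, 4    # vocab, embedding, hidden, context size
rng = np.random.default_rng(0)
E = rng.standard_normal((V, D)).astype(np.float32)            # embeddings
W = [rng.standard_normal((H, D)).astype(np.float32) for _ in range(CTX)]
b = np.zeros(H, dtype=np.float32)

# offline: one (V, H) lookup table per context position
tables = [E @ Wk.T for Wk in W]

def first_hidden_layer(context_word_ids):
    # online: CTX table lookups and adds, no matrix-vector products
    h = b.copy()
    for k, w in enumerate(context_word_ids):
        h += tables[k][w]
    return np.maximum(h, 0.0)        # some nonlinearity, e.g. ReLU
```

The catch, as the abstract notes, is that this only works for layers fed directly by the embedding lookup, which is what motivates placing several hidden layers laterally rather than stacking them.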
- Global Thread-level Inference for Comment Classification in Community Question Answering Shafiq Joty, Alberto Barrón-Cedeño, Giovanni Da San Martino, Alessandro Moschitti, Lluís Màrquez and Preslav Nakov
Community question answering, a recent evolution of question answering in the Web context, allows a user to quickly consult the opinion of a number of people on a particular topic, thus taking advantage of the wisdom of the crowd. Here we try to help the user by deciding automatically which answers are good and which are bad for a given question. In particular, we focus on exploiting the output structure at the thread level in order to make more consistent global decisions. More specifically, we exploit the relations between pairs of comments at any distance in the thread, which we incorporate into graph-cut and ILP frameworks. The evaluation on the benchmark dataset of SemEval-2015 Task 3 confirms the importance of using thread-level information, which allows us to improve over the state of the art.
- Reversibility reconsidered: finite-state factors for efficient probabilistic sampling in parsing and generation Marc Dymetman, Sriram Venkatapathy and Chunyang Xiao
We revisit the classical logical notion of generation/parsing reversibility in terms of feasible probabilistic sampling, and argue for an implementation based on finite-state factors. We propose a modular decomposition that reconciles generation accuracy with parsing robustness and allows the introduction of dynamic contextual factors. (Opinion Piece)
- Key Concept Identification for Medical Information Retrieval Jiaping Zheng and Hong Yu
The difficult language in Electronic Health Records (EHRs) presents a challenge to patients' understanding of their own conditions. One approach to lowering the barrier is to provide tailored patient education based on their own EHR notes. We are developing a system to retrieve EHR note-tailored online consumer oriented health education materials. We explored topic model and key concept identification methods to construct queries from the EHR notes. Our experiments show that queries using identified key concepts with pseudo-relevance feedback significantly outperform (over 10-fold improvement) the baseline system of using the full text note.
- An Entity-centric Approach for Overcoming Knowledge Graph Sparsity Manjunath Hegde and Partha P. Talukdar
Automatic construction of knowledge graphs (KGs) from unstructured text has received considerable attention in recent research, resulting in the construction of several KGs with millions of entities (nodes) and facts (edges) among them. Unfortunately, such KGs tend to be severely sparse in terms of the number of facts known for a given entity, i.e., they have low knowledge density. For example, the NELL KG consists of only 1.34 facts per entity. Such low knowledge density makes it challenging to use these KGs in real-world applications. In contrast to the best-effort extraction paradigms followed in the construction of such KGs, in this paper we argue in favor of ENTIty Centric Expansion (ENTICE), an entity-centric KG population framework, to alleviate the low knowledge density problem in existing KGs. By using ENTICE, we are able to increase NELL's knowledge density by a factor of 7.7 at 75.5% accuracy. Additionally, we are also able to extend the ontology, discovering new relations and entities. We hope to make all datasets and code publicly available upon publication of the paper.
- Translation Invariant Word Embeddings Kejun Huang, Matt Gardner, Evangelos Papalexakis, Christos Faloutsos, Nikos Sidiropoulos and Tom Mitchell
This work focuses on the task of finding latent vector representations of the words in a corpus. In particular, we address the issue of what to do when there are multiple languages in the corpus. Prior work has, among other techniques, used canonical correlation analysis to project pre-trained vectors in two languages into a common space. We propose a simple and scalable method that is inspired by the notion that the learned vector representations should be invariant to translation between languages. We show empirically that our method outperforms prior work on multilingual tasks, matches the performance of prior work on monolingual tasks, and scales linearly with the size of the input data (and thus the number of languages being embedded).
- Efficient Hyper-parameter Optimization for NLP Applications Lidan Wang, Minwei Feng, Bowen Zhou, Bing Xiang and Sridhar Mahadevan
Hyper-parameter optimization is an important problem in natural language processing (NLP) and machine learning. Recently, a group of studies has focused on using sequential Bayesian Optimization to solve this problem, aiming to reduce the number of iterations and trials required during the optimization process. In this paper, we explore this problem from a different angle, and propose a multi-stage hyper-parameter optimization that breaks the problem into multiple stages with increasing amounts of data. Early stages provide fast estimates of good candidates, which are used to initialize later stages for better performance and speed. We demonstrate the utility of this new algorithm by evaluating its speed and accuracy against state-of-the-art Bayesian Optimization algorithms on classification and prediction tasks.
- WikiQA: A Challenge Dataset for Open-Domain Question Answering Yi Yang, Wen-tau Yih and Christopher Meek
We describe the WikiQA dataset, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. Most previous work focuses on the TREC-QA dataset, which includes editor-generated questions and candidate answer sentences selected by matching content words in the question. WikiQA is constructed using a more natural process and is more than an order of magnitude larger than the TREC-QA dataset. Unlike the TREC-QA dataset, the WikiQA dataset also includes questions for which there are no correct sentences, enabling researchers to work on answer triggering, a critical component in any QA system. We also propose new metrics to evaluate performance on the task of answer triggering. We compare several systems on the task of answer sentence selection on both datasets and also describe the performance of a system on the problem of answer triggering using the WikiQA dataset.
- Script Induction as Language Modeling Rachel Rudinger, Pushpendre Rastogi, Francis Ferraro and Benjamin Van Durme
The Narrative Cloze is an evaluation metric commonly used for work on automatic script induction. While prior work in this area has focused on count-based methods from distributional semantics, such as pointwise mutual information, we argue that the Narrative Cloze can be productively reframed as a language modeling task. By training a discriminative language model for this task, we attain improvements of up to 27 percent over prior methods on standard Narrative Cloze metrics.
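Viewed this way, scoring a cloze instance is just sequence scoring (a sketch with our interface; any trained language model can stand in for `lm_score`):

```python
def rank_cloze_candidates(lm_score, chain, candidates):
    """Narrative Cloze as language modeling (a sketch). `chain` is a list
    of event tokens with None at the held-out position; candidates are
    ranked by the LM score of the chain with each candidate filled in."""
    hole = chain.index(None)
    def fill(c):
        return chain[:hole] + [c] + chain[hole + 1:]
    return sorted(candidates, key=lambda c: -lm_score(fill(c)))
```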
- Joint Event Trigger Identification and Event Coreference Resolution with Structured Perceptron Jun Araki and Teruko Mitamura
Events and their coreference offer useful semantic and discourse resources. We show that the semantic and discourse aspects of events interact with each other. However, traditional approaches addressed event extraction and event coreference resolution either separately or sequentially, which limits their interactions. This paper proposes a document-level structured learning model that jointly identifies event triggers and resolves event coreference. We demonstrate that the joint model outperforms a pipelined model by 6.9 BLANC F1 and 1.8 CoNLL F1 points in event coreference resolution using a corpus in the biology domain.
- Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-level Multimodal Sentiment Analysis Soujanya Poria, Alexander Gelbukh and Erik Cambria
We present a novel way of extracting features from short texts, based on the activation values of an inner layer of a deep Convolutional Neural Network. We use the extracted features in multimodal sentiment analysis of very short video clips, each clip presenting a person uttering one sentence. We use the combined feature vectors of textual, visual, and audio modalities to train a classifier based on Multiple Kernel Learning, which is known to perform well on heterogeneous data. We obtain a 56% lower error rate than the state of the art. We also present a parallelizable decision-level data fusion method, which is much faster, though slightly less accurate.
- Syntactic Parse Fusion Do Kook Choe, David McClosky and Eugene Charniak
Model combination techniques have consistently shown state-of-the-art performance across multiple tasks, including syntactic parsing. However, they dramatically increase runtime and can be difficult to employ in practice. We demonstrate that applying constituency model combination techniques to n-best lists instead of n different parsers results in significant parsing accuracy improvements. Parses are weighted by their probabilities and combined using an adapted version of Sagae and Lavie (2006). These accuracy gains come with marginal computational costs and are obtained on top of existing parsing techniques such as discriminative reranking and self-training, resulting in state-of-the-art accuracy: 92.6% on WSJ section 23. On out-of-domain corpora, accuracy is improved by 0.4% on average. We empirically confirm that six well-known n-best parsers benefit from the proposed methods across six domains.
- Latent Variable Regression for Text Similarity and Textual Entailment John Wieting and Dan Roth
We present a latent alignment algorithm that gives state-of-the-art results on the Textual Entailment and nearly state-of-the-art results on the Semantic Textual Similarity (STS) tasks of the SICK dataset (Marelli et al., 2014). Our model accomplishes this despite using at most two feature templates: word conjunctions and a single word similarity metric. Furthermore, since our model has a small feature space, we achieve performance competitive with reported results in the literature after training on only 500 examples. Our model is a very strong baseline for paraphrase detection, textual entailment, and text similarity tasks, with significant potential for further improvement.
- SLSA: A Sentiment Lexicon for Standard Arabic Ramy Eskander and Owen Rambow
Sentiment analysis has been a major area of interest, for which the existence of high-quality resources is crucial. In Arabic, there is a reasonable number of sentiment lexicons but with major deficiencies. The paper presents a large-scale Standard Arabic Sentiment Lexicon (SLSA) that is publicly available for free and avoids the deficiencies in the current resources. SLSA has the highest up-to-date reported coverage. The construction of SLSA is based on linking AraMorph with SentiWordNet along with a few heuristics and powerful back-off. SLSA has a relative improvement of 37.8% over a state-of-the-art lexicon when tested for accuracy. It also outperforms it by an absolute 3.5% of F1-score when tested for sentiment analysis.
- Online Learning of Interpretable Word Embeddings Hongyin Luo, Zhiyuan Liu and Maosong Sun
Word embeddings encode the semantic meanings of words into low-dimensional word vectors. In most word embeddings, one cannot interpret the meanings of specific dimensions of those word vectors. Non-negative matrix factorization (NMF) has been proposed to learn interpretable word embeddings via non-negative constraints. However, NMF methods suffer from scale and memory issues because they have to maintain a global matrix for learning. To alleviate this challenge, we propose online learning of interpretable word embeddings from streaming text data. Experiments show that our model consistently outperforms the state-of-the-art word embedding methods in both representation ability and interpretability.
- Aligning Knowledge and Text Embeddings by Entity Descriptions Huaping Zhong and Jianwen Zhang
We study the problem of jointly embedding knowledge bases and text corpora. The key issue is the alignment model that ensures the vectors of entities, relations and words are in the same space. The previous method of Wang et al. (2014a) relies on Wikipedia anchors, which limits its applicable scope. We propose a novel alignment model based on text descriptions of entities, without dependency on anchors. Specifically, we require the embedding of an entity not only to approximate the structured constraints in KBs but also to equal the embedding computed from the text description. Extensive experiments show that, without using any anchor information, the proposed approach consistently achieves better or comparable performance.
- Talking to the crowd: What do people react to in online discussions? Aaron Jaech, Victoria Zayats, Hao Fang, Mari Ostendorf and Hannaneh Hajishirzi
This paper addresses the question of how language use affects community reaction to comments in online discussion forums, and the relative importance of the message vs. the messenger. A new comment ranking task is proposed based on community annotated karma in Reddit discussions, which controls for topic and timing of comments. Experimental work with discussion threads from six subreddits shows that the importance of different types of language features varies with the community of interest.
- Semantic Relation Classification via Convolutional Neural Networks with Simple Negative Sampling kun xu, Yansong Feng, Songfang Huang and Dongyan Zhao
Syntactic features play an essential role in identifying relationships in a sentence. Previous neural network models often suffer from irrelevant information introduced when subjects and objects are far apart. In this paper, we propose to learn more robust relation representations from the shortest dependency path through a convolutional neural network. We further propose a straightforward negative sampling strategy to improve the assignment of subjects and objects. Experimental results show that our method outperforms the state-of-the-art methods on the SemEval-2010 Task 8 dataset.
- Identification and Verification of Simple Claims about Statistical Properties Andreas Vlachos and Sebastian Riedel
In this paper we study the identification and verification of simple claims about statistical properties, e.g. claims about the population or the inflation rate of a country. We show that this problem is similar to extracting numerical information from text, and instead of annotating data for each property of interest in order to learn supervised models, we develop a distantly supervised baseline approach using an existing knowledge base and raw text. In experiments on 16 statistical properties about countries from Freebase we show that our approach identifies simple statistical claims about properties with 60% precision, while it is able to verify these claims without requiring any explicit supervision for this task.
- Not All Contexts Are Created Equal: Better Word Representations with Variable Attention Wang Ling, Lin Chu-Cheng, Yulia Tsvetkov, Silvio Amir, Ramon Fermandez, Chris Dyer, Alan W Black and Isabel Trancoso
We introduce an extension to the bag-of-words model for learning word representations that takes into account both syntactic and semantic properties within language. This is done by employing an attention model that finds, within the contextual words, the words that are relevant for each prediction. The general intuition of our model is that some words are only relevant for predicting local context (e.g. function words), while other words are more suited for determining global context, such as the topic of the document. Experiments performed on both semantically and syntactically oriented tasks show gains using our model over the existing bag-of-words model. Furthermore, compared to other more sophisticated models, our model scales better as we increase the size of the model's context.
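A minimal sketch of the attention idea (our parameterization, not necessarily the paper's): replace the uniform average of context embeddings with a softmax-weighted one, where the weights come from learned relevance scores.

```python
import numpy as np

def attended_context_vector(context_ids, E, scores):
    """Attention-weighted bag of words (a sketch).
    E: (V, D) embedding matrix; context_ids: word ids in the window;
    scores: one learned relevance score per context position."""
    a = np.asarray(scores, dtype=float)
    w = np.exp(a - a.max())
    w /= w.sum()                  # softmax over the context positions
    return w @ E[context_ids]     # weighted average instead of plain mean
```

Under this scheme, function words can receive low weight when predicting global context and high weight when predicting local context, matching the intuition above.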
- An Empirical Analysis of Optimization for Max-Margin NLP Jonathan K. Kummerfeld, Taylor Berg-Kirkpatrick and Dan Klein
Despite the convexity of structured max-margin objectives (Taskar et al., 2004, Tsochantaridis et al., 2004), the many ways to optimize them are not equally effective in practice. We compare a range of stochastic optimization methods over a variety of structured NLP tasks (coreference, summarization, parsing, etc) and find several broad trends. First, margin methods do tend to outperform both likelihood and the perceptron. Second, for max-margin objectives, primal optimization methods are substantially faster and often more robust than dual methods. This advantage is most pronounced for tasks with dense or continuous-valued features. Overall, we argue for a particularly simple stochastic primal subgradient descent method that, despite being rarely mentioned in the literature, is surprisingly effective in relation to its alternatives.
- Towards Temporal Tagging for All Languages Jannik Strötgen and Michael Gertz
Temporal taggers are usually developed for a certain language. Besides English, only a few languages have been addressed, and only one tagger covers several languages. While this tool was manually extended to each language, there have been earlier approaches for automatic extension to a single target language. In this paper, we present an approach to extend a temporal tagger to all languages in the world. Our evaluation shows promising results, in particular considering that our approach neither requires language skills nor training data -- but results in a baseline tool for more than 200 languages.
- Reinforcing the Topic of Embeddings with Theta Pure Dependence for Text Classification Ning Xing, Yuexian Hou, Peng Zhang, Wenjie Li and Dawei Song
For sentiment classification, it is often recognized that embeddings based on the distributional hypothesis are weak at capturing sentiment contrast--contrasting words may have similar local contexts. Based on broader context, we propose to incorporate Theta Pure Dependence (TPD) into the Paragraph Vector method to reinforce topical and sentimental information. TPD has a theoretical guarantee that the word dependency is pure, i.e., the dependence pattern has an integral meaning whose underlying distribution cannot be conditionally factorized. Our method outperforms the state-of-the-art performance on text classification tasks.
- Image-Mediated Learning for Zero-Shot Cross-Lingual Document Retrieval Ruka Funaki and Hideki Nakayama
We propose an image-mediated learning approach for cross-lingual document retrieval where no or few parallel corpora are available. Using images in image-text documents of each language as the hub, we derive a common semantic subspace bridging two languages by means of generalized canonical correlation analysis. For evaluation purposes, we create and release a new document dataset consisting of three types of data (English text, Japanese text, and image). We have succeeded in substantially enhancing retrieval accuracy in zero-shot and few-shot scenarios where text-to-text examples are scarce.
- Morphological Analysis for Unsegmented Languages using Recurrent Neural Network Language Model Hajime Morita, Daisuke Kawahara and Sadao Kurohashi
We present a new morphological analysis model that considers semantic plausibility of word sequences by using a recurrent neural network language model (RNNLM). In unsegmented languages, since language models are learned from automatically segmented texts and inevitably contain errors, it is not apparent that conventional language models contribute to morphological analysis. To solve this problem, we do not use language models based on raw word sequences but use a semantically generalized language model, i.e., RNNLM, in morphological analysis. In our experiments on two Japanese corpora, our proposed model significantly outperformed baseline models. This result indicates the effectiveness of RNNLM in morphological analysis.
- Mr. Bennet, his coachman, and the Archbishop walk into a bar but only one of them gets recognized: On The Difficulty of Detecting Characters in Literary Texts Hardik Vala, David Jurgens, Andrew Piper and Derek Ruths
Characters are fundamental to literary analysis. Current approaches are heavily reliant on NER to identify characters, causing many to be overlooked. We propose a novel technique for character detection, achieving significant improvements over state of the art on multiple datasets.
- A Strong Lexical Matching Method for the Machine Comprehension Test Ellery Smith, Nicola Greco, Matko Bosnjak and Andreas Vlachos
Machine comprehension of text is the overarching goal of a great deal of research in natural language processing. The Machine Comprehension Test was recently proposed to assess methods on an open-domain, extensible and easy-to-evaluate task consisting of two datasets, MC160 and MC500. In this paper we develop a strong lexical matching method that takes into account the type of the question, as well as linguistic analysis such as coreference and hypernymy resolution. We show that the proposed method outperforms the baseline of Richardson et al. (2013), and is on par with the recently proposed discourse-based method of Narasimhan and Barzilay (2015), achieving the best reported results when combined with an off-the-shelf RTE system. Furthermore, we argue that MC500 is harder than MC160, due to the way question-answer pairs were created.
- An Improved Non-monotonic Transition System for Dependency Parsing Matthew Honnibal and Mark Johnson
Transition-based dependency parsers usually use transition systems that monotonically extend partial parse states until they identify a complete parse tree. Honnibal et al. (2013) showed that greedy parsing accuracy can be improved by adding additional non-monotonic transitions that permit the parser to "repair" earlier parsing mistakes by "over-writing" earlier parsing decisions. This increases the size of the set of complete parse trees that each partial parse state can derive, enabling such a parser to escape the "garden paths" that can trap monotonic greedy transition-based dependency parsers. We describe a new set of non-monotonic transitions that permits a partial parse state to derive a larger set of completed parse trees than previous work, which allows our parser to escape from a larger set of garden paths. A parser with our new non-monotonic transition system has 91.85% directed attachment accuracy, an improvement of 0.6% over a comparable parser using the standard monotonic arc-eager transitions.
- Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings Nanyun Peng and Mark Dredze
We consider the task of named entity recognition for Chinese social media. The long line of work in Chinese NER has focused on formal domains, and NER for social media has been largely restricted to English. We present a new corpus of Weibo messages annotated for both name and nominal mentions. Additionally, we evaluate three types of neural embedding for representing Chinese text. Finally, we propose a joint training objective for the embeddings that makes use of both (NER) labeled and unlabeled raw text. Our methods yield a 9% improvement over a state-of-the-art baseline.
- Motivating Personality-aware Machine Translation Shachar Mirkin, Scott Nowson, Caroline Brun and Julien Perez
Language use is known to be influenced by socio-demographic characteristics such as gender and personality. This understanding supports user modeling, enabling automatic classification of these traits. It has recently been shown that knowledge of these aspects of an author can improve performance in NLP tasks such as topic and sentiment modeling. When the user is multilingual or when few resources exist in a certain language, machine translation seems useful. Yet since MT systems, particularly statistical ones, are user-generic, a concern is that such traits may not manifest consistently in the target language. In this work we begin to explore whether translation preserves socio-demographic traits, motivating the need for personal and personality-aware machine translation models.
- Noise or additional information? Using crowdsource annotation item agreement for natural language tasks. Emily Jamison and Iryna Gurevych
In order to reduce noise in training data, most natural language crowdsourcing annotation tasks gather redundant labels and aggregate them into an integrated label, which is provided to the classifier. However, aggregation discards potentially useful information from linguistically ambiguous instances. For five natural language tasks, we pass item agreement on to the task classifier via soft labeling and low-agreement filtering of the training dataset. We find a statistically significant benefit from filtering low-agreement items out of the training data in four of our five tasks, and no systematic benefit from soft labeling.
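As a concrete illustration of the two strategies compared here (a minimal sketch, not the authors' code), item agreement can be passed to a classifier either as per-label instance weights (soft labeling) or as a filter that drops low-agreement items:

```python
def soft_labels(items):
    """items: list of (features, votes), where votes maps label -> count.
    Returns (features, label, weight) triples, each label weighted by the
    fraction of annotators who chose it (soft labeling)."""
    out = []
    for x, votes in items:
        total = sum(votes.values())
        out.extend((x, label, count / total) for label, count in votes.items())
    return out

def filter_low_agreement(items, threshold=0.8):
    """Keep only items whose majority label reaches the agreement threshold."""
    kept = []
    for x, votes in items:
        total = sum(votes.values())
        label, count = max(votes.items(), key=lambda kv: kv[1])
        if count / total >= threshold:
            kept.append((x, label))
    return kept
```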
- Does Symbol Grounding Improve Word Segmentation? Hirotaka Kameko, Shinsuke Mori and Yoshimasa Tsuruoka
We propose a novel framework for improving a word segmenter using information acquired from symbol grounding. We generate a term dictionary in three steps: generating a pseudo-stochastically segmented corpus, building a symbol grounding model to enumerate word candidates, and selecting them according to the symbol grounding scores. We applied our method to Japanese chess games and the commentaries on them. The experimental results show that our method successfully improved the word segmenter based on our framework. Our framework is general enough to be applied to other domains with a symbol grounding model.
- Inferring Binary Relation Schemas for Open Information Extraction Kangqi Luo, Xusheng Luo and Kenny Zhu
This paper presents a framework to model the semantic representation of binary relations in open information extraction systems. For each binary relation, we infer preferred types on the two arguments simultaneously and generate a ranked list of type pairs, called schemas, along with their scores. All inferred types are drawn from the Freebase type taxonomy and are human-readable. Our system collects 176,235 binary relations from ReVerb and is able to produce top-ranking relation schemas at 94.4% precision.
- Bayesian Optimization of Text Representations Dani Yogatama and Noah A. Smith
When applying machine learning to problems in NLP, there are many choices to make about how to represent input texts. They can have a big effect on performance, but they are often uninteresting to researchers or practitioners who simply need a module that performs well. We apply sequential model-based optimization over this space of choices and show that it makes standard linear models competitive with more sophisticated, expensive state-of-the-art methods based on latent variables or neural networks on various topic classification and sentiment analysis problems. Our approach is a first step towards black-box NLP systems that work with raw text and do not require manual tuning.
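A minimal sketch of the sequential model-based optimization loop over a finite set of encoded representation choices (my own illustration under stated assumptions: `evaluate` is a hypothetical function that trains and scores a linear model for one configuration; a Gaussian process is one common surrogate choice):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def smbo(evaluate, candidates, n_init=5, n_iter=20, seed=0):
    """candidates: (n, d) array of encoded configurations (e.g. n-gram
    order, frequency cutoff, weighting scheme); evaluate(config) returns
    validation accuracy. Returns the best configuration found."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), n_init, replace=False)
    X = [candidates[i] for i in idx]
    y = [evaluate(candidates[i]) for i in idx]
    gp = GaussianProcessRegressor(normalize_y=True)
    for _ in range(n_iter):
        gp.fit(np.array(X), np.array(y))
        mu, sd = gp.predict(candidates, return_std=True)
        # Expected improvement over the best configuration seen so far.
        z = (mu - max(y)) / np.maximum(sd, 1e-9)
        ei = (mu - max(y)) * norm.cdf(z) + sd * norm.pdf(z)
        i = int(np.argmax(ei))
        X.append(candidates[i])
        y.append(evaluate(candidates[i]))
    return X[int(np.argmax(y))], max(y)
```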
- On A Strictly Convex IBM Model 1 Andrei Simion, Michael Collins and Cliff Stein
IBM Model 1 is a classical alignment model. Of the first-generation word-based SMT models, it was the only one with a concave objective function. For concave optimization problems like IBM Model 1, we have guarantees on the convergence of optimization algorithms such as Expectation Maximization (EM). However, as was pointed out recently, the objective of IBM Model 1 is not strictly concave, and there is quite a bit of alignment quality variance within the optimal solution set. In this work we detail a strictly concave version of IBM Model 1 whose EM algorithm is a simple modification of the original EM algorithm of Model 1 and does not require the tuning of a learning rate or the insertion of an $l_{2}$ penalty. Moreover, by addressing Model 1's shortcomings, we achieve AER and F-measure improvements over the classical Model 1 of over 30%.
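For context, the classical Model 1 EM iteration is reproduced below (a minimal sketch of the standard algorithm only; the paper's strictly concave modification of the objective is not reproduced here):

```python
from collections import defaultdict

def ibm1_em(bitext, iters=10):
    """bitext: list of (f_tokens, e_tokens) sentence pairs.
    Returns translation probabilities t[(f, e)] ~ p(f | e)."""
    t = defaultdict(lambda: 1.0)  # effectively uniform after the first E-step
    for _ in range(iters):
        count = defaultdict(float)
        total = defaultdict(float)
        for f_sent, e_sent in bitext:
            e_sent = ["<NULL>"] + e_sent
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)  # normalizer over alignments of f
                for e in e_sent:
                    c = t[(f, e)] / z               # expected alignment count
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t
```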
- Factorization of Latent Variables in Distributional Semantic Models Magnus Sahlgren, David Ödling and Arvid Österlund
This paper discusses the use of factorization techniques in distributional semantic models. We focus on a method for redistributing the weight of latent variables, which has previously been shown to improve the performance of distributional semantic models. However, this result has not been replicated and remains poorly understood. We refine the method and provide additional theoretical justification, as well as empirical results that demonstrate the viability of the proposed approach.
- Improving fast_align by Reordering Chenchen Ding, Masao Utiyama and Eiichiro Sumita
fast_align is a simple, fast, and efficient approach for word alignment based on the IBM model 2. fast_align performs well for language pairs with relatively similar word orders; however, it does not perform well for language pairs with drastically different word orders. We propose a segmenting-reversing reordering process to solve this problem by alternately applying fast_align and reordering source sentences during training. Experimental results with Japanese-English translation demonstrate that the proposed approach improves the performance of fast_align significantly without the loss of efficiency. Experiments using other languages are also reported.
- Abstractive Multi-document Summarization with Semantic Information Extraction Wei Li and Hai Zhuge
This paper proposes a novel approach to generate abstractive summary for multiple documents by extracting semantic information from texts. The concept of Basic Semantic Unit (BSU) is defined to describe the semantics of an event or action. A semantic link network on BSUs is constructed to capture the semantic information of texts. Summary structure is planned with sentences generated based on the semantic link network. Experiments demonstrate that the approach is effective in generating informative, coherent and compact summary.
- Concept-based Summarization using Integer Linear Programming: From Concept Pruning to Multiple Optimal Solutions Florian Boudin, Hugo Mougard and Benoit Favre
In concept-based summarization, sentence selection is modelled as a budgeted maximum coverage problem. As this problem is NP-hard, pruning low-weight concepts is required for the solver to find optimal solutions efficiently. This work shows that reducing the number of concepts in the model leads to lower ROUGE scores, and more importantly to the presence of multiple optimal solutions. We address these issues by extending the model to provide a single optimal solution, and eliminate the need for concept pruning using an approximation algorithm that achieves comparable performance to exact inference.
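For reference, the standard concept-based ILP that this work extends can be written down compactly (a minimal sketch using the PuLP library under stated assumptions; the paper's extensions for guaranteeing a single optimal solution are not reproduced here):

```python
import pulp

def concept_ilp(concept_weights, sentence_concepts, sentence_lengths, budget):
    """Budgeted maximum coverage for concept-based summarization (sketch).

    concept_weights: {concept: weight}; sentence_concepts: {i: set of concepts};
    sentence_lengths: {i: length in words}; budget: maximum summary length.
    Sentence ids i are assumed to be integers."""
    prob = pulp.LpProblem("summary", pulp.LpMaximize)
    s = {i: pulp.LpVariable(f"s_{i}", cat="Binary") for i in sentence_concepts}
    c = {j: pulp.LpVariable(f"c_{k}", cat="Binary")
         for k, j in enumerate(concept_weights)}
    prob += pulp.lpSum(concept_weights[j] * c[j] for j in c)  # objective
    prob += pulp.lpSum(sentence_lengths[i] * s[i] for i in s) <= budget
    for j in c:
        # A concept counts as covered only if a selected sentence contains it.
        prob += c[j] <= pulp.lpSum(s[i] for i in s if j in sentence_concepts[i])
    prob.solve()
    return [i for i in s if s[i].value() == 1]
```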
- Recognizing Textual Entailment Using Probabilistic Inference Lei Sha, Sujian Li, Baobao Chang, Zhifang Sui and Tingsong Jiang
Recognizing Textual Entailment (RTE) plays an important role in NLP applications, including question answering and information retrieval. Recent work has explored "deep" representations of text such as discourse commitments or strict logic. However, these representations suffer either from the inconvenience of inference or from translation loss. To overcome these limitations, in this paper we propose to use predicate-argument structures to represent the discourse commitments extracted from text. At the same time, with the help of the YAGO knowledge base, we borrow the distant supervision technique to mine implicit facts from the text. We also construct a probabilistic network over all the facts and conduct inference to judge the confidence of each fact for RTE. The experimental results show that our proposed method achieves a competitive result compared to previous work.
- Improving Arabic Diacritization through Syntactic Analysis Anas Shahrour, Salam Khalifa and Nizar Habash
We present an approach to Arabic automatic diacritization that integrates syntactic analysis with morphological tagging through improving the prediction of case and state features. Our best system increases the accuracy of word diacritization by 2.5% absolute on all words, and 5.2% absolute on nominals over a state-of-the-art baseline. Similar increases are shown on the full morphological analysis choice.
- Unsupervised Negation Focus Identification with Word-Topic Graph Model Bowei Zou, Guodong Zhou and Qiaoming Zhu
Due to the commonality in natural language, negation focus plays a critical role in deep understanding of context. However, existing studies of negation focus identification center on supervised learning, which is time-consuming and expensive due to the manual preparation of annotated corpora. To address this problem, we propose an unsupervised word-topic graph model to represent and measure the focus candidates from both lexical and topic perspectives. Moreover, we propose a document-sensitive biased PageRank algorithm to optimize the ranking scores of focus candidates. Evaluation on the *SEM 2012 shared task corpus shows that our proposed method outperforms the state of the art on negation focus identification.
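A biased PageRank differs from the standard one only in its teleport distribution; a minimal sketch follows (my own, with the paper's document-sensitive weighting abstracted into the `bias` vector):

```python
import numpy as np

def biased_pagerank(A, bias, d=0.85, iters=100, tol=1e-8):
    """A: (n, n) nonnegative adjacency matrix of the word-topic graph.
    bias: (n,) teleport distribution encoding document relevance (sums to 1)."""
    A = A.astype(float)
    n = A.shape[0]
    col_sums = A.sum(axis=0)
    # Column-stochastic transition matrix; dangling columns get uniform mass.
    P = np.divide(A, col_sums, out=np.full_like(A, 1.0 / n), where=col_sums > 0)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r_next = d * (P @ r) + (1 - d) * bias
        converged = np.abs(r_next - r).sum() < tol
        r = r_next
        if converged:
            break
    return r
```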
- Shallow Convolutional Neural Network for Implicit Discourse Relation Recognition Biao Zhang, Jinsong Su, Deyi Xiong, Yaojie Lu, Hong Duan and Junfeng Yao
Implicit discourse relation recognition remains a serious challenge due to the absence of discourse connectives. In this paper, we propose a Shallow Convolutional Neural Network (SCNN) for implicit discourse relation recognition, which contains only one hidden layer but is effective in relation recognition. The shallow structure alleviates the overfitting problem, while the convolution and nonlinear operations help preserve the recognition and generalization ability of our model. Experiments on the benchmark data set show that our model achieves comparable and even better performance when compared against current state-of-the-art systems.
- Reverse-engineering Language: A Study on the Semantic Compositionality of German Compounds Corina Dima
In this paper we analyze the performance of different composition models on a large dataset of German compound nouns. Given a vector space model for the German language, we try to reconstruct the observed representation (the corpus-estimated vector) of a compound by composing the observed representations of its two immediate constituents. We explore the composition models proposed in the literature and also present a new, simple model that achieves the best performance on our dataset.
- Large-Scale Acquisition of Entailment Pattern Pairs by Exploiting Transitivity Julien Kloetzer, Kentaro Torisawa, Chikara Hashimoto and Jong-Hoon Oh
We propose a novel method for acquiring entailment pairs of binary patterns on a large scale. This method exploits the transitivity of entailment and a self-training scheme to improve the performance of an already strong supervised classifier for entailment and, unlike previous methods that exploit transitivity, it works on a large scale. Using it, we acquired, with 70% precision, 138.1 million pattern pairs with non-trivial lexical substitution, such as "use Y to distribute X" -> "X is available on Y", whose extraction is considered difficult. This represents 57.5% more pattern pairs than what our supervised baseline extracted at the same precision.
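The transitive expansion step itself is straightforward (a minimal sketch of candidate generation only; the paper's contribution lies in re-scoring these candidates with a self-trained classifier rather than trusting them blindly):

```python
def transitive_candidates(entailments):
    """entailments: set of (p, q) pairs, meaning pattern p entails pattern q,
    as output by a base classifier. Returns new candidates (p, r) implied by
    p -> q and q -> r, to be re-scored before acceptance."""
    by_lhs = {}
    for p, q in entailments:
        by_lhs.setdefault(p, set()).add(q)
    candidates = set()
    for p, q in entailments:
        for r in by_lhs.get(q, ()):
            if r != p and (p, r) not in entailments:
                candidates.add((p, r))
    return candidates
```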
- GhostWriter: Using an LSTM for Automatic Rap Lyric Generation Peter Potash, Alexey Romanov and Anna Rumshisky
This paper demonstrates the effectiveness of a Long Short-Term Memory language model in our initial efforts to generate unconstrained rap lyrics. The goal of this model is to generate lyrics that are similar in style to that of a given rapper, but not identical to existing lyrics. Unlike previous work, which defines explicit templates for lyric generation, our model defines its own rhyme scheme, line length, and verse length.
- Context-Dependent Knowledge Graph Embedding Yuanfei Luo, Quan Wang and Bin Wang
We consider the problem of embedding knowledge graphs (KGs) into continuous vector spaces. Existing methods can only deal with explicit relationships within each triple, i.e., local connectivity patterns, but cannot handle implicit relationships across different triples, i.e., contextual connectivity patterns. This paper proposes context-dependent KG embedding, a two-stage framework that takes into account both types of connectivity patterns and obtains more accurate and stable embeddings. We evaluate our approach in link prediction and triple classification, and achieve significant and consistent improvements over state-of-the-art methods.
- A Comparative Study on Regularization Strategies for Embedding-based Neural Networks Hao Peng, Lili Mou, Ge Li, Yunchuan Chen, Yangyang Lu and Zhi Jin
This paper aims to compare different regularization strategies to address a common phenomenon, severe overfitting, in embedding-based neural networks for NLP. We chose two widely studied neural models and tasks as our testbed. We tried several frequently applied or newly proposed regularization strategies, including penalizing weights (embeddings excluded), penalizing embeddings, re-embedding words, and dropout. We also emphasized incremental hyperparameter tuning and the combination of different regularizations. The results of this work provide a picture of hyperparameter tuning for neural NLP.
- On the Role of Discourse Markers for Discriminating Claims and Premises in Argumentative Discourse Judith Eckle-Kohler, Roland Kluge and Iryna Gurevych
This paper presents a study on the role of discourse markers in argumentative discourse. We annotated a German corpus with arguments according to the common claim-premise model of argumentation and performed various statistical analyses regarding the discriminative nature of discourse markers for claims and premises. Our experiments show that particular semantic groups of discourse markers are indicative of either claims or premises and constitute highly predictive features for discriminating between them.
- Fatal or not? Finding errors that lead to dialogue breakdowns in chat-oriented dialogue systems Ryuichiro Higashinaka, Masahiro Mizukami, Kotaro Funakoshi, Masahiro Araki, Hiroshi Tsukahara and Yuka Kobayashi
This paper aims to find errors that lead to dialogue breakdowns in chat-oriented dialogue systems. We collected chat dialogue data, annotated them with dialogue breakdown labels, and collected comments describing the error that led to the breakdown. By mining the comments, we first identified error types. Then, we calculated the correlation between an error type and the degree of dialogue breakdown it incurred, quantifying its impact on dialogue breakdown. This is the first study to quantitatively analyze error types and their effect in chat-oriented dialogue systems.
- Krimping texts for better summarization Marina Litvak, Mark Last and Natalia Vanetik
Automated text summarization is aimed at extracting essential information from original text and presenting it in a minimal, often predefined, number of words. In this paper, we introduce a new approach for unsupervised extractive summarization based on the Minimum Description Length (MDL) principle. The approach represents a text as a transactional dataset, with sentences as transactions, and then describes it by itemsets that stand for frequent sequences of words. The summary is then compiled from sentences that compress (and, as such, best describe) the document. The problem of summarization is thereby reduced to a maximal coverage problem, following the assumption that a summary that best describes the original text should cover most of the word sequences describing the document. We solve it with a greedy algorithm and present the evaluation results.
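The greedy step can be sketched as the usual budgeted max-coverage heuristic (my own minimal sketch; mining the frequent word sequences via MDL is assumed done, with the result passed in as weighted itemsets):

```python
def greedy_summary(sentences, itemsets, budget):
    """sentences: list of token lists; itemsets: {tuple_of_words: weight},
    the frequent word sequences that describe the document; budget: maximum
    summary length in words. Coverage is approximated by token inclusion."""
    covered, chosen, length = set(), [], 0

    def gain(sent):
        toks = set(sent)
        return sum(w for seq, w in itemsets.items()
                   if seq not in covered and set(seq) <= toks)

    while True:
        feasible = [(i, s) for i, s in enumerate(sentences)
                    if i not in chosen and length + len(s) <= budget]
        if not feasible:
            break
        # Pick the sentence with the best coverage gain per word added.
        i, sent = max(feasible, key=lambda p: gain(p[1]) / max(len(p[1]), 1))
        if gain(sent) == 0:
            break
        chosen.append(i)
        length += len(sent)
        covered.update(seq for seq in itemsets if set(seq) <= set(sent))
    return [sentences[i] for i in chosen]
```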
- Learning Word Meanings and Grammar for Describing Everyday Activities in Smart Environments Muhammad Attamimi, Yuji Ando, Tomoaki Nakamura, Takayuki Nagai, Daichi Mochihashi, Ichiro Kobayashi and Hideki Asoh
If intelligent systems are to interact with humans in a natural manner, the ability to describe daily life activities is important. To achieve this, sensing human activities by capturing multimodal information is necessary. In this study, we consider a smart environment for sensing activities with respect to realistic scenarios. We then propose a system that generates sentences from observed multimodal information in a bottom-up manner using multilayered multimodal latent Dirichlet allocation and Bayesian hidden Markov models. We evaluate the grammar learning and sentence generation as a complete process within a realistic setting. The experimental results reveal the effectiveness of the proposed method.
- A discriminative training procedure for Continuous Translation Models Quoc-Khanh DO, Alexandre Allauzen and François Yvon
Continuous-space language and translation models have recently emerged as extremely powerful ways to boost the performance of existing translation systems. A simple, yet effective way to integrate such models in inference is to use them in an n-best rescoring step. In this paper, we focus on this scenario and show that the performance gains in rescoring can be greatly increased when the neural network is trained jointly with all the other model parameters, using an appropriate objective function. Our approach is validated on two domains, where it outperforms strong baselines.
- From the Virtual to the Real World: Referring to Objects in Spatial Real-World Images Dimitra Gkatzia and Verena Rieser
Predicting the success of referring expressions (RE) is vital for real-world applications such as navigation systems. Traditionally, research has focused on studying Referring Expression Generation (REG) in virtual, controlled environments. In this paper, we describe a novel study of spatial references from real scenes rather than virtual ones. First, we investigate how humans describe objects in open, uncontrolled scenarios and compare our findings to those reported for virtual environments. We show that REs in real-world scenarios differ significantly from those in virtual worlds. Second, we propose a novel approach to quantifying image complexity when complete annotations are not present (e.g. due to poor object recognition capabilities), and third, we present a model for success prediction of REs for objects in real scenes. Finally, we discuss implications for Natural Language Generation (NLG) systems and future directions.
- System Combination for Machine Translation through Paraphrasing Wei-Yun Ma and Kathy McKeown
In this paper, we propose a paraphrasing model to address the task of system combination for machine translation. We dynamically learn hierarchical paraphrases from target hypotheses and form a synchronous context-free grammar to guide a series of transformations of target hypotheses into fused translations. The model is able to exploit phrasal and structural system-weighted consensus and also to utilize existing information about word ordering present in the target hypotheses. In addition, to consider a diverse set of plausible fused translations, we develop a hybrid combination architecture, where we paraphrase every target hypothesis using different fusing techniques to obtain fused translations for each target, and then make the final selection among all fused translations. Our experimental results show that our approach can achieve a significant improvement over combination baselines.
- An Unsupervised Bayesian Modelling Approach for Storyline Detection on News Articles Deyu Zhou, Haiyang Xu and Yulan He
Storyline detection aims at summarizing news from different epochs and revealing the evolution structure of events. It is a big challenge because it needs not only to detect stories at each epoch, but also to link these stories dynamically. Moreover, each storyline has a different hierarchical structure, and the hierarchical structures of a storyline at different epochs are interdependent. Existing approaches ignore this dependence of hierarchical structure in storyline generation. In this paper, we propose an unsupervised Bayesian model, called the dynamic storyline detection model, to extract the structured representation and evolution pattern of a storyline. The proposed model is evaluated on a large-scale news corpus. Experimental results show that our proposed model outperforms several baseline approaches.
- Hierarchical Incremental Adaptation for Statistical Machine Translation Joern Wuebker, Spence Green and John DeNero
We present an incremental adaptation approach for statistical machine translation that maintains a flexible hierarchical domain structure within a single consistent model. Both weights and rules are updated incrementally on a stream of post-edits. Our multi-level domain hierarchy allows the system to adapt simultaneously towards local context at different levels of granularity, including genres and individual documents. Our experiments show consistent improvements in translation quality from all components of our approach.
- Topical Coherence for Graph-based Extractive Summarization Daraksha Parveen and Michael Strube
We describe an approach for extractive single-document summarization based on a weighted graphical representation of documents obtained by topic modeling. We compare with state-of-the-art results on scientific articles from PLOS Medicine and on DUC 2002 data.
- Discourse Element Identification in Student Essays based on Global and Local Sentence Chains Wei Song
In this paper, we exploit global and local cohesion for identifying discourse elements in student essays. Specifically, we create global and local sentence chains based on lexical chains. We derive features based on sentence chains and enhance discourse element identification in a supervised framework. Experimental results show that the newly proposed features are discriminative and lead to significant improvement over the baselines for both prompt-directed and prompt-free essays.
- Summarizing Student Responses to Reflection Prompts Wencan Luo and Diane Litman
We propose to automatically summarize student responses to reflection prompts and introduce a novel summarization algorithm that differs from traditional methods in several ways. First, since the linguistic units of student inputs range from single words to multiple sentences, our summaries are created from extracted phrases rather than from sentences. Second, the phrase summarization algorithm ranks the phrases by the number of students who semantically mention a phrase in a summary. Experimental results show that the proposed phrase summarization approach achieves significantly better summarization performance on an engineering course corpus in terms of ROUGE scores when compared to other summarization methods, including MEAD, LexRank and MMR.
- Online Representation Learning in Recurrent Neural Language Models Marek Rei
We investigate an extension of continuous online learning in recurrent neural network language models. The model keeps a separate vector representation of the current unit of text being processed and adaptively adjusts it after each prediction. The initial experiments give promising results, indicating that the method is able to increase language modelling accuracy while also decreasing the number of parameters needed to store the model and the computation required at each step.
- Adapting Coreference Resolution for Narrative Processing Quynh Ngoc Thi Do, Steven Bethard and Marie-Francine Moens
Domain adaptation is a challenge for supervised NLP systems because manual annotation resources are expensive and time-consuming to produce. We present a method to adapt a supervised coreference resolution system trained on the newswire domain to short narrative stories without retraining the system. The idea is to perform inference via an Integer Linear Programming (ILP) formulation, with narrative-specific features adopted as soft constraints. When testing on the UMIREC and N2 corpora with the state-of-the-art Berkeley coreference resolution system trained on OntoNotes, our inference substantially outperforms the original inference on the CoNLL 2011 metric.
- ReVal: A Simple and Effective Machine Translation Evaluation Metric Based on Recurrent Neural Networks Rohit Gupta, Constantin Orasan and Josef van Genabith
Many state-of-the-art Machine Translation (MT) evaluation metrics are complex, involve extensive external resources (e.g. for paraphrasing) and require tuning to achieve best results. We present a simple alternative approach based on dense vector spaces and recurrent neural networks (RNNs). For WMT-14 our new metric scores best for two out of five language pairs, and overall best and second best on all language pairs, using Spearman and Pearson correlation, respectively. We also show how training data is computed automatically from WMT ranking data.
- LCSTS: A Large Scale Chinese Short Text Summarization Dataset Baotian Hu, Qingcai Chen and Fangze Zhu
Automatic text summarization is widely regarded as a highly difficult problem, partially because of the lack of large text summarization datasets. Due to the great challenge of constructing large-scale summaries for full texts, in this paper we introduce a large corpus of Chinese short text summarization data constructed from the Chinese microblogging website Sina Weibo, which will be released to the public soon. This corpus consists of over 2 million real Chinese short texts with short summaries given by the writer of each text. We also manually tagged the relevance of 10,666 short summaries with their corresponding short texts. Based on the corpus, we introduce a recurrent neural network for summary generation and achieve promising results, which not only shows the usefulness of the proposed corpus for short text summarization research, but also provides a baseline for further research on this topic.
- Higher-order logical inference with compositional semantics Koji Mineshima, Pascual Martínez-Gómez, Yusuke Miyao and Daisuke Bekki
We present a higher-order inference system based on a formal compositional semantics and the wide-coverage CCG parser. We develop an improved method to bridge between the parser and semantic composition. The system is evaluated on the FraCaS test suite. In contrast to the widely held view that higher-order logic is unsuitable for efficient logical inferences, the results show that a system based on a reasonably-sized semantic lexicon and a manageable number of non-first-order axioms enables efficient logical inferences, including those concerned with generalized quantifiers and intensional operators, and outperforms the state-of-the-art first-order inference system.
- Learning to identify the best contexts for knowledge-based WSD Evgenia Wasserman Pritsker, William Cohen and Einat Minkov
We outline a learning framework that aims at identifying useful contextual cues for knowledge-based word sense disambiguation (WSD). The usefulness of individual context words is evaluated based on diverse lexico-statistical and syntactic information, as well as simple word distance. Experiments using two different knowledge-based methods and benchmark datasets show significant improvements due to context modeling.
- Experiments with Generative Models for Dependency Tree Linearization Richard Futrell and Edward Gibson
We present experiments with generative models for linearization of unordered labeled syntactic dependency trees (Belz et al., 2011; Rajkumar and White, 2014). Our models are derived from generalizations of generative models for dependency structure (Eisner, 1996). We construct a series of models which capture successively more information about ordering constraints among sister dependents. We test our models on corpora of 5 languages using test-set likelihood, and we collect human ratings for generated forms in English. Our models benefit from representing local order constraints among sisters and from backing off to less sparse distributions.
- Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages Nut Limsopatham and Nigel Collier
Previous studies have shown that health reports in social media, such as DailyStrength and Twitter, have potential for monitoring health conditions (e.g. adverse drug reactions, infectious diseases) in particular communities. However, in order for a machine to understand and make inferences on these health conditions, the ability to recognise when laymen's terms refer to a particular medical concept (i.e. text normalisation) is required. To achieve this, we propose to adapt an existing phrase-based machine translation (MT) technique and a vector representation of words to map between a social media phrase and a medical concept. We evaluate our proposed approach using a collection of phrases from tweets related to adverse drug reactions. Our experimental results show that the combination of a phrase-based MT technique and the similarity between word vector representations outperforms the baselines that apply only either of them by up to 55%.
- Investigating Continuous Space Language Models for Machine Translation Quality Estimation Kashif Shah, Raymond W. M. Ng, Fethi Bougares and Lucia Specia
We present novel features trained with a deep neural network for Machine Translation (MT) Quality Estimation (QE). The features are learned with a Continuous Space Language Model (CSLM) by estimating the probabilities of the source and target segments. These new features, along with other available MT system-independent features, are investigated on a series of datasets with various quality labels for QE, including Post Editing Effort (PEE), Human Translation Edit Rate (HTER), Post Editing Time (PET) and METEOR score. The results show significant improvements in prediction over the baseline, as well as over systems trained on previously available feature sets, for all WMT QE tasks. More notably, the addition of the newly proposed features beats the previous best systems by a significant margin on the official WMT12 and WMT14 post-editing effort prediction tasks.
- Supervised Phrase Table Triangulation with Neural Word Embeddings for Low-Resource Languages Tomer Levinboim and David Chiang
In this paper we show that small amounts of source-target bilingual data (a dictionary or a parallel corpus) can be used to improve noisy triangulated phrase tables. In particular, we regard word translation probabilities extracted from bilingual data as gold labels to be used in a supervised learning setting. We demonstrate that this leads to translation quality improvement on two tasks: (1) on Malagasy-to-French translation via English, we use only 1k dictionary entries to gain +0.5 BLEU over triangulation; (2) on Spanish-to-French via English, we use only 4k sentence pairs to gain +0.7 BLEU over triangulation interpolated with a phrase table extracted from the same 4k sentence pairs.
- Semi-supervised Dependency Parsing using Bilexical Contextual Features from Auto-Parsed Data Eliyahu Kiperwasser and Yoav Goldberg
We present a semi-supervised approach to improve dependency parsing accuracy by using bilexical statistics derived from auto-parsed data. The method is based on estimating the attachment potential of head-modifier words, by taking into account not only the head and modifier words themselves, but also the words surrounding the head and the modifier. When integrating the learned statistics as features in a graph-based parsing model, we observe nice improvements in accuracy when parsing various English datasets.
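One simple instantiation of such bilexical statistics (a sketch of the general idea only; the paper's estimator also conditions on the words surrounding the head and the modifier) is a PMI-style association score over head-modifier pairs harvested from auto-parsed text:

```python
import math
from collections import Counter

def attachment_pmi(parsed_pairs):
    """parsed_pairs: iterable of (head_word, modifier_word) pairs taken from
    auto-parsed sentences. Returns PMI-style association scores that can be
    binned and used as features in a graph-based parser."""
    pair, head, mod, n = Counter(), Counter(), Counter(), 0
    for h, m in parsed_pairs:
        pair[(h, m)] += 1
        head[h] += 1
        mod[m] += 1
        n += 1
    return {(h, m): math.log(c * n / (head[h] * mod[m]))
            for (h, m), c in pair.items()}
```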
- Birds of a Feather Linked Together: A Discriminative Topic Model using Link-based Priors Weiwei Yang, Jordan Boyd-Graber and Philip Resnik
A wide range of applications, from social media to scientific literature analysis, involve graphs in which documents are connected by links. We introduce a new topic model for link prediction based on the intuition that linked documents will tend to have similar topic distributions, integrating a max-margin learning criterion and lexical term weights in the loss function. We validate our approach using predictive link rank with Sina Weibo users.
- On Available Corpora for Empirical Methods in Vision & Language Francis Ferraro, Nasrin Mostafazadeh, Ting-Hao Huang, Michel Galley, Lucy Vanderwende, Margaret Mitchell and Jacob Devlin
Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language, from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing vision & language datasets and classify the datasets accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts; however, each has different strengths and weaknesses.
- The Overall Markedness of Discourse Relations Lifeng Jin and Marie-Catherine de Marneffe
Discourse relations can be categorized as continuous or discontinuous in the hypothesis of continuity (Murray, 1997), with continuous relations expressing normal succession of events in discourse such as temporal, spatial or causal. Asr and Demberg (2013) propose a markedness measure to test the prediction that discontinuous relations may have more unambiguous connectives, but restrict the markedness calculation to relations with explicit connectives only. This paper extends their measure to explicit and implicit relations and shows that results from this extension better fit the continuity hypothesis predictions both for the English Penn Discourse (Prasad et al., 2008) and the Chinese Discourse (Zhou and Xue, 2015) Treebanks.
- A Binarized Neural Network Joint Model for Machine Translation Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig and Satoshi Nakamura
The neural network joint model (NNJM), which augments the neural network Language model (NNLM) with an m-word source context window, has achieved large gains in machine translation accuracy, but also has problems with high normalization cost when using large vocabularies. Training the NNJM with noise-contrastive estimation (NCE), instead of standard maximum likelihood estimation (MLE), can reduce computation cost. In this paper, we propose an alternative to NCE, the binarized NNJM (BNNJM), which learns a binary classifier that takes both the context and target words as input, and can be efficiently trained using MLE. We compare the BNNJM and NNJM trained by NCE on Chinese-to-English and Japanese-to-English translation tasks.
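The appeal of the binarized formulation is that scoring a candidate word needs only a sigmoid rather than a softmax over the whole vocabulary; a minimal scoring sketch follows (a toy linear classifier standing in for the paper's neural network; all names are illustrative):

```python
import numpy as np

def binary_score(embed, context_words, target_word, W, b):
    """p(correct | context, target) via a sigmoid, avoiding normalization
    over the full vocabulary. embed: {word: vector}; W, b: parameters of a
    toy linear classifier. Training would pair each true target (label 1)
    with sampled wrong targets (label 0) under ordinary MLE."""
    x = np.concatenate([embed[w] for w in context_words + [target_word]])
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))
```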
- Combining Geometric, Textual and Visual Features for Generating Prepositions in Image Descriptions Arnau Ramisa, Josiah Wang, Ying Lu, Emmanuel Dellandrea, Francesc Moreno-Noguer and Robert Gaizauskas
We investigate the role that geometric, textual and visual features play in the task of predicting a preposition that links two visual entities depicted in an image. The task is an important part of the subsequent process of generating image descriptions. We explore the prediction of prepositions for a pair of entities, both in the case when the labels of such entities are known and unknown. In all situations we found clear evidence that all three features contribute to the prediction task.
- Improved Transition-Based Parsing and Tagging with Neural Networks Chris Alberti, David Weiss, Greg Coppola and Slav Petrov
We extend and improve recent work on structured neural network transition-based parsing. We first introduce novel set-valued features for morphology and part-of-speech ambiguity and then investigate training different transition systems with neural network representations. Our multi-lingual evaluation demonstrates the robustness of the approach and the ease with which techniques developed for sparse linear learning approaches can be transferred to the dense non-linear setting. What is perhaps most exciting is that the gains from morphology and integrated tagging and parsing are even larger in the neural network setting.
- Experiments in Open Domain Deception Detection Veronica Perez-Rosas and Rada Mihalcea
The widespread use of deception in online sources has motivated the need for methods to automatically profile and identify deceivers. This work explores deception, gender and age detection in short texts using a machine learning approach. First, we collect a new open domain deception dataset also containing demographic data such as gender and age. Second, we extract feature sets including n-grams, shallow and deep syntactic features, semantic features, and syntactic complexity and readability metrics. Third, we build classifiers that aim to predict deception, gender, and age. Our findings show that while deception detection can be performed in short texts even in the absence of a pre-determined domain, gender and age prediction in deceptive texts is a challenging task. We further explore the linguistic differences in deceptive content that relate to deceivers' gender and age, and find evidence that both age and gender play an important role in people's word choices when fabricating lies.
- A model of rapid phonotactic generalization Tal Linzen and Timothy O'Donnell
The phonotactics of a language describes the ways in which the sounds of the language combine to form possible morphemes and words. Humans can learn phonotactic patterns at the level of abstract classes, generalizing across sounds (e.g., "words can end in a voiced stop"). Moreover, they rapidly acquire these generalizations, even before they acquire sound-specific patterns. We present a probabilistic model intended to capture this early-abstraction phenomenon. The model represents both abstract and concrete generalizations in its hypothesis space from the outset of learning. This, combined with a parsimony bias in favor of compact descriptions of the input data, leads the model to favor rapid abstraction in a way similar to human learners.
- Component-Enhanced Chinese Character Embeddings Yanran Li and Wenjie Li
Distributed word representations are very useful for capturing semantic information and have been successfully applied in many NLP tasks, especially for English. In this work, we develop two component-enhanced Chinese character embedding models and their bi-gram extensions. Distinguished from English word embeddings, our models explore the compositions of Chinese characters, which often serve as semantic indicators inherently. The evaluations on both word similarity and text classification demonstrate the effectiveness of our models.
- Hierarchical Phrase-based Stream Decoding Andrew Finch, Xiaolin Wang and Eiichiro Sumita
This paper proposes a method for hierarchical phrase-based stream decoding. A stream decoder is able to take a continuous stream of tokens as input, and segments this stream into word sequences that are translated and output as a stream of target word sequences. Phrase-based stream decoding techniques have been shown to be effective as a means of simultaneous interpretation. In this paper we transfer the essence of this idea into the framework of hierarchical machine translation. The hierarchical decoding framework organizes the decoding process into a chart; this structure is naturally suited to the process of stream decoding, leading to an efficient stream decoding algorithm that searches a restricted subspace containing only relevant hypotheses. Furthermore, the decoder allows more explicit access to the word re-ordering process that is of critical importance in decoding while interpreting. The decoder was evaluated on TED talk data for English-Spanish and English-Chinese. Our results show that, like the phrase-based stream decoder, the hierarchical decoder is capable of approaching the performance of the underlying hierarchical phrase-based machine translation decoder at useful levels of latency. In addition, the hierarchical approach appeared to be robust to the difficulties presented by the more challenging English-Chinese task.
- Learning Better Embeddings for Rare Words Using Distributional Representations Irina Sergienya and Hinrich Schütze
There are two main types of word representations: low-dimensional embeddings and high-dimensional distributional vectors, in which each dimension corresponds to a context word. In this paper, we initialize an embedding-learning model with distributional vectors. Evaluation on word similarity shows that this initialization significantly increases the quality of embeddings learned for rare words.
- Composing Relationships with Translations Alberto Duran, Antoine Bordes and Nicolas Usunier
Performing link prediction in Knowledge Bases (KBs) with embedding-based models, such as TransE (Bordes et al., 2013), which represents relationships as translations in the embedding space, has shown promising results in recent years. Most of these works focus on modeling single relationships and hence do not take full advantage of the graph structure of KBs. In this paper, we propose an extension of TransE that learns to explicitly model the composition of relationships via the addition of their corresponding translation vectors. We show empirically that this allows us to improve performance for predicting single relationships as well as compositions of pairs of them.
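In the translation view this composition is literally vector addition; a minimal scoring sketch (embeddings assumed already trained):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility of a triple (head, relation, tail); higher is better."""
    return -np.linalg.norm(h + r - t)

def composed_score(h, r1, r2, t):
    """Score a two-hop path h -r1-> x -r2-> t: under the translation view,
    the composed relation is represented by the sum r1 + r2."""
    return -np.linalg.norm(h + r1 + r2 - t)
```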
- What's in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation Arne Köhn
In the last two years, there has been a surge of word embedding algorithms and research on them. However, evaluation has mostly been carried out on a narrow set of tasks, mainly word similarity/relatedness and word relation similarity, and on a single language, namely English. We propose an approach to evaluate embeddings on a variety of languages that also yields insights into the structure of the embedding space by investigating how well word embeddings cluster along different syntactic features. We show that all embedding approaches behave similarly in this task, with dependency-based embeddings performing best. This effect is even more pronounced when generating low-dimensional embeddings.
- Rule Selection with Soft Syntactic Features for String-to-Tree Statistical Machine Translation Fabienne Braune, Nina Seemann and Alexander Fraser
In syntax-based machine translation, rule selection is the task of choosing the correct target side of a translation rule among rules with the same source side. We define a discriminative rule selection model for string-to-tree systems that have syntactic annotation on the target language side. This is a new and clean way to integrate soft source syntactic constraints into string-to-tree systems, as features of the rule selection model. We release our implementation as part of Moses.
- WTF! Fast Cross-lingual Word-embeddings Jocelyn Coulmance, Amine Benhalloum, Jean-Marc Marty and Guillaume Wenzek
We introduce WTF, "Word-embeddings Trans-gram Framework", a simple and computationally-efficient method to simultaneously learn and align word-embeddings for a variety of languages, using only monolingual data and a smaller set of sentence-aligned data. We use our new framework to compute aligned word-embeddings for twenty-one languages using English as a pivot language and show that some linguistic features are aligned across languages for which we do not have aligned data, even though those properties do not exist in the pivot language. We also achieve state of the art results on standard cross-lingual text classification and word translation tasks.
TACL papers to be presented at EMNLP
- It’s All Fun and Games until Someone Annotates: Video Games with a Purpose for Linguistic Annotation David Jurgens and Roberto Navigli
Annotated data is a prerequisite for many NLP applications. Acquiring large-scale annotated corpora is a major bottleneck, requiring significant time and resources. Recent work has proposed turning annotation into a game to increase its appeal and lower its cost; however, current games are largely text-based and closely resemble traditional annotation tasks. We propose a new linguistic annotation paradigm that produces annotations from playing graphical video games. The effectiveness of this design is demonstrated using two video games: one to create a mapping from WordNet senses to images, and a second game that performs Word Sense Disambiguation. Both games produce accurate results. The first game yields annotation quality equal to that of experts and a cost reduction of 73% over equivalent crowdsourcing; the second game provides a 16.3% improvement in accuracy over current state-of-the-art sense disambiguation games with WordNet.
- Which Step Do I Take First? Troubleshooting with Bayesian Models Annie Louis and Mirella Lapata
Online discussion forums and community question-answering websites provide one of the primary avenues for online users to share information. In this paper, we propose text mining techniques that help users navigate troubleshooting-oriented data such as questions asked on forums and their suggested solutions. We introduce Bayesian generative models of the troubleshooting data and apply them to two interrelated tasks: (a) predicting the complexity of the solutions (e.g., plugging a keyboard into the computer is easier compared to installing a special driver) and (b) presenting them in a ranked order from least to most complex. Experimental results show that our models are on par with human performance on these tasks, while outperforming baselines based on solution length or readability.
- Modelling and Optimizing on Syntactic N-Grams for Statistical Machine Translation Rico Sennrich
The role of language models in SMT is to promote fluent translation output, but traditional n-gram language models are unable to capture fluency phenomena between distant words, such as some morphological agreement phenomena, subcategorisation, and syntactic collocations with string-level gaps. Syntactic language models have the potential to fill this modelling gap. We propose a language model for dependency structures that is relational rather than configurational and thus particularly suited for languages with a (relatively) free word order. It is trainable with Neural Networks, and not only improves over standard n-gram language models, but also outperforms related syntactic language models. We empirically demonstrate its effectiveness in terms of perplexity and as a feature function in string-to-tree SMT from English to German and Russian. We also show that using a syntactic evaluation metric to tune the log-linear parameters of an SMT system further increases translation quality when coupled with a syntactic language model.
- Unsupervised Lexicon Discovery from Acoustic Input Chia-ying Lee, Timothy J. O’Donnell, and James Glass
We present a model of unsupervised phonological lexicon discovery—the problem of simultaneously learning phoneme-like and word-like units from acoustic input. Our model builds on earlier models of unsupervised phone-like unit discovery from acoustic data (Lee and Glass, 2012), and unsupervised symbolic lexicon discovery using the Adaptor Grammar framework (Johnson et al., 2006), integrating these earlier approaches using a probabilistic model of phonological variation. We show that the model is competitive with state-of-the-art spoken term discovery systems, and present analyses exploring the model’s behavior and the kinds of linguistic structures it learns.
- Cross-Document Co-Reference Resolution using Sample-Based Clustering with Knowledge Enrichment Sourav Dutta and Gerhard Weikum
Identifying and linking named entities across information sources is the basis of knowledge acquisition and at the heart of Web search, recommendations, and analytics. An important problem in this context is cross-document co-reference resolution (CCR): computing equivalence classes of textual mentions denoting the same entity, within and across documents. Prior methods employ ranking, clustering, or probabilistic graphical models using syntactic features and distant features from knowledge bases. However, these methods exhibit limitations regarding run-time and robustness. This paper presents the CROCS framework for unsupervised CCR, improving the state of the art in two ways. First, we extend the way knowledge bases are harnessed, by constructing a notion of semantic summaries for intra-document co-reference chains using co-occurring entity mentions belonging to different chains. Second, we reduce the computational cost by a new algorithm that embeds sample-based bisection, using spectral clustering or graph partitioning, in a hierarchical clustering process. This allows scaling up CCR to large corpora. Experiments with three datasets show significant gains in output quality, compared to the best prior methods, and the run-time efficiency of CROCS.
- One Vector is Not Enough: Entity-Augmented Distributed Semantics for Discourse Relations Yangfeng Ji, Jacob Eisenstein
Discourse relations bind smaller linguistic units into coherent texts. Automatically identifying discourse relations is difficult, because it requires understanding the semantics of the linked arguments. A more subtle challenge is that it is not enough to represent the meaning of each argument of a discourse relation, because the relation may depend on links between lower-level components, such as entity mentions. Our solution computes distributed meaning representations for each discourse argument by composition up the syntactic parse tree. We also perform a downward compositional pass to capture the meaning of coreferent entity mentions. Implicit discourse relations are then predicted from these two representations, obtaining substantial improvements on the Penn Discourse Treebank.
- Problems in Current Text Simplification Research: New Data Can Help Wei Xu, Chris Callison-Burch, Courtney Napoles
Simple Wikipedia has dominated simplification research in the past 5 years. In this opinion paper, we argue that focusing on Wikipedia limits simplification research. We back up our arguments with corpus analysis and by highlighting statements that other researchers have made in the simplification literature. We introduce a new simplification dataset that is a significant improvement over Simple Wikipedia, and present a novel quantitative-comparative approach to study the quality of simplification data resources.
- Combining Minimally-supervised Methods for Arabic Named Entity Recognition Maha Althobaiti, Udo Kruschwitz, and Massimo Poesio
Supervised methods can achieve high performance on NLP tasks, such as Named Entity Recognition (NER), but new annotations are required for every new domain and/or genre change. This has motivated research in minimally supervised methods such as semi-supervised learning and distant learning, but neither technique has yet achieved performance levels comparable to those of supervised methods. Semi-supervised methods tend to have very high precision but comparatively low recall, whereas distant learning tends to achieve higher recall but lower precision. This complementarity suggests that better results may be obtained by combining the two types of minimally supervised methods. In this paper we present a novel approach to Arabic NER using a combination of semi-supervised and distant learning techniques. We trained a semi-supervised NER classifier and another one using distant learning techniques, and then combined them using a variety of classifier combination schemes, including the Bayesian Classifier Combination (BCC) procedure recently proposed for sentiment analysis. According to our results, the BCC model leads to an increase in performance of 8 percentage points over the best base classifiers.
- From Paraphrase Database to Compositional Paraphrase Model and Back John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu, Dan Roth
The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates. However, it is still unclear how it can best be used, due to the heuristic nature of the confidences and its necessarily incomplete coverage. We propose models to leverage the phrase pairs from the PPDB to build parametric paraphrase models that score paraphrase pairs more accurately than the PPDB’s internal scores while simultaneously improving its coverage. They allow for learning phrase embeddings as well as improved word embeddings. Moreover, we introduce two new, manually annotated datasets to evaluate short-phrase paraphrasing models. Using our paraphrase model trained using PPDB, we achieve state-of-the-art results on standard word and bigram similarity tasks and beat strong baselines on our new short phrase paraphrase tasks.
- Improving Topic Models with Latent Feature Word Representations Dat Quoc Nguyen, Richard Billingsley, Lan Du, Mark Johnson
Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.
- Learning Structural Kernels for Natural Language Processing Daniel Beck, Trevor Cohn, Christian Hardmeier, and Lucia Specia
Structural kernels are a flexible learning paradigm that has been widely used in Natural Language Processing. However, the problem of model selection in kernel-based methods is usually overlooked. Previous approaches mostly rely on setting default values for kernel hyperparameters or using grid search, which is slow and coarse-grained. In contrast, Bayesian methods allow efficient model selection by maximizing the evidence on the training data through gradient-based methods. In this paper we show how to perform this in the context of structural kernels by using Gaussian Processes. Experimental results on tree kernels show that this procedure results in better prediction performance compared to hyperparameter optimization via grid search. The framework proposed in this paper can be adapted to other structures besides trees, e.g., strings and graphs, thereby extending the utility of kernel-based methods.
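A minimal sketch of evidence-based hyperparameter selection with scikit-learn's Gaussian Process implementation; an RBF kernel over dense feature vectors stands in for the paper's structural (tree) kernels, and the data are synthetic:

```python
# Sketch of evidence-based model selection with a Gaussian Process: fitting a
# GP maximizes the log marginal likelihood over kernel hyperparameters by
# gradient ascent, instead of grid search.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)
X = rng.randn(50, 4)                       # toy "sentence" feature vectors
y = np.sin(X[:, 0]) + 0.1 * rng.randn(50)  # toy regression targets

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=3)
gp.fit(X, y)                               # hyperparameters optimized by
                                           # maximizing the evidence
print(gp.kernel_)                          # learned hyperparameters
print(gp.log_marginal_likelihood_value_)
```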
- Latent Structures for Coreference Resolution Sebastian Martschat and Michael Strube
Machine learning approaches to coreference resolution vary greatly in the modeling of the problem: while early approaches operated on the mention pair level, current research focuses on ranking architectures and antecedent trees. We propose a unified representation of different approaches to coreference resolution in terms of the structure they operate on. We represent several coreference resolution approaches proposed in the literature in our framework and evaluate their performance. Finally, we conduct a systematic analysis of the output of these approaches, highlighting differences and similarities.
- Deriving Boolean Structures from Distributional Vectors Germán Kruszewski, Denis Paperno, and Marco Baroni
Corpus-based distributional semantic models capture degrees of semantic relatedness among the words of very large vocabularies, but have problems with logical phenomena such as entailment, which are instead elegantly handled by model-theoretic approaches, which, in turn, do not scale up. We combine the advantages of the two views by inducing a mapping from distributional vectors of words (or sentences) into a Boolean structure of the kind in which natural language terms are assumed to denote. We evaluate this Boolean Distributional Semantic Model (BDSM) on recognizing entailment between words and sentences. The method achieves results comparable to a state-of-the-art SVM, degrades more gracefully when less training data are available, and displays interesting qualitative properties.
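One way to picture the Boolean mapping, as a rough sketch rather than the paper's learned mapping: threshold each vector into a binary feature set and treat entailment as feature inclusion; the vectors and threshold are toy values:

```python
# Sketch of the Boolean mapping idea: threshold (a projection of) each word's
# distributional vector into a binary feature set, then treat entailment as
# feature inclusion -- "dog" entails "animal" if every feature active for
# "dog" is also active for "animal".
import numpy as np

def to_boolean(vec, threshold=0.5):
    """Map a dense vector to a Boolean feature vector by thresholding."""
    return vec > threshold

def entails(hypo_vec, hyper_vec, threshold=0.5):
    """Boolean inclusion: the hyponym's features are a subset of the
    hypernym's features."""
    h, g = to_boolean(hypo_vec, threshold), to_boolean(hyper_vec, threshold)
    return bool(np.all(~h | g))            # elementwise h => g

dog    = np.array([0.9, 0.8, 0.1])
animal = np.array([0.9, 0.9, 0.7])
print(entails(dog, animal))   # True: dog's active features within animal's
print(entails(animal, dog))   # False
```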
- Unsupervised Identification of Translationese Ella Rabinovich and Shuly Wintner
"Translated texts are distinctively different from original ones, to the extent that supervised text classification methods can distinguish between them with high accuracy. These differences were proven useful for statistical machine translation. However, it has been suggested that the accuracy of translation detection deteriorates when the classifier is evaluated outside the domain it was trained on. We show that this is indeed the case, in a variety of evaluation scenarios. We then show that unsupervised classification is highly accurate on this task. We suggest a method for determining the correct labels of the clustering outcomes, and then use the labels for voting, improving the accuracy even further. Moreover, we suggest a simple method for clustering in the challenging case of mixed-domain datasets, in spite of the dominance of domain-related features over translation-related ones. The result is an effective, fully-unsupervised method for distinguishing between original and translated texts that can be applied to new domains with reasonable accuracy."
- A Graph-based Lattice Dependency Parser for Joint Morphological Segmentation and Syntactic Analysis Wolfgang Seeker and Özlem Çetinoğlu
Space-delimited words in Turkish and Hebrew text can be further segmented into meaningful units, but syntactic and semantic context is necessary to predict segmentation. At the same time, predicting correct syntactic structures relies on correct segmentation. We present a graph-based lattice dependency parser that operates on morphological lattices to represent different segmentations and morphological analyses for a given input sentence. The lattice parser predicts a dependency tree over a path in the lattice and thus solves the joint task of segmentation, morphological analysis, and syntactic parsing. We conduct experiments on the Turkish and Hebrew treebanks and show that the joint model outperforms three state-of-the-art pipeline systems on both data sets. Our work corroborates findings from constituency lattice parsing for Hebrew and presents the first results for full lattice parsing on Turkish.
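To make the lattice representation concrete, the sketch below encodes a toy ambiguity of the Hebrew type the paper targets (a single token analyzable either as one noun or as preposition plus noun; the example segmentations are invented) and enumerates the candidate paths a joint parser would score:

```python
# Sketch of a morphological lattice: nodes are positions between characters,
# edges are candidate segments with morphological analyses. Enumerating paths
# through the lattice yields the alternative segmentations the joint parser
# chooses among when it predicts a dependency tree over one path.

def enumerate_paths(lattice, start, end, path=()):
    """Depth-first enumeration of all edge-label paths from start to end."""
    if start == end:
        yield path
        return
    for next_node, label in lattice.get(start, []):
        yield from enumerate_paths(lattice, next_node, end, path + (label,))

# adjacency list: node -> [(next_node, segment/analysis), ...]
lattice = {
    0: [(3, "bcl/NOUN"), (1, "b/PREP")],   # "onion" vs. "in" + "shadow"
    1: [(3, "cl/NOUN")],
}
for segmentation in enumerate_paths(lattice, 0, 3):
    print(segmentation)
# ('bcl/NOUN',)
# ('b/PREP', 'cl/NOUN')
```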
- Approximation-Aware Dependency Parsing by Belief Propagation Matthew R. Gormley, Mark Dredze, and Jason Eisner
We show how to train the fast dependency parser of Smith and Eisner (2008) for improved accuracy. This parser can consider higher-order interactions among edges while retaining O(n^3) runtime. It outputs the parse with maximum expected recall—but for speed, this expectation is taken under a posterior distribution that is constructed only approximately, using loopy belief propagation through structured factors. We show how to adjust the model parameters to compensate for the errors introduced by this approximation, by following the gradient of the actual loss on training data. We find the gradient by back-propagation, treating the entire parser (approximations and all) as a differentiable circuit, as Stoyanov et al. (2011) and Domke (2010) did for loopy CRFs. The resulting trained parser obtains higher accuracy with fewer iterations of belief propagation than one trained by conditional log-likelihood.
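The core trick, differentiating through unrolled approximate inference, can be sketched in a few lines of PyTorch; the sigmoid fixed-point update below merely stands in for loopy BP message passing and is not the parser of Smith and Eisner (2008):

```python
# Sketch of approximation-aware training: treat a few unrolled iterations of
# an approximate inference update as a differentiable circuit and
# back-propagate the task loss through it. Update rule, dimensions, and
# targets are toy assumptions, not the paper's dependency parser.
import torch

torch.manual_seed(0)
W = torch.randn(5, 5, requires_grad=True)     # model parameters (toy factors)
x = torch.randn(5)                            # toy input scores
target = torch.tensor([0.0, 1.0, 0.0, 1.0, 0.0])

beliefs = torch.zeros(5)
for _ in range(3):                            # unrolled approximate inference
    beliefs = torch.sigmoid(W @ beliefs + x)  # message-passing-like update

loss = torch.sum((beliefs - target) ** 2)     # loss on the APPROXIMATE output
loss.backward()                               # gradient flows through all
print(W.grad.shape)                           # three inference iterations
```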
- Context-aware Frame-Semantic Role Labeling Michael Roth and Mirella Lapata
"Frame semantic representations have been useful in several applications ranging from text-to-scene generation, to question answering and social network analysis. Predicting such representations from raw text is, however, a challenging task and corresponding models are typically only trained on a small set of sentence-level annotations. In this paper, we present a semantic role labeling system that takes into account sentence and discourse context. We introduce several new features which we motivate based on linguistic insights and experimentally demonstrate that they lead to significant improvements over the current state-of-the-art in FrameNet-based semantic role labeling."
- Semantic Proto-Roles Drew Reisinger, Rachel Rudinger, Frank Ferraro, Craig Harman, Kyle Rawlins, and Benjamin Van Durme
We present the first large-scale, corpus-based verification of the seminal work of Dowty on the notion of thematic proto-roles. Our results demonstrate both the need for and the feasibility of a property-based annotation scheme for semantic relationships, as opposed to the currently dominant notion of categorical roles.