Wang Ling
PhD Student
lingwang@cs.cmu.edu
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Spoken Language Systems Lab, INESC-ID
Instituto Superior Tecnico
I am a PhD student in the Dual Degree Carnegie Mellon Portugal PhD Program, between Carnegie Mellon University and Instituto Superior Tecnico. Currently, I am working at the Language Technologies Institute at Carnegie Mellon University. It is my privilege to be working with my advisors Alan Black (LTI), Chris Dyer (LTI) and Isabel Trancoso (INESC-ID), for whom I hold the deepest respect.
I am interested in applying statistics and machine learning methods to Natural Language Processing tasks. Currently, my work addresses the following problems: (1) Machine Translation in Microblogs, such as Twitter and Facebook. (2) Character-based Neural Networks for Natural Language Processing and Machine Translation.
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation - Wang Ling, Tiago Luis, Luis Marujo, Ramon Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W Black, Isabel Trancoso, In Proceedings of the 2015 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Lisbon, Portugal, September 2015 [pdf] [bib]
Not All Contexts Are Created Equal: Better Word Representations with Variable Attention - Wang Ling, Lin Chu-Cheng, Yulia Tsvetkov, Silvio Amir, Ramon Fernandez Astudillo, Chris Dyer, Alan W Black, Isabel Trancoso, In Proceedings of the 2015 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Lisbon, Portugal, September 2015 [pdf] [bib]
Evaluation of Word Vector Representations by Subspace Alignment - Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Guillaume Lample, Chris Dyer, In Proceedings of the 2015 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Lisbon, Portugal, September 2015 [pdf] [bib]
Transition-Based Dependency Parsing with Stack Long Short-Term Memory - Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith, In The 53rd Annual Meeting of the Association for Computational Linguistics (ACL) 2015, Beijing, China, July 2015 [pdf] [bib]
Learning Word Representations from Scarce and Noisy Data with Embedding Sub-spaces - Ramon F. Astudillo, Silvio Amir, Wang Ling, Mario Silva, Isabel Trancoso, In The 53rd Annual Meeting of the Association for Computational Linguistics (ACL) 2015, Beijing, China, July 2015 [pdf] [bib]
Automatic Keyword Extraction on Twitter - Luis Marujo, Wang Ling, Isabel Trancoso, Chris Dyer, Alan W Black, Anatole Gershman, David Martins De Matos, Joao Paulo Neto, Jaime Carbonell, In The 53rd Annual Meeting of the Association for Computational Linguistics (ACL) 2015, Beijing, China, July 2015 [pdf] [bib]
Privacy-Preserving Multi-document Summarization - Luis Marujo, Jose Portelo, Wang Ling, David Martins De Matos, Anatole Gershman, Jaime Carbonell, Isabel Trancoso, Bhiksha Raj, In PIR15: Privacy-Preserving IR (SIGIR 2015 Workshop), Santiago, Chile, August 2015 [pdf] [bib]
INESC-ID: Sentiment Analysis without hand-coded Features or Linguistic Resources using Embedding Subspaces - Ramon Astudillo, Silvio Amir, Wang Ling, Bruno Martins, Mario Silva, Isabel Trancoso, In 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, USA, June 2015 [pdf] [bib]
INESC-ID: A Regression Model for Large Scale Twitter Sentiment Lexicon Induction - Ramon Astudillo, Silvio Amir, Wang Ling, Bruno Martins, Mario Silva, Isabel Trancoso, In 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, USA, June 2015 [pdf] [bib]
Two/Too Simple Adaptations of Word2Vec for Syntax Problems - Wang Ling, Chris Dyer, Alan W Black, Isabel Trancoso, In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, USA, June 2015 [pdf] [bib] [poster]
A linguistically motivated taxonomy for Machine Translation error analysis - Ângela Costa, Wang Ling, Tiago Luís, Rui Correia, Luísa Coheur, In Machine Translation [bib]
Crowdsourcing High-Quality Parallel Data Extraction from Twitter - Wang Ling, Luis Marujo, Chris Dyer, Alan W Black, Isabel Trancoso, In Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, USA, July 2014 [pdf] [bib] [dataset]
Dual Subtitles as Parallel Corpora - Shikun Zhang, Wang Ling, Chris Dyer, In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014 [pdf] [bib] [dataset] [poster]
Linguistic Evaluation Of Support Verb Constructions By OpenLogos And Google Translate - Anabela Barreiro, Johanna Monti, Brigitte Orliac, Susanne Preuß, Kutz Arrieta, Wang Ling, Fernando Batista, Isabel Trancoso, In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014 [pdf] [bib]
Paraphrasing 4 Microblog Normalization - Wang Ling, Chris Dyer, Alan W Black, Isabel Trancoso, In Proceedings of the 2013 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Seattle, USA, October 2013 [pdf] [bib] [slides]
The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References - Waleed Ammar, Victor Chahuneau, Michael Denkowski, Greg Hanneman, Wang Ling, Austin Matthews, Kenton Murray, Nicola Segall, Yulia Tsvetkov, Alon Lavie, Chris Dyer, In Proceedings of the Eighth Workshop on Machine Translation, Sofia, Bulgaria, August 2013 [pdf] [bib]
Microblogs as Parallel Corpora - Wang Ling, Guang Xiang, Chris Dyer, Alan Black, Isabel Trancoso, In The 51st Annual Meeting of the Association for Computational Linguistics (ACL) 2013, Sofia, Bulgaria, August 2013 [pdf] [bib] [slides] [dataset]
Improving Relative-Entropy Pruning using Statistical Significance - Wang Ling, Nadi Tomeh, Guang Xiang, Alan Black, Isabel Trancoso, In Proceedings of the 25th International Conference on Computational Linguistics (Coling 2012), Mumbai, India, December 2012 [pdf] [bib]
Recognition of Named-Event Passages in News Articles - Luis Marujo, Wang Ling, Anatole Gershman, Jaime Carbonell, João P. Neto, David Matos, In Proceedings of the 25th International Conference on Computational Linguistics (Coling 2012), Mumbai, India, December 2012 [pdf] [bib]
Detecting Offensive Tweets via Topical Feature Discovery over a Large Scale Twitter Corpus - Guang Xiang, Bin Fan, Wang Ling, Jason I. Hong, Carolyn P. Rose, In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM'12), Hawaii, USA, October 2012 [pdf] [bib]
Entropy-based Pruning for Phrase-based Machine Translation - Wang Ling, João Graça, Isabel Trancoso, Alan Black, In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea, July 2012 [pdf] [bib]
Overview of Computer-assisted Language Learning for European Portuguese at L2f - Thomas Pellegrini, Wang Ling, André Silva, Rui Correia, Isabel Trancoso, Jorge Baptista, Nuno J. Mamede, In Proceedings of the International Conference on Computer Supported Education, Porto, Portugal, April 2012 [pdf] [bib]
Named Entity Translation using Anchor Texts - Wang Ling, Pável Calado, Bruno Martins, Isabel Trancoso, Alan Black, Luísa Coheur, In International Workshop on Spoken Language Translation (IWSLT), San Francisco, USA, December 2011 [pdf] [bib]
Discriminative Phrase-based Lexicalized Reordering Models using Weighted Reordering Graphs - Wang Ling, João Graça, David Martins de Matos, Isabel Trancoso, Alan Black, In The 5th International Joint Conference on Natural Language Processing (IJCNLP2011), Chiang Mai, Thailand, November 2011 [pdf] [bib]
An Agent Based Competitive Translation Game for Second Language Learning - Wang Ling, Rui Prada, Isabel Trancoso, In The ISCA on Speech and Language Technology in Education (SLaTE) 2011, Venice, Italy, August 2011 [pdf] [bib]
Reordering Modeling using Weighted Alignment Matrices - Wang Ling, Tiago Luís, João Graça, Isabel Trancoso, Luísa Coheur, In The 49th Annual Meeting of the Association for Computational Linguistics (ACL) 2011, Portland, Oregon, USA, June 2011 [pdf] [bib]
BP2EP - Adaptation of Brazilian Portuguese texts to European Portuguese - Luis Marujo, Nuno Grazina, Wang Ling, Luísa Coheur, Isabel Trancoso, In The 15th Conference of the European Association for Machine Translation, European Association for Machine Translation, Leuven, Belgium, May 2011 [pdf] [bib]
The INESC-ID Machine Translation System for the IWSLT 2010 - Wang Ling, Tiago Luís, João Graça, Luísa Coheur, Isabel Trancoso, In International Workshop on Spoken Language Translation, Paris, France, December 2010 [pdf] [bib]
Towards a General and Extensible Phrase-Extraction Algorithm - Wang Ling, Tiago Luís, João Graça, Luísa Coheur, Isabel Trancoso, In International Workshop on Spoken Language Translation, Paris, France, December 2010 [pdf] [bib]
JNN - Java Neural Network Toolkit
This is a Java toolkit for building neural networks. You can build any architecture with it from scratch or use one of the prebuilt architectures (MLP, RNN, LSTM, BLSTM, Conv->Maxpool).
Wang2Vec - Word2Vec architecture adaptations
An extension of the original word2vec (https://code.google.com/p/word2vec/) that uses different architectures.
This resource was created or used in: 1-Two/Too Simple Adaptations of Word2Vec for Syntax Problems 2-INESC-ID: A Regression Model for Large Scale Twitter Sentiment Lexicon Induction 3-INESC-ID: Sentiment Analysis without hand-coded Features or Linguistic Resources using Embedding Subspaces
RelEnt Pruner - Relative Entropy-based Phrase Table Pruner
Phrase table pruning toolkit that makes phrase tables more compact by discarding entries. In entropy-based pruning, we focus on removing phrase pairs that are redundant, that is, phrase pairs that can be produced using smaller phrase pairs in the model. This resource also implements significance pruning (Johnson et al. 2007) (credit to Chris Dyer for the code). The code can be found in moses/contrib/relent-filter.
This resource was created or used in: 1-Entropy-based Pruning for Phrase-based Machine Translation 2-Improving Relative-Entropy Pruning using Statistical Significance
Geppetto - General Extensible Phrase Extraction Toolkit
Phrase extraction toolkit where I implemented several of my papers, and some other interesting work. It produces a translation model from bilingual data, in the format used in Moses. In its most basic form, it produces the same results as the extraction step in the Moses pipeline, but the user is able to obtain better results if all the features are used. The documentation might be a little old, so please feel free to email me if you run into any problems.
This resource was created or used in: 1-Towards a General and Extensible Phrase-Extraction Algorithm 2-The INESC-ID Machine Translation System for the IWSLT 2010 3-Reordering Modeling using Weighted Alignment Matrices 4-Discriminative Phrase-based Lexicalized Reordering Models using Weighted Reordering Graphs
Dual Subtitles - Mandarin-English Subtitles Parallel Corpus
Parallel data extracted from dual subtitles (subtitles in two languages).
This resource was created or used in: 1-Dual Subtitles as Parallel Corpora
μtopia - Microblog Translated Post Parallel Corpora
Parallel corpora extracted from microblogs (Twitter and Sina Weibo).
This resource was created or used in: 1-Microblogs as Parallel Corpora
BP2EP - Brazilian Portuguese to European Portuguese Resource
Brazilian Portuguese text converted to European Portuguese, together with other resources, such as automatically built lexicons and phrase pairs used to automatically convert from Brazilian Portuguese to European Portuguese.
This resource was created or used in: 1-BP2EP - Adaptation of Brazilian Portuguese texts to European Portuguese
Normalization Interface - Normalization Interface for Annotators
Interface to normalize sentences, which records the alignments between the source words or phrases and their respective normalizations. Annotators can also define what type of normalizations they wish to perform (orthographic error correction, segmentation, etc.). By the way, we are *not* collecting the data as you use the interface. But if you can and feel generous, I would appreciate it if you sent it to me, so that I can improve my normalizer :p
Language ID 2.0 - Fine Grained Language Identification Annotator
Just a small interface I put together for the CMU NLP group mini-blitz for fine-grained language ID. The goal was to allow segment-level annotations rather than traditional document-level language identification. Feel free to use it.
A brief note about my talks: I will happily share my slides for educational purposes, so feel free to use them. However, I tend to put very little textual information in my slides, since I personally find it more entertaining and understandable to look at illustrations while listening to the speaker. So please keep in mind that the slides themselves might not contain the complete representation of the work they describe. Also, I would be happy if an acknowledgment is given when using any of my material. Thanks!