DI XU
5000 Forbes Ave. GHC 5404, Pittsburgh, PA 15213
(765) 409-9707
dix@cs.cmu.edu
OBJECTIVE
To advance in the fields of Information Retrieval and Text Mining
To acquire Machine Learning skills and insights
To gain practical experience in Software Engineering
Information Retrieval, Question Answering, Text Mining, Spoken Term Detection and Machine Learning
EDUCATION
Carnegie Mellon University, Pittsburgh, PA
August 2013 – August 2015 (Expected)
Master of Language Technologies, School of Computer Science
Graduate Research Fellowship
GPA: 3.89/4.0
Advisor: Prof. Florian Metze
Purdue University, West Lafayette, IN
August 2009 – May 2013
1st Bachelor of Science, Computer Science; 2nd Bachelor of Science, Statistics
GPA: 3.93/4.0 (with distinction)
Major Concentration: Machine Intelligence and Software Engineering
Dean's List since Admission
RESEARCH EXPERIENCE
Graduate Research Assistant, August 2014 – Present
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Project: The IARPA Aladdin Video Program
Supervisor: Dr. Alex Hauptmann
- Experimented with multiple learning-to-rank algorithms for MED late fusion; worked on Multimedia Event Recounting and Summarization
Independent Research, Summer 2014
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Project: TREC 2014 – Web Search (Ad hoc)
Supervisor: Prof. Jamie Callan
- Performed extensive studies on well-known retrieval models; explored topic-modeling based pseudo-relevance feedback and query expansion approaches; explored multiple Learning to Rank and data fusion techniques
Independent Research, Summer 2014
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Project: TREC 2014 – Contextual Suggestion Track
Supervisor: Prof. Jamie Callan
- Performed large-scale web crawling and mining using the Google, Yelp, and Wikipedia APIs; implemented intelligent user preference models with various text mining methods; developed a large-scale intelligent information system
Graduate Research Assistant, August 2013 – August 2014
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Project: The IARPA Babel Program
Supervisor: Prof. Florian Metze
- Developed and published the word-based Probabilistic Phonetic Retrieval model for spoken term detection in low-resource languages; implemented tools for significance testing; performed system fusion; coordinated and prepared final submissions for the IARPA Babel OP1 evaluation
Volunteer Research Assistant, August 2012 – Spring 2014
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL
Project: Consistent Language Model for Keyword Search over Unstructured Documents
Supervisors: Prof. Marianne Winslett, Dr. Arash Termehchy
- Implemented state-of-the-art smoothing methods; performed extensive studies to evaluate the effectiveness of multiple language modeling methods; designed novel modeling methods in vector space
Undergraduate Research Assistant, May 2012 – May 2013
Department of Statistics, Purdue University, West Lafayette, IN
Project: Pattern Mining over Time Series and Drought Detection
Supervisor: Prof. Sergey Kirshner
- Implemented Hidden Markov Models and the Viterbi algorithm; developed a geographical and meteorological plotter for a specified area; conducted extensive training and testing on different models
Undergraduate Research Intern, May 2012 – August 2012
Information Trust Institute, University of Illinois at Urbana-Champaign, Urbana, IL
Project: Principled and Optimal Language Model for Keyword Search over Structured Documents
Supervisor: Prof. Marianne Winslett
- Developed novel and effective search algorithms for keyword queries on semi-structured data; implemented multiple retrieval models based on statistical language modeling; developed novel smoothing techniques; performed extensive studies to evaluate the effectiveness of various language modeling approaches
Undergraduate Research Assistant, May 2010 – December 2011
Department of Computer Science, Purdue University, West Lafayette, IN
Project: Loop-level Data Dependence Profiling and Multicore Processing
Supervisor: Prof. Zhiyuan Li
- Developed a framework for testing thread-based parallel C programs; analyzed loop-level data dependencies of the SPEC CPU2000 benchmark 197.parser and developed its thread-based parallel version using OpenMP
TEACHING EXPERIENCE
Purdue University, West Lafayette, IN
- C Programming Applications for Engineers (Spring 2011)
- Programming with Multimedia Objects (Summer 2011, Fall 2012)
- Introduction to Computers (Fall 2011)
PUBLICATIONS
[1] A. Ge, D. Xu, and L. Yang. A Novel Maximum Power Point Tracking Method under Non-uniform Insolation Conditions. Infrared and Laser Engineering, Vol. 42, No. 6, 2013. ISSN: 1007-2276, CN 12-1261/TN.
[2] D. Xu and F. Metze. Word-based probabilistic phonetic retrieval for low-resource spoken term detection. In 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014). ISCA, 2014.
[3] D. Xu, Y. Wang, and F. Metze. EM-based phoneme confusion matrix generation for low-resource spoken term detection. In Spoken Language Technology (SLT). IEEE, 2014.
[4] D. Xu and J. Callan. Modelling Psychological Needs for User-dependent Contextual Suggestion. In Proceedings of the Twenty-Third Text REtrieval Conference (TREC 2014). NIST, to appear.
[5] D. Xu and J. Callan. Towards a simple and efficient web search framework. In Proceedings of the Twenty-Third Text REtrieval Conference (TREC 2014). NIST, to appear.
PROFESSIONAL SKILLS
Java, C, C++, Python, Bash, LaTeX
Lucene, Indri, UIMA, Maven, Git, OpenFST, OpenMP/MPI, SQL, Google Web App, Android
PROFESSIONAL ACTIVITIES
IEEEXtreme Programming Competition
Purdue University, Fall 2010
ACM-ICPC Regional Contest
University of Cincinnati, Fall 2011