Xuerui Wang's Home Page
Resume

Dr. XUERUI WANG

Yahoo! Labs, 701 First Avenue, Sunnyvale, CA 94089
WWW: http://www.cs.umass.edu/~xuerui
email: xuerui_wang@yahoo.com

RESEARCH INTEREST

Statistical and computational machine learning (ML), data mining (DM) for large data sets, online advertising, information retrieval (IR), topic models of text, and social network analysis (SNA).

EDUCATION

University of Massachusetts, Amherst, Massachusetts, Apr. 2009
Doctor of Philosophy in Computer Science
Advisor: Andrew McCallum

Carnegie Mellon University, Pittsburgh, Pennsylvania, May 2003
Master of Science in Knowledge Discovery and Data Mining
Advisor: Tom Mitchell

Tsinghua University, Beijing, P. R. China, Jul. 2001
Master of Engineering in Control Theory and Its Applications
Advisor: Wenhuang Liu

Tsinghua University, Beijing, P. R. China, Jul. 1999
Bachelor of Engineering in Automation

EXPERIENCE

Yahoo! Labs, Santa Clara, California, Feb. 2009 – Present
Scientist, Contextual and Display Advertising
Conducting research on non-guaranteed delivery (NGD) traffic forecasting, evaluating model performance and developing system APIs.
Studied performance-based user click feedback methods to improve ad placement in Yahoo!'s Keystone contextual advertising system; developed evaluation metrics and adopted different exploration and exploitation strategies.
Designed an empirical Bayes framework to smooth click-through rate (CTR) estimates by leveraging the data hierarchy and the temporal continuity of the data.
Conducting research on forecasting ad performance in the Keystone contextual advertising system by searching for matching pages in historical data.

University of Massachusetts, Amherst, Massachusetts, Jun. 2004 – Jan. 2009
Research Assistant, Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities
Invented novel undirected topic models with both fast inference and clear interpretability. Incorporating information from multiple, heterogeneous modalities is much more convenient in these models than in their directed counterparts.
Designed new probabilistic, generative models to improve role discovery and group discovery in social networks by taking (textual) attributes of interactions into account. Applications include email messages, political voting records and academic literature.
Studied dynamic topic trends in large text collections in a probabilistic, generative manner with timestamps as observed random variables. This approach differs in interesting ways from traditional dynamic models based on Markov transitions.
Designed new topical n-gram models that discover topical phrases in context, significantly increasing interpretability compared to bag-of-words models, and achieved better performance in IR experiments on large TREC collections.
Invented efficient multi-conditional learning methods combining generative and discriminative models, and applied them to various classification, clustering, and information retrieval tasks.

Synthesis Project: Inferring Gene Annotations in Gene Ontology
Developed new generative models to predict Gene Ontology annotations from massive gene expression data.

Yahoo! Research, Santa Clara, California, May 2008 – Aug. 2008
Research Intern, Robust Cross-Language Query Classification with External Web Evidence
Developed new methodologies to classify non-English queries by first collecting Web evidence in the native language of the original queries, machine translating the evidence into English, and inferring the queries' class labels from the translated evidence.
Experimented with a new framework in online advertising to populate products of interest via query rewriting techniques.

Yahoo! Research, Santa Clara, California, Jun. 2007 – Sep. 2007
Research Intern, Search-Based Forecasting of Ad Volume in Online Advertising
Invented a two-level, search-based method for real-time forecasting of the future performance of internet ads by replaying billions of historical data records.

UtopiaCompression Corporation, Los Angeles, California, Jun. 2003 – Jun. 2004
Research Scientist, Pattern-Driven Image/Video/Text Compression/Mining
Designed and developed an intelligence-based, pattern-driven image compressor that comprehends an image as a unified and interrelated entity, instead of unrelated blocks of data.
Developed efficient statistical machine learning algorithms to extract features from images and to deal with missing values in image data.
Conducted research on XML compression using machine learning methods, and co-wrote accepted proposals for research funding from governmental organizations such as DoD, NASA and NSF.

Carnegie Mellon University, Pittsburgh, Pennsylvania, Aug. 2001 – May 2003
Master's Thesis: Scientific Data Mining to Understand Human Brain Function
Developed machine learning methods that can be used to discover the spatial-temporal fMRI patterns that support probabilistic predictions about the cognitive state of the human subject.
Discovered representations that are intermediate between high level cognitive states and the raw fMRI voxel activities, and designed classifiers that could be efficiently trained across subjects and across contexts.

Research Assistant, Multi-Agent Learning
Conducted research on multi-agent reinforcement learning using a profit-sharing plan that allows agents to learn a behavior progressively without any instruction and only with delayed rewards.

Co-Designer, Fly Through The Universe
Designed an R-tree-based algorithm to index terabyte astronomical data sets and digitally simulated a craft exploring the universe made up of millions of galaxies.

Tsinghua University, Beijing, P. R. China, Jun. 1998 – Jul. 2001
Master's Thesis: Research and Design for Knowledge Management System
Formulated a new architecture for knowledge management systems and designed a web-based knowledge management system for multi-source data.

Research Assistant, HY-CIMS Project
Conducted research on new decision support technologies with data mining and data warehousing. Studied the experimental infrastructure for a distributed database system based on MySQL. Developed an expert system for choosing undergraduate majors.

Research Assistant, Web-based Decision Support Systems
Designed and implemented a web-based decision support system using Oracle, Lotus Notes and MS Visual Basic.

Jiangsu Huaiyin Factory, Huaiyin, Jiangsu, P. R. China, Jun. 1999 – Aug. 1999
Chief Designer, REMONTANT Project
Designed and implemented the model-based product development system, the key part of the REMONTANT project, using Oracle and MS Visual Basic.

The Clover Co. Ltd., Chongqing, P. R. China, Jun. 1997 – Aug. 1997
Summer Intern, Electronic Form King
Developed the expression evaluation module of Electronic Form King with MS Visual C++.

RECENT HONORS

The Graduate School Fellowship, University of Massachusetts, May 2007
Finalist, the Microsoft Research/Live Labs Graduate Fellowship, Dec. 2006
Passed the Ph.D. Portfolio with distinction at University of Massachusetts, May 2006
The Best Foundational Paper Award, American Medical Informatics Association, Nov. 2003
The Graduate Fellowship, CALD, SCS, Carnegie Mellon University, Aug. 2001 & Aug. 2002
Rockwell Automation Scholarship, Rockwell International Corporation, Dec. 2000
Graduated with honors from Tsinghua University, Jul. 1999
Sequent-Chen Daren Outstanding Student Scholarship, Hong Kong Sequent Ltd., Dec. 1998
Student Social Work Scholarship, Excellent Student Cadre, Tsinghua University, May 1998
Outstanding Student Scholarship, Tsinghua University, Nov. 1996 & Nov. 1997
Social Practice Scholarship (Golden Prize), Tsinghua University, Oct. 1996

PATENTS

Vanja Josifovski, Evgeniy Gabrilovich, Andrei Broder, Bo Pang, and Xuerui Wang, Cross-Lingual Query Classification, Pending, Application filed in Oct. 2008
Vanja Josifovski, Xuerui Wang, Marcus Fontoura, and Andrei Broder, System and Method for Estimating an Amount of Traffic Associated with a Digital Advertisement, Pending, Application filed in Nov. 2007

SERVICES

Program Committee, The 2nd ACM SIGKDD Workshop on Social Network Mining and Analysis (SNA-KDD)
Reviewer, The ACM Transactions on Information Systems (TOIS)
Reviewer, Journal of Machine Learning Research (JMLR)
Reviewer, Information Processing & Management (IPM)
Reviewer, Association for Computational Linguistics (ACL)
Reviewer, International Conference on Machine Learning (ICML)
Reviewer, Uncertainty in Artificial Intelligence (UAI)
Reviewer, American Association for Artificial Intelligence (AAAI)
Reviewer, Neural Information Processing Systems (NIPS)
Reviewer, ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD)
Reviewer, ACM Conference on Information and Knowledge Management (CIKM)
Graduate Representative, Department of Computer Science, University of Massachusetts
Librarian of Graduate Student Library, Department of Computer Science, University of Massachusetts

SKILLS

Programming languages: C/C++, Java, Hadoop, Pig Latin, Matlab, Perl, Python, Mathematica, Splus, R, SQL, Assembly Language (Intel X86 series), Fortran 77, Pascal, Visual Basic, CLISP, Prolog, HTML, etc.
Systems: Windows 95/98/NT/ME/2000/XP/Vista, Macintosh, Unix (Linux especially).
Languages: Chinese (native), English (fluent), Japanese (fair) and German (basic).

COURSES

Machine Learning, Statistical Approaches to Learning and Discovery, Multimedia Databases and Data Mining, Graduate Algorithms, Computational Analyses of Brain Imaging, Information Retrieval, Advanced Software Engineering, Theory of Computation, Bioinformatics, etc.
Probability and Statistics, Intermediate Statistics, Statistical Computing, Time Series Analysis, Nonparametric Methods, etc.

Last updated on September 20, 2005