Dr.
XUERUI WANG
Yahoo! Labs,
701 First Avenue, Sunnyvale,
CA 94089
WWW:
http://www.cs.umass.edu/~xuerui
email:
xuerui_wang@yahoo.com
RESEARCH INTEREST
Statistical and computational machine learning (ML), data mining (DM) for large data sets, online advertising, information retrieval (IR), topic models of text,
and social network analysis (SNA).
EDUCATION
University of Massachusetts,
Amherst,
Massachusetts
Apr. 2009
Doctor of
Philosophy in Computer Science
Advisor: Andrew McCallum
Carnegie
Mellon University,
Pittsburgh,
Pennsylvania
May 2003
Master of
Science in Knowledge Discovery and Data Mining
Advisor: Tom Mitchell
Tsinghua
University,
Beijing, P.
R. China
Jul. 2001
Master of
Engineering in Control Theory and Its Applications
Advisor: Wenhuang Liu
Tsinghua
University,
Beijing, P.
R. China
Jul. 1999
Bachelor of
Engineering in Automation
EXPERIENCE
Yahoo! Labs,
Santa Clara,
California,
Feb. 2009 – Present
Scientist, Contextual and Display Advertising
•
Conducting research on non-guaranteed delivery (NGD) traffic forecasting, evaluating model performance and developing system APIs.
•
Studied performance-based user click feedback methods to improve ad placement in Yahoo!'s Keystone contextual advertising system, developed evaluation metrics, and adopted different exploration and
exploitation strategies.
•
Designed an empirical Bayes framework to smooth click-through rate (CTR) estimates by leveraging the data hierarchy and temporal continuity in the data.
•
Conducting research on forecasting ad performance in the Keystone contextual advertising system by searching historical data for matching pages.
University of Massachusetts,
Amherst,
Massachusetts,
Jun. 2004 – Jan. 2009
Research Assistant, Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities
•
Invented novel undirected topic models with both fast
inference and clear interpretability. Incorporating
information from multiple, heterogeneous modalities is
much more convenient than with their directed counterparts.
•
Designed new probabilistic, generative models to
improve role discovery and group discovery in social
networks by taking (textual) attributes of interactions
into account. Applications include email messages,
political voting records and academic literature.
•
Studied dynamic topic trends in large text collections
in a probabilistic, generative manner with timestamps as
observed random variables. This approach differs from
traditional Markov-transition-based dynamic models.
•
Designed new topical n-gram models that discover
topical phrases in context, significantly increasing
interpretability compared to bag-of-words models.
Obtained better performance in IR experiments on large TREC
collections.
•
Invented efficient multi-conditional learning methods
combining generative and
discriminative models, and applied them to various
classification, clustering, and
information retrieval tasks.
Synthesis Project: Inferring Gene
Annotations in Gene Ontology
•
Developed new generative
models to predict Gene Ontology annotations from massive
gene expression data.
Yahoo! Research,
Santa Clara,
California,
May 2008 – Aug. 2008
Research Intern, Robust Cross-Language Query Classification with External Web Evidence
•
Developed new methodologies to classify non-English queries by first collecting Web evidence in the native language of the original queries, machine translating the evidence into
English, and inferring the queries' class labels from the translated evidence.
•
Experimented with a new framework in online advertising to populate products of interest via query rewriting techniques.
Yahoo! Research,
Santa Clara,
California,
Jun. 2007 – Sep. 2007
Research Intern, Search based Forecasting of Ad Volume in Online Advertising
•
Invented a two-level, search-based method for real-time forecasting of the future performance of internet ads by replaying billions of historical data points.
UtopiaCompression Corporation,
Los
Angeles, California, Jun. 2003 –
Jun. 2004
Research
Scientist,
Pattern-Driven Image/Video/Text Compression/Mining
•
Designed and developed an intelligence-based,
pattern-driven image compressor that comprehends an
image as a unified and interrelated entity, instead of
unrelated blocks of data.
•
Developed efficient
statistical machine learning algorithms to extract
features from images and to deal with missing values in
image data.
•
Conducted research on
XML compression using machine learning methods, and
co-wrote accepted proposals for research funding from
governmental organizations such as DoD, NASA and NSF.
Carnegie Mellon University,
Pittsburgh,
Pennsylvania, Aug. 2001 – May 2003
Master's Thesis: Scientific Data Mining to Understand Human
Brain Function
•
Developed machine learning methods that can be used to
discover the spatial-temporal fMRI patterns that support
probabilistic predictions about the cognitive state of
the human subject.
•
Discovered representations that are intermediate
between high level cognitive states and the raw fMRI
voxel activities, and designed classifiers that could be
efficiently trained across subjects and across contexts.
Research
Assistant, Multi-Agent Learning
•
Conducted research on multi-agent reinforcement learning
using a profit-sharing plan that allows agents to learn a
behavior progressively without any instruction and only
with delayed rewards.
Co-Designer, Fly Through The Universe
•
Designed an R-tree-based algorithm to index terabyte
astronomical data sets and digitally simulated a craft
exploring the universe made up of millions of galaxies.
Tsinghua University,
Beijing, P.
R. China, Jun.
1998 – Jul. 2001
Master's
Thesis:
Research and Design for Knowledge Management System
•
Formulated a new architecture of knowledge management
systems and designed a web-based knowledge management
system for multi-source data.
Research
Assistant, HY-CIMS Project
•
Conducted research on new decision support technologies
with data mining / data warehousing. Studied the
experimental infrastructure for a distributed database
system based on MySQL. Developed the undergraduate
major-choosing expert system.
Research
Assistant, Web-based Decision Support Systems
•
Designed and
implemented a web-based decision support system using
Oracle, Lotus Notes and MS Visual Basic.
Jiangsu
Huaiyin Factory,
Huaiyin,
Jiangsu, P. R. China, Jun. 1999 – Aug. 1999
Chief
Designer, REMONTANT Project
•
Designed and implemented the model-based product
development system, the key part of the REMONTANT
project, using Oracle and MS Visual Basic.
The Clover
Co. Ltd.,
Chongqing,
P. R. China, Jun.
1997 – Aug. 1997
Summer Intern,
Electronic Form King
•
Developed the expression evaluation module of
Electronic Form King with MS Visual C++.
RECENT HONORS
•
The Graduate School
Fellowship, University
of Massachusetts, May 2007
•
Finalist, the Microsoft
Research/Live Labs
Graduate Fellowship, Dec. 2006
•
Passed the Ph.D.
Portfolio with distinction at University of
Massachusetts, May 2006
•
The
Best Foundational Paper Award, American Medical
Informatics Association, Nov. 2003
•
The Graduate Fellowship, CALD, SCS, Carnegie Mellon
University, Aug. 2001 & Aug. 2002
•
Rockwell Automation Scholarship, Rockwell International
Corporation, Dec. 2000
•
Graduated with honors from Tsinghua University,
Jul. 1999
•
Sequent-Chen Daren Outstanding Student Scholarship,
Hong Kong Sequent Ltd., Dec. 1998
•
Student Social Work Scholarship, Excellent Student
Cadre, Tsinghua University, May 1998
•
Outstanding Student Scholarship, Tsinghua University,
Nov. 1996 & Nov.
1997
•
Social Practice Scholarship (Golden Prize),
Tsinghua University, Oct.
1996
PATENTS
•
Vanja Josifovski, Evgeniy Gabrilovich, Andrei Broder, Bo Pang, and Xuerui Wang, Cross-Lingual Query Classification, Pending,
Application filed in Oct. 2008
•
Vanja Josifovski, Xuerui Wang, Marcus Fontoura, and Andrei Broder, System and Method for Estimating an Amount of Traffic Associated with a Digital Advertisement, Pending,
Application filed in Nov. 2007
SERVICES
•
Program Committee, The 2nd ACM SIGKDD Workshop on Social Network Mining and Analysis (SNA-KDD)
•
Reviewer,
The ACM Transactions on Information Systems (TOIS)
•
Reviewer,
Journal of Machine Learning Research (JMLR)
•
Reviewer,
Information Processing & Management (IPM)
•
Reviewer,
Association for Computational Linguistics (ACL)
•
Reviewer,
International Conference on Machine Learning (ICML)
•
Reviewer,
Uncertainty in Artificial Intelligence (UAI)
•
Reviewer,
American Association for Artificial Intelligence (AAAI)
•
Reviewer, Neural
Information Processing Systems (NIPS)
•
Reviewer, ACM
Special Interest Group on Knowledge Discovery and Data
Mining (SIGKDD)
•
Reviewer, ACM
Conference on Information and Knowledge Management (CIKM)
•
Graduate
Representative, Department of Computer Science,
University of Massachusetts
•
Librarian of
Graduate Student Library, Department of Computer
Science, University of Massachusetts
SKILLS
•
Programming languages: C/C++, Java, Hadoop, Pig Latin, Matlab,
Perl, Python, Mathematica, Splus, R, SQL, Assembly Language (Intel X86
series), Fortran 77, Pascal, Visual Basic, CLISP, Prolog, HTML, etc.
•
Systems: Windows 95/98/NT/ME/2000/XP/Vista,
Macintosh,
Unix (Linux
especially).
•
Languages: Chinese (native), English (fluent), Japanese
(fair) and German (basic).
COURSES
•
Machine Learning, Statistical Approaches to Learning
and Discovery, Multimedia Databases and Data Mining,
Graduate Algorithms, Computational Analyses of Brain
Imaging, Information Retrieval, Advanced Software
Engineering, Theory of Computation, Bioinformatics, etc.
•
Probability and Statistics, Intermediate Statistics,
Statistical Computing, Time Series Analysis,
Nonparametric Methods, etc.