Zhenzhen Kou
Yahoo! Search Sciences, 2821 Mission College Blvd, Santa Clara, CA 95054
E-mail: zzkou AT yahoo-inc DOT com
I am now with the Search Sciences Department at Yahoo! as a Relevance Scientist. My current project is machine learning for ranking.
Research at CMU:
· Interests: Machine Learning, Information Extraction/Retrieval, Data Mining
· Advisors: William W. Cohen and Robert F. Murphy
· Minorthird: software for text learning, classification, extraction, and annotation
· SLIF: Subcellular Location Image Finder
· CALO: Cognitive Assistant that Learns and Organizes

Thesis:
My thesis, stacked graphical learning, is a statistical learning model for collective inference over relational data. Its most important feature is that it is more efficient than existing models, which makes it very competitive in applications. I have applied the idea to document classification and named entity extraction, and also to several interrelated subtasks in a complex information extraction system.

Projects:
· Who Rated What: I worked with Yan Liu to develop a link prediction model for movie recommendation, which ranked 3rd (second runner-up) in KDD Cup 2007. Please see our paper for details.
· Stacked Graphical Learning package in Minorthird: I designed and implemented the Stacked Graphical Learning package in Minorthird for classification on relational datasets. Stacked graphical learning is an efficient and effective statistical model for collective classification. More details about the model are in our SDM07 paper. Here is a tutorial for the package.
· Protein name extractors: I developed several protein name extractors, including one trained with conditional random fields (CRFs) (download) and one trained with dictionary hidden Markov models (Dictionary-HMM, download). Dictionary-HMM combines a dictionary with a Markov model to perform soft matching and extract names from free text. More details about the Dictionary-HMM algorithm are in our ISMB05 paper. Here is how to use the extractors.
· SLIF: I also worked on optical character recognition (bioKDD03) and designed and implemented a web interface to an SQL database (KSCE-2004). Please check out our SLIF webpage.
· A tool for protein name annotation: I modified the labeling package in Minorthird to build a labeling tool for protein name annotation. The tutorial here explains how to use the labeling tool.

Resume:
· Curriculum Vitae [HTML]

Publications:
· Yan Liu, Zhenzhen Kou, Claudia Perlich and Richard Lawrence (2008): Intelligent System for Workforce Classification. In KDD 2008 Workshop on Data Mining for Business Applications.
· Zhenzhen Kou, Vitor R. Carvalho and William W. Cohen (2007): Online Stacked Graphical Learning. In NIPS 2007 Workshop on Efficient Machine Learning.
· Yan Liu and Zhenzhen Kou (2007): Predicting Who Rated What in Large-Scale Datasets. In Proceedings of KDD Cup and Workshop 2007.
· Zhenzhen Kou and William W. Cohen (2007): Notes for Stacked Graphical Models for Efficient Inference in Markov Random Fields. Technical Report CMU-ML-07-101.
· Zhenzhen Kou and William W. Cohen (2007): Stacked Graphical Models for Efficient Inference in Markov Random Fields. In SDM 2007.
· Zhenzhen Kou, William W. Cohen and Robert F. Murphy (2007): A Stacked Graphical Model for Associating Information from Text and Images in Figures. In PSB 2007.
· Zhenzhen Kou, William W. Cohen and Robert F. Murphy (2005): High-Recall Protein Entity Recognition Using a Dictionary. In ISMB 2005.
· R. Murphy, Z. Kou, J. Hua, M. Joffe and W. W. Cohen (2005): Extracting Structured Information from Text and Images in On-line Journal Articles for Localization Proteomics. In Biolink 2005.
· Robert F. Murphy, Zhenzhen Kou, Juchang Hua, Matthew Joffe and William W. Cohen (2004): Extracting and Structuring Subcellular Location Information from On-line Journal Articles: The Subcellular Location Image Finder. In KSCE 2004.
· William W. Cohen, Zhenzhen Kou and Robert F. Murphy (2003): Extracting Information from Text and Images for Location Proteomics. In BIOKDD 2003, pp. 2-9.
· Zhenzhen Kou, Liang Ji and Xuegong Zhang (2001): Karyotyping of CGH human metaphase by using support vector machines. Cytometry, December 2001.
· Zhenzhen Kou, Jianhua Xu, Xuegong Zhang and Liang Ji (2001): An Improved Support Vector Machine Using Class-Median Vectors. In Proceedings of the 8th International Conference on Neural Information Processing, Shanghai, China, 2001, pp. 883-887.
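Stacked graphical learning, the subject of the thesis and of the Minorthird package described above, is only summarized on this page. The following is a minimal Python sketch of the general stacking idea, not the Minorthird implementation: a base classifier is trained, its out-of-fold (cross-validated) predictions for each node's related nodes are aggregated into an extra feature, and a new classifier is trained on the extended features, repeated for a few levels. The logistic-regression base learner, the mean-of-neighbors aggregator, the round-robin fold assignment, and all function names (`fit_stacked`, `predict_stacked`, `extend`, etc.) are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Graph = Dict[int, List[int]]  # node index -> indices of related nodes


def predict_proba(w: Vector, x: Sequence[float]) -> float:
    """P(y=1 | x) under a logistic model; the last weight is the bias."""
    z = sum(wd * xd for wd, xd in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: Sequence[Vector], y: Sequence[int],
                 epochs: int = 200, lr: float = 0.5) -> Vector:
    """Base learner (an assumption): logistic regression fit by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for d, v in enumerate(xi):
                grad[d] += err * v
            grad[-1] += err
        w = [wd - lr * gd / n for wd, gd in zip(w, grad)]
    return w


def extend(X: Sequence[Vector], probs: Sequence[float], graph: Graph) -> List[Vector]:
    """Append one relational feature: the mean predicted P(y=1) of each node's
    neighbors (0.5 for isolated nodes). This aggregator is a hypothetical choice."""
    out = []
    for i, xi in enumerate(X):
        nbrs = graph.get(i, [])
        agg = sum(probs[j] for j in nbrs) / len(nbrs) if nbrs else 0.5
        out.append(list(xi) + [agg])
    return out


def cross_val_probs(X: Sequence[Vector], y: Sequence[int], folds: int) -> List[float]:
    """Out-of-fold predictions: each node is scored by a model not trained on it."""
    n = len(X)
    probs = [0.0] * n
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        w = train_logreg([X[i] for i in train], [y[i] for i in train])
        for i in range(f, n, folds):  # held-out nodes of fold f
            probs[i] = predict_proba(w, X[i])
    return probs


def fit_stacked(X: Sequence[Vector], y: Sequence[int], graph: Graph,
                levels: int = 2, folds: int = 3) -> List[Vector]:
    """Train a base model plus one model per stacking level."""
    models = [train_logreg(X, y)]
    Xk: Sequence[Vector] = X
    for _ in range(levels):
        # Previous level's predictions, made out-of-fold so the training features
        # resemble the (imperfect) predictions the model will see at test time.
        probs = cross_val_probs(Xk, y, folds)
        Xk = extend(X, probs, graph)
        models.append(train_logreg(Xk, y))
    return models


def predict_stacked(models: List[Vector], X: Sequence[Vector], graph: Graph) -> List[int]:
    """Inference: apply each level in turn, feeding neighbors' predictions forward."""
    probs = [predict_proba(models[0], x) for x in X]
    for w in models[1:]:
        probs = [predict_proba(w, x) for x in extend(X, probs, graph)]
    return [1 if p >= 0.5 else 0 for p in probs]
```

The out-of-fold step is the key design choice in this sketch: training the next level on predictions the model made for nodes it had already seen would make the neighbor feature look more reliable than it is at test time. Inference then needs only a fixed number of passes (one per level), rather than an iterative sampling procedure.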