Dayne Freitag | http://www.cs.cmu.edu/~dayne
|
February, 2000 - Present | Principal Scientist & VP, Technology at Burning Glass Technologies |
---|---|
Led research and development of the extraction engine at the core of
the company's most successful product, a resume parsing system. The
engine, which converts plaintext resumes to an XML schema
identifying approximately 70 different contexts, was trained on 4000
hand-labeled resumes. Primarily a statistical system using a hidden
Markov model, it also incorporates symbolic machine learning methods
and discrete finite-state techniques common in traditional
information extraction. Work on this engine led to several
innovations, including a patented mechanism, called re-entry
penalization, which mitigates some of the problems caused by the
Markov assumption.
Wrote the indexing and retrieval engine upon which the Boolean query processor in the company's resume corpus management product is based. The context operators supported by this engine, combined with the granular structure recovered by the extraction engine, facilitate precise searching of resume collections.
Spearheaded development of the company's service offering, called Aperture, a resume management system for emailed and faxed resumes. Aperture accepts emailed resumes, screens out spam and duplicate submissions, separates the resume from the message containing it, performs extraction, scores the resume against the posting of the job for which it was submitted, and sends back the resume in a special fixed format designed to support easy ranking and review in the client's mail reader. | |
November, 1998 - February, 2000 | Research Scientist at Just Research |
Headed research efforts in information extraction and text mining. Applied machine learning and statistical techniques to the problem of information extraction from text. Research integrated a variety of text dimensions, such as term co-occurrence, document formatting, and linguistic structure, in a machine learning framework. Investigated the prospect of high-precision text retrieval using information extraction. |
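The hidden Markov model approach to extraction mentioned in the Burning Glass entry above can be illustrated with a minimal Viterbi decoder. This is a toy sketch, not the production engine: the two states, the probabilities, and the `unk` smoothing constant are invented for illustration, and re-entry penalization is not shown.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most likely state sequence for tokens under a toy HMM."""

    def lp(p):
        # Log-probability; zero probability maps to negative infinity.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-score of the best path ending in state s at position t.
    best = [{s: lp(start_p[s]) + lp(emit_p[s].get(tokens[0], unk)) for s in states}]
    back = [{}]
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lp(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lp(emit_p[s].get(tokens[t], unk))
            back[t][s] = prev

    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-context model of a resume fragment (all numbers invented).
states = ["contact", "skills"]
start_p = {"contact": 0.9, "skills": 0.1}
trans_p = {
    "contact": {"contact": 0.6, "skills": 0.4},
    "skills": {"contact": 0.1, "skills": 0.9},
}
emit_p = {
    "contact": {"jane": 0.4, "doe": 0.4},
    "skills": {"python": 0.5, "sql": 0.4},
}

labels = viterbi(["jane", "doe", "python", "sql"], states, start_p, trans_p, emit_p)
print(labels)
```

Each token receives one context label, and contiguous runs of the same label would correspond to the spans a system of this kind wraps in markup.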
Machine learning for information extraction
Information retrieval
Text classification and user interest modeling
Machine learning in hypertext
Relevance and feature selection
D. Freitag, "Trained Named Entity Recognition Using Distributional Clusters," Proceedings of EMNLP 2004.
A. McCallum, D. Freitag, and F. Pereira, "Maximum entropy Markov models for information extraction and segmentation," Proceedings of ICML-2000.
D. Freitag and N. Kushmerick, "Boosted wrapper induction," Proceedings of AAAI-2000.
D. Freitag and A. McCallum, "Information extraction with HMM structures learned by stochastic optimization," Proceedings of AAAI-2000.
A. Berger, R. Caruana, D. Cohn, D. Freitag, and V. Mittal, "Bridging the lexical chasm: statistical approaches to answer-finding," Proceedings of SIGIR-2000.
D. Freitag and A. McCallum, "Information extraction using HMMs and shrinkage," Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction.
D. Freitag, "Machine Learning for Information Extraction in Informal Domains," Ph.D. dissertation, November, 1998.
D. Freitag, "Multistrategy learning for information extraction," ICML-98.
D. Freitag, "Information extraction from HTML: application of a general machine learning approach," AAAI-98.
D. Freitag, "Using grammatical inference to improve precision in information extraction," ICML-97 Workshop on Automata Induction, Grammatical Inference, and Language Acquisition, Nashville, July, 1997.
T. Joachims, D. Freitag, and T. Mitchell, "WebWatcher: A Tour Guide for the World Wide Web," Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97).
J. Boyan, D. Freitag, and T. Joachims, "A Machine Learning Architecture for Optimizing Web Search Engines," AAAI-96 Workshop on Internet-based Information Systems, Portland, August 1996.
D. Freitag, T. Joachims, and T. Mitchell, "WebWatcher: Knowledge Navigation in the World Wide Web," 1995 AAAI Fall Symposium on AI Applications in Knowledge Navigation and Retrieval, Boston, November 1995.
T. Joachims, T. Mitchell, D. Freitag, and R. Armstrong, "WebWatcher: Machine Learning and Hypertext," Fachgruppentreffen Maschinelles Lernen, Dortmund, Germany, August 1995.
R. Armstrong, D. Freitag, T. Joachims, and T. Mitchell, "WebWatcher: A Learning Apprentice for the World Wide Web," 1995 AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, Stanford, March 1995.
R. Caruana and D. Freitag, "How Useful is Relevance?" 1994 AAAI Fall Symposium on Relevance, New Orleans, 1994.
T. Mitchell, R. Caruana, D. Freitag, J. McDermott, and D. Zabowski, "Experience with a Learning Personal Assistant," Communications of the ACM, July, 1994.
R. Caruana and D. Freitag, "Greedy Attribute Selection," Proceedings of the 11th International Conference on Machine Learning, 1994.
1996-present | World Wide Web Knowledge Base Project |
---|---|
with Tom Mitchell (PI), Jaime Carbonell (PI), Mark Craven, Andrew McCallum, and Kamal Nigam | |
Most research to date on representing information contained in the Web has either required manual annotation of Web pages with symbolic categories or contented itself with TFIDF-style models of categories as word-weight vectors. This project aims to bridge the gap between these two approaches. Given an ontology and instantiations of ontological entities and relations realized as Web pages, the system should learn to perform such instantiations automatically. | |
1995-present | Learning Architecture for Search Engine Retrieval |
with Justin Boyan and Thorsten Joachims | |
LASER is a Web search engine that attempts to improve retrieval performance by noting which links a user selects after entering a query. Instead of viewing an HTML page as a flat collection of terms, LASER pays attention to the context in which a term occurs (e.g., in a title field). It associates a coefficient with each context it recognizes, as well as with a number of other factors affecting the retrieval status value of pages given a query. Learning is then an optimization problem in this space of coefficients. | |
1994-1996 | WebWatcher |
with Tom Mitchell and Thorsten Joachims | |
WebWatcher attempts to serve as a tour guide to Web neighborhoods. Users invoke WebWatcher by following a hyperlink to the WebWatcher server, then continue browsing as WebWatcher accompanies them, providing advice along the way. WebWatcher gains expertise by analyzing user actions, statements of interest, and the set of pages visited by users. Our studies suggested that WebWatcher could achieve close to the human level of performance on the rather difficult problem of predicting which link a user will follow given a page and a statement of interest. | |
Summer 1994 | Newton Agent Architecture |
with Siegfried Bocionek, Siemens | |
Designed and implemented an architecture for communicating software agents on the Apple Newton, as part of the Software Secretary initiative. In this context, agents were "learning apprentice" applications: personal software, such as a calendar manager, that learned through interaction with the user. At issue was how these agents should communicate and what sorts of knowledge they might usefully exchange. | |
1992-1994 | Calendar APprentice Project |
with Tom Mitchell, Rich Caruana, David Zabowski, and others | |
CAP was conceived as a learning apprentice system, a software application that unobtrusively learns to improve performance through user interaction. I was part of efforts to make CAP more aware of its network environment and to add to its set of prediction tasks. As part of the latter initiative, Rich Caruana and I developed Greedy Attribute Selection, a method for selecting learning features in spaces with many redundant and irrelevant features. GAS included optimizations for decision-tree learners that yielded exponential speedup. | |
1990 | Scheme Toolkit for Modeling Systolic Arrays |
supervised by Sanjay Rajopadhye, then at the University of Oregon | |
I developed Scheme code for the display and manipulation of 3-dimensional graphical models of systolic arrays. | |
1991 | OREGAMI |
supervised by Virginia Lo, University of Oregon | |
The OREGAMI group studied the possibility of automatically mapping known algorithms to message-passing parallel architectures. I developed a compiler for a language called LaRCS, which was designed to describe regularities of algorithms exploitable by parallel architectures. |
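The idea behind Greedy Attribute Selection, described under the Calendar APprentice Project above, can be sketched as plain forward selection: start with no features and repeatedly add the one that most improves an evaluation score. This is a minimal illustration under stated assumptions, not the published method; the feature names and `toy_score` are hypothetical, and the decision-tree optimizations mentioned above are not shown.

```python
def forward_selection(features, evaluate):
    """Greedy hill-climbing: add the single feature that most improves the score."""
    selected = []
    best_score = evaluate(selected)
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current subset.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate helps; stop
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


def toy_score(subset):
    """Hypothetical stand-in for held-out accuracy of a learner on `subset`."""
    # "weekday" is redundant with "day_of_week"; "room_color" is irrelevant.
    group = {
        "day_of_week": "day",
        "weekday": "day",
        "attendee_role": "role",
        "room_color": None,
    }
    gain = {"day": 0.3, "role": 0.2}
    covered = {group[f] for f in subset if group[f] is not None}
    # A small per-feature penalty stands in for overfitting on extra features.
    return 0.5 + sum(gain[g] for g in covered) - 0.01 * len(subset)


features = ["room_color", "day_of_week", "weekday", "attendee_role"]
chosen, score = forward_selection(features, toy_score)
print(chosen, round(score, 2))
```

In practice `evaluate` would train and validate a learner on each candidate subset, which is why caching and learner-specific shortcuts matter for this kind of search.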
1992-1995 | NSF Graduate Student Fellowship |
---|---|
1990-1991 | Dean's List, University of Oregon |
1986 | Elected to Phi Beta Kappa |
1981-1986 | Commendations for Excellence in Scholarship |
1992-1998 | Graduate student in CS, Carnegie Mellon. Ph.D., November, 1998. |
---|---|
1990-1991 | Undergraduate in CS, University of Oregon |
1984-1985 | Enrolled in Lewis & Clark College's Year in Munich exchange program |
1981-1986 | Undergraduate at Reed College. B.A. in English Literature, May, 1986 |