Yubin Kim
PhD Candidate | www.cs.cmu.edu/~yubink | yubink@cmu.edu
Statement of Interest
Experienced researcher with a strong software engineering background seeking applied scientist roles. Thesis work in distributed and federated search. Broadly interested in searching, organizing, and analyzing large data collections. Experienced with solving problems in search efficiency, evaluation, and ranking using search, crowdsourcing, and machine learning tools. I am excited about delivering cutting-edge research to production systems to impact real users and contributing the resulting knowledge back to the research community.
Education
- (Expected May 2018) Ph.D. in Language and Information Technologies
Language Technologies Institute, School of Computer Science
Carnegie Mellon University, Pittsburgh, Pennsylvania
- (Apr 2011) Bachelor of Software Engineering
University of Waterloo, Waterloo, Ontario
- Graduated with distinction on the Dean's Honours List
Research Experience
- Research Assistant at Carnegie Mellon University
(09/2011 - current)
- Advised by Prof. Jamie Callan in the Language Technologies Institute
- Improved efficiency, accuracy, and stability of selective search, a distributed search architecture that reduces the computational cost of web-scale search by 90% or more
- Experiments conducted using data sets of 25–500 million web pages on multi-machine computing clusters
- Developed a simulator to study distributed search resource usage over many configurations and develop new load balancing policies
- Produced better-than-additive efficiency gains by combining dynamic posting list optimization methods with selective search
- Created a learning-to-rank based resource selection algorithm for accurately matching queries to topic-based index shards
- Improved microblog ad-hoc search by introducing multiple representations; experiments conducted using Twitter
- Research Assistant at Microsoft Research, Redmond
(05/2013 - 08/2013)
- Mentored by Jaime Teevan and Kevyn Collins-Thompson in the Contextual Learning and User Experience in Search (CLUES) group
- Integrated crowdsourcing components into information retrieval tasks
- Mined entity attributes to improve search effectiveness and user experience using crowdsourcing components
- Entered crowdsourced robust query expansion system into the Web Track of TREC 2013
- Research Assistant at University of Waterloo (09/2010 - 12/2010)
- Full-time position under Prof. Ihab Ilyas in the Database Systems group
- Implemented a database system that natively handles unstructured text
- Research Assistant at University of Waterloo (05/2010 - 09/2010)
- Part-time position under Prof. Charles Clarke in the Information Retrieval group
- Designed and implemented a system to detect and summarize events in online news media
- Research Intern at Primal Fusion Inc. (01/2010 - 04/2010)
- Designed and implemented a prototype of the next-generation semantic engine that serves as the back-end for all of Primal Fusion's products
- Presented work to the company's top officers
- Research Assistant at University of Waterloo (09/2009 - 12/2009)
- Part-time position under Prof. Ihab Ilyas in the Database Systems research group
- Contributed to the implementation of a duplicate data detection system
Professional Experience
- Software Developer Intern at A9.com, Inc. (05/2009 - 08/2009)
- Developed a C++ tool that displays the contents of a search index for debugging and QA purposes
- Revamped the index metadata files to use XML formatting, using Python and C++
- Researched Solr and prepared a presentation comparing it to A9.com's search
- Software Developer Intern at Google, Inc.
(09/2008 - 12/2008)
- Developed a Java system, including web analytics dashboards, that allows users and radio stations to interact by SMS via mobile phones
- Software Developer at Sybase iAnywhere, Inc.
(01/2008 - 04/2008)
- Built a multi-threaded database extraction tool from scratch utilizing J2ME, C++ and HTTP
- Fixed several bugs in UltraLiteJ, a lightweight DB for the BlackBerry, and wrote test cases for each fix
- Software Developer at Encom Information Systems, Inc. (07/2007 - 08/2007)
- Communicated directly with client to debug and develop a staff scheduling system written in Progress 4GL
- Rebuilt the defunct clinic scheduling system and enabled it to go live at the beta site
Publications
- Yubin Kim. 2019. Robust Selective Search.
- Yubin Kim and Jamie Callan. 2018. Measuring the effectiveness of selective search index partitions without supervision. In Proceedings of the 4th ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2018). 91–98.
- Zhuyun Dai, Yubin Kim, Jamie Callan. 2017. Learning to Rank Resources. In Proceedings of the 40th Annual ACM SIGIR Conference. 837–840.
- Yubin Kim, Jamie Callan, Shane Culpepper, Alistair Moffat. Efficient Distributed Selective Search. Information Retrieval Journal 20, 3 (2017), 221–252.
- Yubin Kim, Jamie Callan, Shane Culpepper, Alistair Moffat. 2016. Load-Balancing in Distributed Selective Search. In Proceedings of the 39th Annual ACM SIGIR Conference. 905–908.
- Yubin Kim, Jamie Callan, Shane Culpepper, Alistair Moffat. 2016. Does Selective Search Benefit from WAND Optimization? In Proceedings of the 38th European Conference on Information Retrieval. 145–158.
- Yubin Kim, Kevyn Collins-Thompson, Jaime Teevan. Using the Crowd to Improve Search Result Ranking and the Search Experience. ACM Transactions on Intelligent Systems and Technology special issue on the Crowd in Intelligent Systems 7, 4 (2016), 50:1–50:22.
- Zhuyun Dai, Yubin Kim, Jamie Callan. 2015. How Random Decisions Affect Selective Distributed Search. In Proceedings of the 38th Annual ACM SIGIR Conference. 771–774.
- Yubin Kim, Kevyn Collins-Thompson, Jaime Teevan. 2013. Crowdsourcing for Robustness in Web Search. In Proceedings of the Twenty Second Text REtrieval Conference. National Institute of Standards and Technology, special publication.
- Jaime Teevan, Kevyn Collins-Thompson, Ryen W. White, Susan T. Dumais, Yubin Kim. 2013. Slow Search: Information Retrieval without Time Constraints. In Proceedings of the 7th annual Symposium on Human-Computer Interaction and Information Retrieval. Article 1.
- Yubin Kim, Reyyan Yeniterzi, Jamie Callan. 2012. Overcoming Vocabulary Limitations in Twitter Microblogs. In Proceedings of the Twenty First Text REtrieval Conference. National Institute of Standards and Technology, special publication.
- George Beskales, Mohamed A. Soliman, Ihab F. Ilyas, Shai Ben-David, Yubin Kim. 2010. ProbClean: A probabilistic duplicate detection system. In IEEE 26th International Conference on Data Engineering. 1193–1196.
Technical Proficiency
- Fluent in Java, C++, Ruby, and Python; familiar with MATLAB
- Work and research programming experience in Linux and Windows environments
- Experience in modifying search index and database internals in commercial and research settings
Awards and Honours
- NSERC Postgraduate Scholarship Doctoral - $21,000 x 3 years (2013)
- Peter Jackson Fellowship - $17,625 (2012)
- Microsoft Research Graduate Women's Scholarship - $17,000 (2011)
- NSERC Postgraduate Scholarship Masters - $17,300 (2011)
- Graduated with distinction on the Dean's Honours List (2011)
- NSERC Undergraduate Student Research Award - $4,500 (2010)
- Faculty of Engineering Upper-Year Scholarship - $400 (2010)
- Software Engineering Entrance Scholarship - $4,000 (2006)
- President's Scholarship - $2,000 (2006)
- Queen Elizabeth II Aiming for the Top Scholarship - $3,500 x 4 years (2006 - 2011)
- Dean's Honours List (2006 - 2011)
- Governor General's Academic Medal for first in graduating class (2006)
Teaching Experience
- Teaching Assistant for Search Engines (11-642) (Fall 2014)
- Graduate level course in information retrieval, 107 students
- Helped develop an automated homework testing and feedback service
- Prepared and marked assignments, held office hours, gave guest lectures
- Teaching Assistant for Search Engines and Web Mining (11-641) (Fall 2013)
- Graduate level course in information retrieval, 96 students
- Developed gold-standard implementations for new homework assignments, held office hours, gave guest lectures
- Teaching Assistant for Text Analytics (95-865) (Fall 2011)
- Graduate level mini course in large-scale text analysis, 27 students
- Guest lectured during instructor's absence
- Prepared and marked assignments, held office hours, gave guest lectures
Professional Service
- Program Committee
- ACM Conference on Information and Knowledge Management (CIKM)
- 2015, 2017
- ACM Int'l Conference on Research and Development in Information Retrieval (SIGIR)
- 2015, 2016, 2017
- ACM Int'l Conference on Web Search and Data Mining (WSDM)
- 2018
- Conference of the European Chapter of the Association for Computational Linguistics (EACL)
- 2017
- Reviewer
- ACM Int'l Conference on Research and Development in Information Retrieval (SIGIR)
- 2014
- ACM Int'l Conference on Web Search and Data Mining (WSDM)
- 2012, 2013, 2014, 2017
- Int'l Symposium on String Processing and Information Retrieval (SPIRE)
- 2012