Selective Search
Selective search was the topic of my dissertation. The project aims to reduce the computational burden on small research institutions and start-up companies that need to search web-scale indexes. The collection is partitioned into shards based on document similarity, and a resource selection step chooses the top k shards for each query. Searching only those shards, rather than the entire collection, greatly reduces the computational cost of search.
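As a rough illustration of the idea, the toy sketch below searches only the top k shards for a query. It assumes the shards have already been built, and it uses raw query-term counts both for resource selection and for document scoring. That scoring is a simple stand-in chosen for brevity, not the method used in the papers, and all function names and the sample data are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def select_shards(query, shard_term_counts, k):
    """Rank shards by total occurrences of the query terms; return the top-k shard ids."""
    terms = tokenize(query)
    scores = {
        sid: sum(counts[t] for t in terms)
        for sid, counts in shard_term_counts.items()
    }
    # Sort by descending score, breaking ties by shard id for determinism.
    ranked = sorted(scores, key=lambda sid: (-scores[sid], sid))
    return ranked[:k]


def search_shard(query, docs):
    """Score each document in one shard by its count of query-term occurrences."""
    terms = tokenize(query)
    results = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            results.append((score, doc_id))
    return results


def selective_search(query, shards, k=1, top_n=10):
    """Search only the k selected shards and merge their results."""
    # Per-shard term statistics stand in for a resource selection index.
    shard_term_counts = {
        sid: Counter(t for text in docs.values() for t in tokenize(text))
        for sid, docs in shards.items()
    }
    selected = select_shards(query, shard_term_counts, k)
    merged = []
    for sid in selected:
        merged.extend(search_shard(query, shards[sid]))
    merged.sort(key=lambda r: (-r[0], r[1]))
    return selected, [doc_id for _, doc_id in merged[:top_n]]


# Hypothetical shards, assumed to be already grouped by topic.
shards = {
    "sports": {"d1": "football match score goal", "d2": "tennis match final"},
    "cooking": {"d3": "pasta recipe tomato sauce", "d4": "bread recipe flour"},
}

selected, hits = selective_search("pasta recipe", shards, k=1)
print(selected, hits)
```

With k=1, only the "cooking" shard is searched, so the documents in the "sports" shard are never scored.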
Related papers:
- Yubin Kim. 2019. Robust Selective Search.
- Yubin Kim and Jamie Callan. 2018. Measuring the effectiveness of selective search index partitions without supervision. In Proceedings of the 4th ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2018). 91–98.
- Zhuyun Dai, Yubin Kim, Jamie Callan. 2017. Learning to Rank Resources. In Proceedings of the 40th Annual ACM SIGIR Conference (SIGIR 2017). 837–840.
- Yubin Kim, Jamie Callan, Shane Culpepper, Alistair Moffat. Efficient Distributed Selective Search. Information Retrieval Journal 20, 3 (2017), 221–252.
- Yubin Kim, Jamie Callan, Shane Culpepper, Alistair Moffat. 2016. Load-Balancing in Distributed Selective Search. In Proceedings of the 39th Annual ACM SIGIR Conference (SIGIR 2016). 905–908.
- Yubin Kim, Jamie Callan, Shane Culpepper, Alistair Moffat. 2016. Does Selective Search Benefit from WAND Optimization? In Proceedings of the 38th European Conference on Information Retrieval (ECIR 2016). 145–158.
- Zhuyun Dai, Yubin Kim, Jamie Callan. 2015. How Random Decisions Affect Selective Distributed Search. In Proceedings of the 38th Annual ACM SIGIR Conference (SIGIR 2015). 771–774.
Twitter Search
Together with Reyyan Yeniterzi, I participated in the ad-hoc search task of the Microblog Track of TREC 2012. Our efforts focused on the vocabulary mismatch problem between queries and tweets. We proposed two solutions: query expansion through pseudo-relevance feedback, and document expansion using the URLs present in tweets. The resulting system was competitive, and our best run placed in the top 10 of automatic runs.
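For readers unfamiliar with pseudo-relevance feedback, here is a minimal sketch of the general technique: the top-ranked results of an initial search are assumed to be relevant, and their most frequent new terms are appended to the query. The stopword list, the frequency-based term weighting, and the function name are assumptions made for illustration and do not reflect the system submitted to TREC.

```python
from collections import Counter

# A tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "of", "in", "to", "and", "is", "at"}


def expand_query(query, ranked_docs, num_feedback_docs=2, num_terms=2):
    """Append the most frequent non-query terms from the top-ranked documents."""
    query_terms = query.lower().split()
    counts = Counter()
    # Treat the top-ranked documents as if they were relevant.
    for text in ranked_docs[:num_feedback_docs]:
        for term in text.lower().split():
            if term not in STOPWORDS and term not in query_terms:
                counts[term] += 1
    # Sort by descending frequency, breaking ties alphabetically.
    ranked_terms = sorted(counts, key=lambda t: (-counts[t], t))
    return query_terms + ranked_terms[:num_terms]


# Hypothetical initial ranking for the query "world cup".
ranked_docs = [
    "world cup final in brazil",
    "brazil wins the world cup final",
    "cooking pasta tonight",
]

print(expand_query("world cup", ranked_docs))
```

The expanded query can then be run again, so that tweets mentioning only the added terms still have a chance to match.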
Related paper:
- Yubin Kim, Reyyan Yeniterzi, Jamie Callan. 2012. Overcoming Vocabulary Limitations in Twitter Microblogs. In Proceedings of the Twenty-First Text REtrieval Conference (TREC 2012). National Institute of Standards and Technology, special publication.
Slow Search
During my internship at Microsoft Research in the summer of 2013, I worked on slow search with Jaime Teevan and Kevyn Collins-Thompson. The project explored the benefits of taking more time with search retrieval. Specifically, we worked on integrating crowdsourcing into the search pipeline to improve search accuracy and to enable better search result summarization for entity queries.
Related papers:
- Yubin Kim, Kevyn Collins-Thompson, Jaime Teevan. Using the Crowd to Improve Search Result Ranking and the Search Experience. ACM Transactions on Intelligent Systems and Technology (TIST) special issue on the Crowd in Intelligent Systems 7, 4 (2016), 50:1–50:22.
- Yubin Kim, Kevyn Collins-Thompson, Jaime Teevan. 2013. Crowdsourcing for Robustness in Web Search. In Proceedings of the Twenty-Second Text REtrieval Conference (TREC 2013). National Institute of Standards and Technology, special publication.
- Jaime Teevan, Kevyn Collins-Thompson, Ryen W. White, Susan T. Dumais, Yubin Kim. 2013. Slow Search: Information Retrieval without Time Constraints. In Proceedings of the 7th annual Symposium on Human-Computer Interaction and Information Retrieval (HCIR 2013). Article 1.
Other
I have previously done research in event detection, databases, and search engine internals. For more information, please see my curriculum vitae.