Using Knowledge Resources to Improve Information Retrieval
Current search engines understand how humans use language, but they do not understand the language itself. They match the words in a query to the words in a document, and to words that are linked somehow to the document (e.g., 'Click here to get the employee handbook'), to find documents that might satisfy the query. Then they use statistical methods and the behavior of other people who searched for similar information to rank these potential matches. Although current technology works well most of the time, it sometimes fails badly because the search engine doesn't really understand the meanings of the documents that it ranks. Recently, companies, research organizations, and volunteer communities have begun to create large knowledge graphs that describe important, essential, or well-known information. Knowledge graphs are similar in spirit to Wikipedia, but they are designed to be used by computers instead of humans. For example, a knowledge graph might contain the entities Cleveland Cavaliers and LeBron James, and these two entities might be connected by an employs relationship. Information can be entered by people with moderate expertise and by machine learning software, so it is practical to build large knowledge graphs that cover a wide range of human knowledge. Freebase, which is now owned by Google, is a well-known knowledge graph that contains 2.5 billion 'facts' about 44 million 'topics' and is growing rapidly. Currently, knowledge graphs are used for just a few well-defined tasks, for example, to produce the info boxes that Google displays next to some search results. New methods of using knowledge graphs for more varied tasks are of significant scientific and commercial interest.
This project develops new methods of using knowledge graphs to improve the accuracy of search engines, especially for vague, ambiguous, or poorly specified queries. The search engine uses the knowledge graph to identify the probable meanings of query terms, and then uses this knowledge to improve its ability to identify documents that match those meanings. The project is of practical significance for its potential to improve search engine accuracy on queries that are currently difficult. It is of scientific significance for its potential to inject greater understanding of meaning and relationships into search engines. The project is of educational significance because it provides opportunities for graduate students to do class projects and independent studies that lead to participation in the National Institute of Standards and Technology's (NIST's) TREC conference, a semi-competitive annual event that attracts some of the best research groups from around the world.
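The knowledge graph example above (two entities joined by an employs relationship) can be pictured as a set of subject-relation-object triples. The following is a minimal sketch for illustration only, not the project's implementation: the `located_in` triples, the helper names `build_index` and `candidate_entities`, and the simple name-matching rule are all assumptions. It shows how an ambiguous query term such as "cleveland" maps to several candidate entities, which is the ambiguity a search engine would then need to resolve.

```python
# Toy knowledge graph stored as (subject, relation, object) triples.
# Only the "employs" triple comes from the text; the rest are illustrative.
TRIPLES = [
    ("Cleveland Cavaliers", "employs", "LeBron James"),
    ("Cleveland Cavaliers", "located_in", "Cleveland"),
    ("Cleveland Browns", "located_in", "Cleveland"),
]


def build_index(triples):
    """Map every entity to its list of outgoing (relation, object) edges."""
    index = {}
    for subj, rel, obj in triples:
        index.setdefault(subj, []).append((rel, obj))
        index.setdefault(obj, [])  # objects are entities too
    return index


def candidate_entities(term, index):
    """Return, in sorted order, entities whose name contains the query term."""
    term = term.lower()
    return sorted(name for name in index if term in name.lower().split())


INDEX = build_index(TRIPLES)

# An ambiguous term yields several candidate meanings.
print(candidate_entities("cleveland", INDEX))
# The relations of each candidate give context that could help pick one.
print(INDEX["Cleveland Cavaliers"])
```

A real system would match query terms against much richer entity descriptions and would rank the candidates rather than simply list them.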
Knowledge graphs are less structured than typical relational databases and semantic web resources but more structured than the text stored in full-text search engines. The weak semantics used in these semi-structured information resources is sufficient to support interesting applications, but can also accommodate contradictions, inconsistencies, and mistakes, which makes the resources easier to scale to large amounts of information. The typical use of a semi-structured resource treats it like a structured resource that has somewhat restricted functionality. The application must understand the semantics associated with each type of entity, attribute, and relation that it uses. Although this approach is effective, the need to understand the semantics of entity types and relation types limits the application's ability to automatically incorporate new types of information as the resource evolves and grows. This project develops new methods of using semi-structured information resources that make fewer assumptions about a resource's structure and semantics, thus enabling these methods to make full use of the resource as it grows and evolves. The resource is treated as a network of entities and relations that are each described by a 'bag of words' description. Entities and relations are retrieved using extensions of full-text retrieval methods. Evidence such as estimates of authority or related language models can be associated with entity and relation types, and propagated along specific network links to improve entity and relation models. This project applies this general architecture to make several improvements in the accuracy of a full-text search engine, for example, providing an alternative method of answering entity-attribute queries and a more stable and effective method of query expansion.
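The idea of describing each entity by a 'bag of words' and retrieving entities with full-text methods can be sketched as follows. This is a simplified illustration under stated assumptions, not the method developed in the project: the entity descriptions are invented, the scoring is a basic smoothed TF-IDF, and the function names `rank_entities` and `expand_query` are hypothetical. Propagation of evidence along network links is omitted. The sketch ranks entities against a query, then expands the query with distinctive terms drawn from the top-ranked entity.

```python
import math
from collections import Counter

# Hypothetical entity descriptions; a real knowledge graph would supply these.
DESCRIPTIONS = {
    "Cleveland Cavaliers": "nba basketball team cleveland ohio",
    "LeBron James": "nba basketball player forward",
    "Cleveland Browns": "nfl football team cleveland ohio",
}


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


# Each entity's bag of words combines its name and its description.
BAGS = {
    name: Counter(tokenize(name + " " + text))
    for name, text in DESCRIPTIONS.items()
}


def idf(term, bags):
    """Smoothed inverse document frequency over the entity bags."""
    df = sum(1 for bag in bags.values() if term in bag)
    return math.log((len(bags) + 1) / (df + 1)) + 1.0


def rank_entities(query, bags):
    """Rank entity names by TF-IDF score; ties are broken by name."""
    terms = tokenize(query)
    scores = {
        name: sum(bag[t] * idf(t, bags) for t in terms)
        for name, bag in bags.items()
    }
    return sorted(scores, key=lambda name: (-scores[name], name))


def expand_query(query, bags, top_entities=1, num_terms=2):
    """Append the highest-weighted new terms from the top-ranked entities."""
    seen = set(tokenize(query))
    weights = Counter()
    for name in rank_entities(query, bags)[:top_entities]:
        for term, tf in bags[name].items():
            if term not in seen:
                weights[term] += tf * idf(term, bags)
    best = sorted(weights, key=lambda t: (-weights[t], t))[:num_terms]
    return tokenize(query) + best


print(rank_entities("cleveland basketball", BAGS))
print(expand_query("cleveland basketball", BAGS))
```

Because the code only reads bags of words, it needs no knowledge of entity or relation types, which is the property that lets such an approach absorb new kinds of information as the resource grows.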
Jamie Callan, Principal Investigator
Chenyan Xiong, Research Assistant
Research results are disseminated through research publications, through our Virtual Appendices web page, and as part of the open-source Lemur Project.
J. Chen, C. Xiong, and J. Callan. An empirical study of learning to rank for entity search (short paper). In Proceedings of the 39th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 737-740. ACM. 2016.
Z. Dai, C. Xiong, and J. Callan. Query-biased partitioning for selective search. In Proceedings of the 25th ACM Conference on Information and Knowledge Management (CIKM '16). ACM. 2016.
Z. Dai, C. Xiong, and J. Callan. An evaluation of the Kernel Based Neural Ranking Model in NTCIR-13 WWW. In Proceedings of the 13th NTCIR Conference on Evaluation of Information Access Technologies. National Center of Sciences, Tokyo, Japan. 2017.
Z. Dai, C. Xiong, J. Callan, and Z. Liu. Convolutional neural networks for soft-matching n-grams in ad-hoc search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 126-134. 2018.
K. Y. Gao and J. Callan. Scientific table search using keyword queries. arXiv:1707.03423. 2017.
F. Hasibi, F. Nikolaev, C. Xiong, K. Balog, S. E. Bratsberg, A. Kotov, and J. Callan. DBpedia-Entity v2: A test collection for entity search (short paper). In Proceedings of the 40th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 1265-1268. ACM. 2017.
H. Li, C. Xiong, and J. Callan. Natural language supported relation matching for question answering with knowledge graphs. In Proceedings of the SIGIR 2017 Workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis, pp. 43-48. ACM. 2017.
Z. Liu, C. Xiong, M. Sun, and Z. Liu. Entity-duet neural ranking: Understanding the role of knowledge graph semantics in neural information retrieval. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp. 2395-2405. ACL. 2018.
C. Luo, T. Sakai, Y. Liu, Z. Dou, C. Xiong, and J. Xu. Overview of the NTCIR-13 We Want Web Task. In Proceedings of the 13th NTCIR Conference on Evaluation of Information Access Technologies. National Center of Sciences, Tokyo, Japan. 2017.
C. Xiong and J. Callan. EsdRank: Connecting query and documents through external semi-structured data. In Proceedings of the 24th ACM Conference on Information and Knowledge Management (CIKM '15). ACM. 2015.
C. Xiong and J. Callan. Query expansion with Freebase. In Proceedings of the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2015). ACM. 2015.
C. Xiong, J. Callan, and T.-Y. Liu. Bag-of-entity representation for ranking (short paper). In Proceedings of the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2016). ACM. 2016.
C. Xiong, J. Callan, and T.-Y. Liu. Word-entity duet representations for document ranking. In Proceedings of the 40th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 763-772. ACM. 2017.
C. Xiong, Z. Dai, J. Callan, Z. Liu, and R. Power. End-to-end neural ad-hoc ranking with kernel pooling. In Proceedings of the 40th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 55-64. ACM. 2017.
C. Xiong, Z. Liu, J. Callan, and E. Hovy. JointSem: Combining query entity linking and entity based document ranking. In Proceedings of the 26th ACM Conference on Information and Knowledge Management (CIKM '17), pp. 2391-2394. ACM. 2017.
C. Xiong, Z. Liu, J. Callan, and T.-Y. Liu. Towards better text understanding and retrieval through kernel entity salience modeling. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 575-584. ACM. 2018.
C. Xiong, R. Power, and J. Callan. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017. ACM. 2017.
This research is sponsored by National Science Foundation grant IIS-1422676, a Google Faculty Research Award, and a fellowship from the Allen Institute for Artificial Intelligence. Prior research was sponsored by Google through its support of the Worldly Knowledge project. Any opinions, findings, conclusions or recommendations expressed on this Web site are those of the author(s), and do not necessarily reflect those of the sponsors.