Abstract:
Federated search, also known as distributed information retrieval, provides a search solution for information that cannot be accessed by conventional search engines such as Google or AltaVista. It does so by linking to the search engines of the resources that contain this type of information. Most prior research on federated search focused on selecting the search engines with the most relevant content, but ignored the retrieval effectiveness of individual search engines. Ineffective search engines exist in real-world applications: one example is the Boolean retrieval engine on the PubMed Web site, and others are the search engines linked by the FedStats portal. Ignoring search engine effectiveness can therefore cause serious problems when federating search engines of differing quality.
This talk presents a federated search technique that uses utility maximization to model the retrieval effectiveness of each search engine in a federated search environment. The new algorithm ranks the available resources by explicitly estimating the amount of relevant material that each resource can return, instead of the amount of relevant material that each resource contains. An extensive set of experiments demonstrates the effectiveness of the new algorithm.
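The difference between ranking by what a resource contains and ranking by what it can return can be illustrated with a minimal sketch. This is not the algorithm presented in the talk: the `Resource` fields, the multiplicative effectiveness model, the `top_k` cap, and the numbers are all hypothetical assumptions chosen only to show how the two rankings can disagree.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    est_relevant_contained: float  # hypothetical estimate of relevant docs the resource holds
    est_effectiveness: float       # hypothetical fraction (0..1) its search engine actually surfaces


def expected_returned(resource: Resource, top_k: int) -> float:
    """Assumed utility: relevant docs expected in the top_k results."""
    return min(float(top_k), resource.est_relevant_contained * resource.est_effectiveness)


def rank_by_content(resources):
    """Baseline: rank by the amount of relevant material each resource contains."""
    return sorted(resources, key=lambda r: r.est_relevant_contained, reverse=True)


def rank_by_utility(resources, top_k=10):
    """Rank by the amount of relevant material each resource is expected to return."""
    return sorted(resources, key=lambda r: expected_returned(r, top_k), reverse=True)


# Illustrative, made-up estimates: a large collection behind a weak engine
# versus a smaller collection behind an effective one.
resources = [
    Resource("boolean_engine", est_relevant_contained=40.0, est_effectiveness=0.1),
    Resource("ranked_engine", est_relevant_contained=15.0, est_effectiveness=0.6),
]

content_order = [r.name for r in rank_by_content(resources)]
utility_order = [r.name for r in rank_by_utility(resources)]
```

Under these assumed numbers, the content-based ranking prefers the resource with more relevant material, while the utility-based ranking prefers the resource whose engine is expected to return more of it.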
This is joint work with Jamie Callan.