LiveMarks
Collaborative Information Gathering

Thomas Kreifelts, Keiichi Nakata, Volker Paulsen, Angi Voss
FIT-CSCW
GMD - German National Research Center for Computer Science
kreifelts@gmd.de

The Coins Project: Collaborative Information Acquisition

Scientists, engineers, and analysts in business can solve problems faster and better as a group, when they collaboratively edit, assess, and structure the information they need. Support for these activities is still insufficient in standard software for group work. The Coins project develops prototypes that address this need by combining groupware with novel techniques like software agents and text mining.

The key idea is mutuality: letting everybody profit a great deal from the accumulated results of other people's work with minimal additional effort for each person involved. While a user submits queries to a search engine and browses and assesses the results, software agents in the background can apply their accumulated knowledge about the users, look for recommendations from other people, and find related documents by inspecting large document collections with text mining techniques.

Presently, three prototypes are under development in the Coins project:

TopicMarks captures and analyzes the activities and interests of individual group members as they acquire information from the World-Wide Web. Individual interests are condensed into topics and interest profiles for the group. TopicMarks collects information that is relevant for these topics and propagates it to interested users and groups.
ConceptIndex helps to manage the information collected by a group. It inspects the documents, recognizes keywords and key passages, and weaves them into a network of cross references between the documents. Group members can use ConceptIndex to quickly access that part of the information which is interesting from their particular perspective.
LiveMarks helps to improve the retrieval of documents via search engines by collecting ratings and assessments. It recommends relevant Web pages that obtained high ratings in the past and enriches them with other people's comments.

In terms of implementation, LiveMarks is the most advanced Coins prototype. Its stage is BSCW, a shared workspace system on the Web, its actors are software agents, and its backstage is SOaP, an agent runtime environment.

On Stage: BSCW

LiveMarks uses BSCW, a shared workspace system on the Web (http://bscw.gmd.de) [Bentley et al. 1997], as its front-end. BSCW supports cross-platform cooperative work in widely dispersed working groups by providing "shared workspaces", i.e. repositories to which users can upload arbitrary electronic documents, in which they can collect URLs and hold threaded discussions, and through which they are kept aware of the activities of others so that they can coordinate their own work. BSCW is integrated with an unmodified Web server and is accessible from standard Web browsers.

Figure 1. The BSCW interface of LiveMarks showing two active query objects, rated and annotated URLs, and a document produced by the group.

In order to use BSCW for agent-based information collection we have extended BSCW in two ways: at the user interface we have introduced a new type of object, the query, and a rating and annotation feature for URLs; at the back-end we have enabled BSCW to communicate with LiveMarks agents.

Whenever a user creates a query for Web documents, this query is propagated to the software agents that work in the background. The agents forward the query to search engines, collect the results, and enrich them with their own recommendations. The recommendations are derived from an internal database that stores references and descriptions of Web documents along with user ratings and annotations. The agents produce recommendations by searching this database for highly rated documents whose descriptions match the query. The best-ranked results are transmitted back to BSCW where they are presented within the query to which they belong, i.e. the query operates as a folder for its results.
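The recommendation step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `RatedDocument` and `Recommender`, the rating scale, and the rule that every query term must occur in the description are all hypothetical, and an in-memory list stands in for the document database.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical database entry for a rated Web document (field names are illustrative).
final class RatedDocument {
    final String url;
    final String description;
    final double averageRating; // assumed scale: 0.0 (poor) to 5.0 (excellent)

    RatedDocument(String url, String description, double averageRating) {
        this.url = url;
        this.description = description;
        this.averageRating = averageRating;
    }
}

final class Recommender {
    // Returns up to `limit` documents rated at least `minRating` whose
    // description contains every query term, highest rating first.
    static List<RatedDocument> recommend(List<RatedDocument> database, String query,
                                         double minRating, int limit) {
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<RatedDocument> matches = new ArrayList<>();
        for (RatedDocument doc : database) {
            String text = doc.description.toLowerCase(Locale.ROOT);
            boolean allTermsFound = true;
            for (String term : terms) {
                if (!text.contains(term)) {
                    allTermsFound = false;
                    break;
                }
            }
            if (allTermsFound && doc.averageRating >= minRating) {
                matches.add(doc);
            }
        }
        matches.sort(Comparator.comparingDouble((RatedDocument d) -> d.averageRating).reversed());
        return new ArrayList<>(matches.subList(0, Math.min(limit, matches.size())));
    }
}
```

A real recommender would presumably use a better matching function than substring containment; the point here is only the combination of a rating threshold with a description match.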

After having received the results, users may inspect the documents as well as rate and annotate them for the benefit of their fellow users with whom they share the workspace. Rated and annotated URLs are automatically moved out of the query folder one level up to a more prominent position (cf. figure 1); URLs judged irrelevant disappear for good. The ratings and annotations are also propagated to the LiveMarks agents, which store them in their document database for future recommendations.

As long as a query is active, new results continue to flow in whenever the query folder has room for more. The capacity of a query folder is limited to give users a better overview and to avoid flooding them with results. The agents make sure that, at any time, the folder contains only the best results. The flow of results can be stopped by deactivating a query.
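One way to picture a capacity-limited query folder is a bounded collection that keeps only the highest-scoring entries. The sketch below is an assumption, not the LiveMarks implementation: the names `ScoredResult` and `QueryFolder` are hypothetical, and the policy that a better newcomer displaces the weakest entry of a full folder is just one plausible reading of "only the best results".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical search result with a relevance score (higher is better).
final class ScoredResult {
    final String url;
    final double score;

    ScoredResult(String url, double score) {
        this.url = url;
        this.score = score;
    }
}

final class QueryFolder {
    private final int capacity;
    private boolean active = true;
    // Min-heap on score: the weakest retained result is always at the head.
    private final PriorityQueue<ScoredResult> results =
        new PriorityQueue<>(Comparator.comparingDouble((ScoredResult r) -> r.score));

    QueryFolder(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Stops the flow of results into this folder.
    void deactivate() {
        active = false;
    }

    // Offers a new result; returns true if it was stored in the folder.
    boolean offer(ScoredResult candidate) {
        if (!active) {
            return false;
        }
        if (results.size() < capacity) {
            results.add(candidate);
            return true;
        }
        if (candidate.score <= results.peek().score) {
            return false; // not better than the weakest retained result
        }
        results.poll(); // assumed policy: evict the weakest entry
        results.add(candidate);
        return true;
    }

    // Current contents, best score first.
    List<ScoredResult> contents() {
        List<ScoredResult> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((ScoredResult r) -> r.score).reversed());
        return sorted;
    }
}
```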

For its members, a BSCW shared workspace serves as the context for an information collection task. The workspace contains all queries and all relevant results. Within this context, the agents minimize redundant information: identical or similar URLs are suppressed, and material that has been judged irrelevant within a workspace or has been removed from it is not returned again in response to a new query.
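The per-workspace suppression of redundant results could be sketched as a simple filter. Everything here is hypothetical: the class name `WorkspaceFilter`, its methods, and in particular the crude notion of "similar" URLs (equal after stripping a fragment and trailing slashes) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class WorkspaceFilter {
    // URLs already delivered to this workspace (normalized form).
    private final Set<String> shown = new HashSet<>();
    // URLs judged irrelevant or removed from this workspace (normalized form).
    private final Set<String> suppressed = new HashSet<>();

    // Assumed similarity rule: ignore a "#fragment" and trailing slashes.
    static String normalize(String url) {
        String u = url.trim();
        int hash = u.indexOf('#');
        if (hash >= 0) {
            u = u.substring(0, hash);
        }
        while (u.endsWith("/")) {
            u = u.substring(0, u.length() - 1);
        }
        return u;
    }

    void markIrrelevantOrRemoved(String url) {
        suppressed.add(normalize(url));
    }

    // Returns only those candidates that are new to the workspace,
    // and remembers them so that later queries do not repeat them.
    List<String> admit(List<String> candidates) {
        List<String> fresh = new ArrayList<>();
        for (String url : candidates) {
            String key = normalize(url);
            if (suppressed.contains(key)) {
                continue;
            }
            if (shown.add(key)) {
                fresh.add(url);
            }
        }
        return fresh;
    }
}
```

Because the filter state belongs to one workspace, a URL rejected in one group's context could still be offered to another group, which matches the idea of the workspace as the context of the task.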

By integrating the agent-based information retrieval services of LiveMarks into the BSCW groupware system, we believe we have created an environment that addresses the needs of information acquisition tasks:

Information seeking extends over time; intermediate queries and results need to be preserved so that the activity may be interrupted and resumed easily.
Information seeking is not a stand-alone activity. Support tools need to be integrated into an electronic working environment.
Information seeking is not a solitary activity in most cases. Queries and search results need to be shared, assessed and structured in a working group.

Additionally, the group setting of LiveMarks motivates serious and responsible rating and annotating, which in turn improves the quality of LiveMarks recommendations.

Backstage: SOaP

LiveMarks agents handle the interface to BSCW, process the user queries, and produce recommendations. They are implemented on our agent platform SOaP (SOcial Agents Platform). SOaP is built on top of a Java virtual machine and constitutes a minimal operating system for multi-agent systems. The agents of a SOaP application are multi-threaded and run concurrently in a single agent engine, or they may be distributed over several agent engines in a network. Agent communication is realized by an asynchronous message passing mechanism using mailboxes. SOaP is tailored to our requirements, namely openness, distribution, robustness, and security.
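Asynchronous message passing with mailboxes can be illustrated with standard Java concurrency classes. The sketch below does not reproduce the SOaP API: `Message`, `Mailbox`, `EchoAgent`, and `MailboxDemo` are hypothetical names, and a `LinkedBlockingQueue` is assumed as the mailbox implementation.

```java
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical message with a sender name, a message type, and a text body.
final class Message {
    final String sender;
    final String type;
    final String body;

    Message(String sender, String type, String body) {
        this.sender = sender;
        this.type = type;
        this.body = body;
    }
}

final class Mailbox {
    private final BlockingQueue<Message> queue = new LinkedBlockingQueue<>();

    // Asynchronous send: the unbounded queue accepts the message and returns at once.
    void post(Message message) {
        queue.add(message);
    }

    // Waits up to timeoutMillis for a message; returns null on timeout or interrupt.
    Message receive(long timeoutMillis) {
        try {
            return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}

// A toy agent that answers every message with its upper-cased body.
final class EchoAgent implements Runnable {
    final Mailbox inbox = new Mailbox();
    private final Mailbox replyTo;

    EchoAgent(Mailbox replyTo) {
        this.replyTo = replyTo;
    }

    @Override
    public void run() {
        while (true) {
            Message m = inbox.receive(100);
            if (m == null) {
                if (Thread.currentThread().isInterrupted()) {
                    return;
                }
                continue; // nothing arrived yet, keep waiting
            }
            if ("stop".equals(m.type)) {
                return;
            }
            replyTo.post(new Message("echo", "reply", m.body.toUpperCase(Locale.ROOT)));
        }
    }
}

final class MailboxDemo {
    // Sends one message to an agent running in its own thread and waits for the reply.
    static String roundTrip(String text) {
        Mailbox replies = new Mailbox();
        EchoAgent agent = new EchoAgent(replies);
        Thread thread = new Thread(agent, "echo-agent");
        thread.setDaemon(true);
        thread.start();
        agent.inbox.post(new Message("demo", "query", text));
        Message reply = replies.receive(2000);
        agent.inbox.post(new Message("demo", "stop", ""));
        return reply == null ? null : reply.body;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("agents"));
    }
}
```

The sender never waits for the receiver to process a message; it only waits when it chooses to read its own mailbox, which is the essential property of the mailbox style.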

Meeting design requirements

Since SOaP is built on top of a Java virtual machine, it is interoperable across heterogeneous platforms. SOaP features open agent communication by employing a language- and platform-independent message format. Agent communication uses high-level communication primitives: message types and, if required by the application, conversations. The HTTP protocol is integrated into the platform; SMTP and FTP are to follow. External services like search engines or databases can be wrapped by specific agents and thus be integrated into SOaP applications.

For reasons of performance, ease of application management, and scalability, the agents of a SOaP application can be distributed over several agent engines on different hosts. Such a SOaP application consists of a network of interoperating agent engines that features transparent agent communication across engine boundaries. SOaP provides a unique naming and addressing scheme for agents, and a global directory service.

By their very nature, agent applications are long-lived applications that must run for months and years. Agent execution must be robust: agents must survive system failures and maintenance shutdowns. For that reason, SOaP agents are persistent; they can store their state on permanent media. After a system failure or shutdown, the agent engine will restart, and all agents resume their operation. Additionally, SOaP provides an alarm service for agents, so that they can repeat failed actions.
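Agent persistence of this kind could, for example, be built on Java object serialization. The following sketch is an assumption about one possible mechanism, not a description of how SOaP stores agent state; the names `AgentState` and `Checkpoint` and the choice of a plain file as the permanent medium are hypothetical.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical state an agent needs in order to resume after a restart.
final class AgentState implements Serializable {
    private static final long serialVersionUID = 1L;

    final String agentName;
    final List<String> pendingQueries = new ArrayList<>();

    AgentState(String agentName) {
        this.agentName = agentName;
    }
}

final class Checkpoint {
    // Writes the agent state to a file so that it survives a shutdown.
    static void save(AgentState state, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the agent state back, as an engine might do on restart.
    static AgentState load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (AgentState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("unknown state class", e);
        }
    }

    // Creates a temporary checkpoint file for demonstration purposes.
    static Path tempFile() {
        try {
            Path file = Files.createTempFile("agent-state", ".ser");
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A production design would also have to guard against a crash in the middle of a write, for instance by writing to a temporary file and renaming it afterwards.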

In order to guarantee a tamper-proof execution environment for the agents, SOaP agents run under the authority of a user or service provider. Agents running under different authorities cannot interfere with each other. This is enforced by the agent core engine with a custom Java security manager, which prevents direct access between agents with different authorities. To gain access to system services, agents authenticate with their authority. Also, SOaP agents are stationary to avoid the security risks associated with agent mobility.

SOaP architecture

SOaP is divided into four layers (cf. figure 2). The two lower layers, the core engine and the system agents, implement the basic agent life cycle services and key features like persistency, agent communication, error recovery, name and alarm services, in a single agent engine. The two top layers, service agents and application agents, provide an abstraction from the locality of agents and make available common distributed services like the directory service.

Figure 2. The SOaP layers

The LiveMarks application of SOaP employs four types of agents: the BSCW agent as interface agent; the task agents, each of which is associated with a BSCW workspace and processes all queries within that workspace; and the search and recommender agents as service agents, which both wrap external information sources: search engines like AltaVista or Infoseek, and the recommender database of rated Web documents, which is accessed via JDBC. The configuration is given in figure 3. Since user administration and authentication are done by BSCW, the present LiveMarks application has no user agents.

Figure 3. The LiveMarks application agents
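The division of labour among these agent types can be sketched in a few lines. This is a deliberately simplified illustration: the names `ServiceAgent`, `TaskAgent`, and `BscwAgent` mirror the roles in the text but their methods are hypothetical, and direct method calls stand in for the asynchronous messages that SOaP agents would actually exchange.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Role of the search and recommender agents: wrap an external information source.
interface ServiceAgent {
    List<String> answer(String query);
}

// Role of a task agent: process all queries of one workspace.
final class TaskAgent {
    private final List<ServiceAgent> services;

    TaskAgent(List<ServiceAgent> services) {
        this.services = new ArrayList<>(services);
    }

    // Forwards the query to every service agent and merges the answers,
    // dropping exact duplicates while preserving the order of arrival.
    List<String> process(String query) {
        Set<String> merged = new LinkedHashSet<>();
        for (ServiceAgent service : services) {
            merged.addAll(service.answer(query));
        }
        return new ArrayList<>(merged);
    }
}

// Role of the BSCW agent: route each query to the task agent of its workspace.
final class BscwAgent {
    private final Map<String, TaskAgent> taskAgents = new HashMap<>();

    void register(String workspaceId, TaskAgent agent) {
        taskAgents.put(workspaceId, agent);
    }

    List<String> submit(String workspaceId, String query) {
        TaskAgent agent = taskAgents.get(workspaceId);
        if (agent == null) {
            return Collections.emptyList(); // no task agent for this workspace
        }
        return agent.process(query);
    }
}
```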

Plotting for Agents

We observe two trends in collaborative information collection: support for implicit and for explicit collaboration. Agent-based approaches support collaboration implicitly. Examples are JASPER [Davies et al. 95], FAB [Balabanovic & Shoham 97], GAB [Wittenburg et al. 95], PHOAKS [Terveen et al. 97], and Do-I-Care [Starr et al. 96]. By observing the users, agents try to accumulate knowledge which they can use for recommendations. These approaches have several problems. The most notorious is the cold-start problem: in return for their ratings, users obtain useful recommendations only after the database has been filled sufficiently. A second problem is trust in recommendations. If users are not aware of how their ratings will be used, they may not be able to give adequate ratings or may not even be aware of their responsibility. As a third problem, the agents have to be able to observe the information acquisition process in sufficient detail; for instance, they should know enough about the context of a rating or a query.

The second trend is represented by systems like Pointer [Maltz & Ehrlich 95], Grassroots [Kamiya et al. 96], and ComMentor [Röscheisen et al. 95]. These systems support only explicit information sharing, that is, sharing that is consciously directed towards selected persons. They can ensure a higher quality of recommendations, but miss opportunities for synergies when people are not aware of their common interests or tasks.

Clearly, the two approaches complement each other nicely. In LiveMarks, we went one step beyond a mere combination. BSCW is not a dedicated groupware system for cooperative information collection; it is a more general system for Web-based group work. For LiveMarks we extended it to better support the information gathering process with its intermediate stages. Thus, in BSCW with LiveMarks, information gathering and other cooperative work are seamlessly integrated. Our agents find a unique, rich environment which they can observe to gain insight into user and group preferences and interests and where they can offer extended services. BSCW will provide fertile ground for deriving group topics and assembling group collections in TopicMarks, and ConceptIndex will provide alternative views of and access to information that was originally put into hierarchies of shared folders.

The Coins project is part of our research framework "The Social Web: New Forms of Interaction in Virtual Environments" (http://orgwis.gmd.de/projects/SocialWeb/), which aims to develop computer networks into a unique social medium where people meet for entertainment, business, and work.

References

Balabanovic, M., Shoham, Y. "Fab: Content-based, collaborative recommendation," Comm. ACM 40, 3 (1997), 66-72.

Bentley, R., Appelt, W., Busbach, U., Hinrichs, E., Kerr, D., Sikkel, K., Trevor, J., Woetzel, G. "Basic support for cooperative work on the World Wide Web," Int. J. Human-Computer Studies 46 (1997), 827-846.

Davies, J., Weeks, R., Revett, M. "JASPER: Communicating information agents for the WWW," in Proc. 4th Int. World-Wide Web Conf. (Boston MA, Dec. 1995), World-Wide Web Journal Vol. 1, 1, O'Reilly, Sebastopol CA, 1995, pp. 473-482.

Kamiya, K., Röscheisen, M., Winograd, T. "Grassroots: A system providing a uniform framework for communicating, structuring, sharing information, and organizing people," in Proc. 5th Int. World-Wide Web Conference, Paris, 1996.

Maltz, D., Ehrlich, K. "Pointing the way: Active collaborative filtering," in Proc. CHI'95, pp. 202-209.

Nakata, K., Voss, A., Juhnke, M., Kreifelts, Th. "Concept Index: Capturing emerging community knowledge from documents," to appear in Marti, P., Bagnara, S. (eds.) Designing Collective Memories, Proc. 7th Le Travail Humain Workshop, Paris, Sept. 1998.

Resnick, P., Varian, H. R. "Recommender systems," Comm. ACM 40, 3 (1997), 56-58.

Röscheisen, M., Mogensen, Ch., Winograd, T. "Beyond browsing: Shared comments, SOAPs, trails, and on-line communities," in Proc. 3rd Int. World-Wide Web Conf., Computer Networks and ISDN Systems 27 (1995), 739-749.

Starr, B., Ackerman, M. S., Pazzani, M. "Do-I-Care: A collaborative web agent," in Proc. CHI'96, 1996.

Terveen, L., Hill, W., Amento, B., McDonald, D., Creter, J. "PHOAKS: A system for sharing recommendations," Comm. ACM 40, 3 (1997), 59-65.

Voss, A., Kreifelts, Th. "SOaP: Providing people with useful information," in S. C. Hayne, W. Prinz (eds.) Proc. GROUP'97, Int. ACM SIGGROUP Conf. on Supporting Group Work - The Integration Challenge (Nov. 16 - 19, 1997, Phoenix AZ), ACM, New York NY, 1997, pp. 291-298.

Wittenburg, K., Das, D., Hill, W., Stead, L. "Group asynchronous browsing on the World-Wide Web," in Proc. 4th Int. World-Wide Web Conf. (Boston MA, Dec. 1995), World-Wide Web Journal Vol. 1, 1, O'Reilly, Sebastopol CA, 1995, pp. 51-62.
