KDD-2000 Workshop on Text Mining - Call for Papers

August 20, 2000

To be held at KDD-2000, Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 20-23, 2000, Boston, MA, USA

Invited speakers:

Ronen Feldman, Instinct Software, Israel
David Lewis, AT&T Research, USA

Workshop registration, which ensures your copy of the workshop notes, is open until July 14. Please register by sending e-mail with your contact information to one of the program chairs. We cannot guarantee availability of the workshop notes for late registrations. Please note that you must also register for the main conference. A fee (no more than $20) for the workshop notes will be collected on site by ACM.

The call for papers on the topics listed below is open until May 15, 2000.

Below are submission guidelines, attendance information, important dates and organization details.

Workshop Description

The growing importance of electronic media for storing and exchanging text documents has led to increasing interest in tools and approaches for dealing with the unstructured or semi-structured information contained in those documents. In addition to well-organized and maintained text databases, an important source of textual information is the World Wide Web, which is expected to continue to grow in both the number of users and the amount of information available.

Methods developed for mining structured and unstructured data sets, as well as text learning and natural language processing techniques, are essential for the analysis of textual data. While many approaches to text processing are based on statistics and thus only weakly dependent on the language the data is written in, those that involve deeper linguistic processing are typically aimed at English texts. Furthermore, an important step towards exploiting information from texts is automated information extraction from large document sets and the building of more or less domain-specific knowledge bases.

This leads to interesting and important questions of scalability of developed approaches and their applicability to a variety of document formats and languages.

Topics of interest

The objective of this workshop is to enable the presentation and exchange of ideas on various aspects of Text Mining. We aim to facilitate communication among researchers and practitioners from related and complementary research areas who are working on similar problems, possibly with a different focus and different problem-solving approaches. More precisely, we invite papers from the following four areas:

Text Mining (or Text Learning) (TM)
Information Retrieval (IR)
Natural Language Processing (NLP)
Information Extraction (IE).

Particular topics of interest for the workshop include but are not limited to:

text mining & information retrieval
text mining & natural language processing
text mining & web mining
text representation
text categorization
text segmentation
information extraction
scalability of developed approaches
performance evaluation measures
feature selection
multilingual approaches to text mining
influence of domain and domain-specific text mining
innovative applications of text mining.

Invited Talks

Invited talk by Ronen Feldman: "Text Mining: Opportunities and Challenges"

The information age has made it easy to receive and store large amounts of data. The proliferation of documents available - on the Web, in corporate intranets, on news wires and elsewhere - is overwhelming. However, while the amount of data available to us is forever increasing, our ability to absorb and process this information has remained constant. Search engines only exacerbate the problem by making more and more documents available in a matter of a few keystrokes; so-called "push" technology makes the problem even worse by constantly reminding us that we are failing to follow critical news, events, and trends. We experience information overload, and miss important patterns even as they unfold before us.

Text Mining is a new and exciting area of research that tackles this problem through techniques borrowed from data mining, machine learning, information retrieval, natural-language understanding, case-based reasoning, statistics, and knowledge management to help people gain rapid insight into large quantities of semi-structured or unstructured text. Typically, it involves preprocessing a document collection (e.g., through text categorization or term extraction), storing, indexing and analyzing the intermediate representations (through distribution analysis, document clustering, trend analysis, and association rule discovery), and presenting the results graphically.

In my talk I will present some of the new challenges facing the text mining community, with particular focus on the representation of documents and the ability to provide better insights into document collections. I will also try to provide a consumer perspective into text mining while reviewing the business opportunities that currently exist in this area.

Invited talk by David Lewis: "ATTICS: A Toolkit for Text Classification and Text Mining"

ATTICS is an extensible text classification system recently implemented in C++ at AT&T Labs and available for research purposes. In spirit it is a hybrid between text retrieval systems such as SMART and machine learning toolkits such as MLC++. The design, data model, and emphasis on online classifier application are unusual for either type of software. While ATTICS is primarily intended for online text classification, its flexible architecture and support for both textual and non-textual data make it applicable to many text mining tasks. I will discuss how ATTICS can be used for text mining, and when it is or isn't a better choice than other tools.

Submission Guidelines

Submissions should be sent by May 15, 2000, in electronic form as a PDF or PostScript file to Dunja.Mladenic@cs.cmu.edu, Subject: KDD-2000 workshop submission paper.

Each submission should indicate which of the four areas listed above it best fits into (TM, IR, NLP, IE). The length and formatting of the submissions should follow the KDD-2000 recommendations either for submissions (max. 20 pages, single column, line spacing of 1.5, no smaller than 12-point font, margins of at least 1 inch on each side) or for camera-ready papers (max. 12 pages, two columns, etc. - see the templates).

Submitted papers will be reviewed by referees from the Program Committee. Accepted papers will be published in the working notes provided by ACM. The authors will be notified about the acceptance or rejection of their papers by June 15, 2000. Camera-ready versions of the papers are due July 11, 2000.

Attendance

Attendance is not limited to paper authors. We strongly encourage interested researchers from related areas to attend the workshop. One of the objectives of the workshop is to promote interaction among researchers and the development of text mining as an area of research.

Important Dates

Submission Deadline:	May 15, 2000
Acceptance Notification:	June 15, 2000
Camera-ready Copies:	July 11, 2000
Workshop registration:	July 14, 2000
Workshop date:	August 20, 2000

Organization

Program Chairs

Marko Grobelnik
J.Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
Marko.Grobelnik@ijs.si

Dunja Mladenic
J.Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia, and Carnegie Mellon University, School of Computer Science, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
Dunja.Mladenic@cs.cmu.edu

Natasa Milic-Frayling
Microsoft Research Ltd, St. George House, 1 Guildhall Street, Cambridge, CB2 3NH, United Kingdom
natasamf@microsoft.com

Program Committee

Helena Ahonen, University of Helsinki, Helsinki, Finland
Simon Corston-Oliver, Microsoft Research, Redmond, WA
Mark Craven, University of Wisconsin, Madison, Wisconsin
Walter Daelemans, University of Antwerp, Antwerpen, Belgium
Susan Dumais, Microsoft Research, Redmond, WA
David Elworthy, Microsoft Research Ltd, Cambridge, UK
Ronen Feldman, Instinct Software, Israel
Marko Grobelnik, J.Stefan Institute, Ljubljana, Slovenia
Thorsten Joachims, Universitaet Dortmund, Dortmund, Germany
Rosie Jones, Carnegie Mellon University, Pittsburgh, PA
Natasa Milic-Frayling, Microsoft Research Ltd, Cambridge, UK
Dunja Mladenic, J.Stefan Institute, Ljubljana, Slovenia
Jason Rennie, Massachusetts Institute of Technology, MA
Stephen Robertson, Microsoft Research Ltd, Cambridge, UK
Sean Slattery, Carnegie Mellon University, Pittsburgh, PA
Ian Witten, University of Waikato, Hamilton, New Zealand

This workshop is partially supported by the European FP5 project "Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise (Sol-Eu-Net)".