Software Agents: Completing Patterns and Constructing User Interfaces

Jeffrey C. Schlimmer	SCHLIMMER@EECS.WSU.EDU

Leonard A. Hermens	LHERMENS@EECS.WSU.EDU

School of Electrical Engineering & Computer Science,

Washington State University, Pullman, WA 99164-2752, U.S.A.

To support the goal of allowing users to record and retrieve 
information, this paper describes an interactive note-taking system 
for pen-based computers with two distinctive features. First, it 
actively predicts what the user is going to write. Second, it 
automatically constructs a custom, button-box user interface on 
request. The system is an example of a learning-apprentice software-
agent. A machine learning component characterizes the syntax and 
semantics of the user's information. A performance system uses this 
learned information to generate completion strings and construct a 
user interface.

1.	Introduction and Motivation

People like to record information for later consultation. For many, 
the media of choice is paper. It is easy to use, inexpensive, and 
durable. To its disadvantage, paper records do not scale well. As 
the amount of information grows, retrieval becomes inefficient, 
physical storage becomes excessive, and duplication and distribution 
become expensive. Digital media offers better scaling capabilities. 
With indexing and sub-linear algorithms, retrieval is efficient; 
using high density devices, storage space is minimal; and with 
electronic storage and high-speed networks, duplication and 
distribution is fast and inexpensive. It is clear that our computing 
environments are evolving as several vendors are beginning to market 
inexpensive, hand-held, highly portable computers that can convert 
handwriting into text. We view this as the start of a new paradigm 
shift in how traditional digital information will be gathered and 
used. One obvious change is that these computers embrace the paper 
metaphor, eliminating the need for typing. It is in this paradigm 
that our research is inspired, and one of our primary goals is to 
combine the best of both worlds by making digital media as convenient 
as paper.

This document describes an interactive note-taking software system 
for computers with pen-based input devices. Our software has two 
distinctive features: first, it actively predicts what the user is 
going to write and provides a default that the user may select; 
second, the software automatically constructs a graphical interface 
at the user's request. The purpose of these features is to speed up 
information entry and reduce user errors. Viewed in a larger context, 
the interactive note-taking system is a type of self-customizing 
software.

To clarify this notion, consider a pair of dimensions for 
characterizing software. As Figure 1 depicts, one dimension is task 
specificity. Software that addresses a generic task (e.g., a 
spreadsheet) lies between task independent software (e.g., a 
compiler) and task specific software (e.g., a particular company's 
accounting software). Another dimension is the amount of user 
customization required to make the software useful. Task generic 
software lies between the two extremes, requiring modest programming 
in a specialized language. Self-customizing software uses machine 
learning techniques to automatically customize task generic software 
to a specific user. Because the software learns to assist the user 
by watching them complete tasks, the software is also a learning 
apprentice. Similarly, because the user does not explicitly program 
the defaults or the user interface for the note taking system, it is 
a type of software agent. Agents are a new user interface paradigm 
that free the user from having to explicitly command the computer. 
The user can record information directly and in a free-form manner. 
Behind the interface, the software is acting on behalf of the user, 
helping to capture and organize the information.

Next we will introduce the performance component of the note-taking 
software in more detail, then describe the representations and 
algorithms used by the learning methods. We also present empirical 
results, comparing the performance of seven alternate methods on nine 
realistic note-taking domains, and finally, we describe related 
research and identify some of the system's limitations.

2.	Performance Task

The primary function of the note-taking software is to improve the 
user's speed and accuracy as they enter notes about various domains 
of interest. A note is a short sequence of descriptive terms that 
describe a single object of interest. Example 1 shows a note 
describing a particular personal computer (recorded by the first 
author from a Usenet newsgroup during 1992):

(Example 1)4096K PowerBook 170, 1.4MB and 40MB Int. Drives, 2400/
9600 Baud FAX Modem


Example 2 is a note describing a fabric pattern (recorded by the 
first author's wife):

(Example 2)Butterick 3611 Size 10 dress, top

Tables 5 through 11 later in the paper list sample notes drawn from 
seven other domains. The user may enter notes from different domains 
at their convenience and may use whatever syntactic style comes 
naturally.

>From the user's point of view, the software operates in one of two 
modes: a contextual prompting mode, and an interactive graphical 
interface mode. In the first mode, the software continuously predicts 
a likely completion as the user writes out a note. It offers this as 
a default for the user. The location and presentation of this default 
must balance conflicting requirements to be convenient yet 
unobtrusive. For example, the hand should not hide the indicated 
default while the user is writing. Our solution is to have a small, 
colored completion button follow to the left and below where the user 
is writing. In this location, it is visible to either right- or left-
handed people as they write out notes. The user can reposition the 
button to another location if they prefer. The default text is 
displayed to the immediate right of this button in a smaller font. 
The completion button is green; the text is black. The completion 
button saturation ranges from 1 (appearing green), when the software 
is highly confident of the predicted value, to 0 (appearing white), 
when the software lacks confidence. The button has a light gray 
frame, so it is visible even when the software has no prediction. 
Figure 2 portrays a screen snapshot of the software operating in the 
contextual prompting mode for a PowerBook note.

The software's second mode presents an interactive graphical 
interface. Instead of requiring the user to write out the text of a 
note, the software presents a radio-button and check-box interface 
(what we call a button-box interface). With this, the user may select 
>from text fragments, portions of notes called descriptive terms, by 
tapping on radio-buttons or check-boxes with the pen interface 
device. Each selection from the button-box interface is added to the 
current note. Intuitively, check boxes are generated to depict 
optional descriptive terms, whereas radio-button panels are 
generated to depict alternate, exclusive descriptive terms. For user 
convenience, the radio-buttons are clustered into panels and are 
sorted alphabetically in ascending order from top to bottom. To allow 
the user to add new descriptive terms to a button-box panel, an 
additional blank button is included at the bottom of each. When the 
user selects a radio button item, the graphical interface is expanded 
to depict additional choices corresponding to descriptive terms that 
follow syntactically. The software indicates its predictions by 
preselecting the corresponding buttons and highlighting them in 
green. The user may easily override the default selection by tapping 
the desired button. Figure 3 portrays a screen snapshot of the 
software operating in the interactive graphical interface mode for 
a PowerBook note.

The software is in prompting mode when a user begins to write a note. 
If the learned syntax for the domain of the note is sufficiently 
mature (see Section 6, Constructing a Button-Box Interface), then 
the software can switch into the button-box mode. To indicate this 
to the user, a mode switch depicted as a radio button is presented 
for the user's notice. A convenient and unobtrusive location for this 
switch is just below the completion button. In keeping with the color 
theme, the mode switch also has a green hue. If the user taps this 
switch, the written text is removed, and the appropriate radio 
buttons and check boxes are inserted. The system automatically 
selects buttons that match the user-written text. As the user makes 
additional selections, the interface expands to include additional 
buttons. When the user finishes a note, in either mode, the software 
returns to prompting mode in anticipation of another note. Because 
the interface is constructed from a learned syntax, as the software 
refines its representation of the domains of the notes, the button-
box interface also improves. On-line Appendix 1 is a demonstration 
of the system's operation in each of its two modes.

3.	Learning a Syntax

To implement the two modes of the note taking software, the system 
internally learns two structures. To characterize the syntax of 
user's notes, it learns finite-state machines (FSMs). To generate 
predictions, it learns decision tree classifiers situated at states 
within the FSMs. In order to construct a graphical user interface, 
the system converts a FSM into a set of buttons. This section 
describes the representation and method for learning FSMs. The next 
section discusses learning of the embedded classifiers.

3.1	Tokenization

Prior to learning a finite-state machine, the user's note must first 
be converted into a sequence of tokens. Useful tokenizers can be 
domain independent. However, handcrafted domain-specific tokenizers 
lead to more useful representations. The generic tokenizer used for 
the results reported here uses normal punctuation, whitespace, and 
alpha-numeric character boundaries as token delimiters. For example, 
our generic tokenizer splits the sample PowerBook note in Example 1 
into the following 16 tokens:

:NULL

"4096"

" K"

" PowerBook"

" 170"

", 1.4"

"MB"

" and"

" 40"

"MB"

" Int."

" Drives"

", 2400/9600"

" Baud"

" FAX"

" Modem" .

The token :NULL is prepended by the tokenizer. This convention 
simplifies the code for constructing a FSM.

3.2	Learning a Finite-State Machine

Deterministic finite-state machines (FSMs) are one candidate 
approach for describing the syntax of a user's notes because they 
are well understood and relatively expressive. Moreover, Angluin 
(1982) and Berwick and Pilato (1987) present a straightforward 
algorithm for learning a specific subclass of FSMs called k-
reversible FSMs. The algorithm is incremental and does not suffer 
>from presentation order effects. Berwick and Pilato define a k-
reversible FSM as:

"A regular language is k-reversible, where k is a non-negative 
integer, if whenever two prefixes whose last k words [tokens] match 
have a tail in common, then the two prefixes have all tails in common. 
In other words, a deterministic finite-state automaton (DFA) [FSM] 
is k-reversible if it is deterministic with lookahead k when its sets 
of initial and final states are swapped and all of its arcs 
[transitions] are reversed."

Given a list of tokens, the k-reversible FSM algorithm first 
constructs a prefix tree, where all token sequences with common k-
leaders share a k-length path through the FSM. For example, Figure 4a 
depicts a simple FSM constructed for a single fabric pattern note. 
The text of the user's note was converted into a sequence of tokens. 
Then a transition was created for each token and a sequence of states 
was created to link them together. One state serves as the initial 
state, and another indicates the completion of the sequence. For 
convenience, this latter, terminal state is depicted with a double 
circle. If the FSM is able to find a transition for each token in 
the sequence, and it arrives at the terminal state, then the FSM 
accepts the token sequence as an instance of the language it defines. 
Figure 4b depicts the same FSM after another path has been added 
corresponding to a second fabric pattern note (Example 2). Now the 
FSM will accept either note if expressed as a sequence of tokens. 
This FSM is a trivial prefix tree because only the first state is 
shared between the two paths.

 A prefix tree is minimal for observed token sequences, but it may 
not be general enough for use in prediction. (The prefix tree is, in 
essence, an expensive method for memorizing token sequences-which is 
not the desired result.) For the sake of prediction, it is desirable 
to have a FSM that can accept new, previously unseen combinations of 
tokens. The prefix tree automaton can be converted into a more 
general FSM by merging some of its states. A particular method for 
doing this converts a prefix tree into a k-reversible FSM via 
Angluin's (1982) algorithm. The algorithm merges states that have 
similar transitions, and it creates a FSM that accepts all token 
sequences in the prefix tree, as well as other candidate sequences. 
Table 1 lists the three rules for deciding when to merge a pair of 
states in a prefix tree to form a k-reversible FSM. In the special 
case where k equals zero, all states have a common k-leader, and 
Rule 2a ensures that there will be only one accepting state.

Because the rules in Table 1 must be applied to each pair of states 
in the FSM, and because each time a pair of states is merged the 
process must be repeated, the asymptotic complexity of the process 
is O(), where  is the number of states in the FSM.

Applying these rules to the prefix tree in Figure 4b with k equal to 
zero results in a FSM depicted in Figure 5a. Notice that the first 
two states have been merged to make the FSM deterministic (Rule 1). 
The accepting states have also been merged in compliance with 
Rule 2a. The resulting FSM has fewer states but is not more general. 
It only accepts the two token sequences originally seen. Extending 
this example, Figure 5b illustrates the addition of a third fabric 
pattern note as a prefix tree path to the FSM. Reapplying the rules 
results in the FSM shown in Figure 6. The first two states have been 
merged as before through the action of the determinism Rule 1. Note 
that a pair of latter states have also been merged because they share 
a common zero-leader (true of all pairs of states) and because they 
transition to the common terminal state on the token "dress".

Figure 7 depicts a more sophisticated result; it shows a learned 
zero-reversible FSM for notes about PowerBook computers. This 
example shows that the model number "100" is never followed by a 
specification for an internal floppy drive, but that other model 
numbers are. Any model may have an external floppy drive. Note that 
there is a single terminal state. Whitespace and punctuation have 
been eliminated for clarity in the figure.

The rules listed in Table 1 are generalization operators that allow 
the FSM to accept previously unobserved sequences. Whenever two or 
more states are merged into one, the FSM will accept more sequences 
than before if the new state is at the tail end of more transitions 
than one of the previous states and if the new state is at the head 
end of at least one transition. For example, the state just after 
State 1 in Figure 7 was merged from several previous states and 
generalizes memory sizes for PowerBook models. These rules comprise 
a heuristic bias and may be too conservative. For example, Figure 8 
depicts a FSM for notes about fabric patterns. Many of the states 
prior to the accepting state could be usefully merged, but using only 
the rules listed in Table 1, many more notes will have to be 
processed before this happens. If the FSM in Figure 8 were rendered 
as a button-box interface, it would reflect little of the true 
structure of the domain of fabric patterns. Table 2 lists 
specializations of Rules 2a and 2b and an additional pair of rules 
we developed to make the FSM generalize more readily. Note that the 
parameter k has been set to zero in Rule 2 and to one in Rule 3. 
Effectively, two states are merged by Rules 3a or 2b' if they share 
an incoming or outgoing transition. Rule 3b is a Kleene rule that 
encourages the FSM to generalize the number of times a token may 
appear in a sequence. If one state has a transition to another, then 
merging them will result in a transition that loops from and to the 
newly merged state. Figure 9 depicts a FSM for notes about fabric 
patterns learned using all three generalization rules in Table 2. 
The resulting FSM accurately captures the syntax of the user's fabric 
pattern notes and correctly indicates the syntactically optional 
tokens that may appear at the end of note. When rendered as a button-
box interface, it clearly depicts the user's syntax (as illustrated 
later by Figure 12). The added generalization rules may have only 
marginal effects on the system's ability to accurately predict a 
completion as the user writes out a note (as Table 14 below 
indicates). Their purpose is to improve the quality of the custom 
interface.

Cohen (1988) uses an interesting alternative representation for 
learning a syntactic form. The goal in his work is to guide the 
generation of proof structures. Intuitively, the representation is 
a finite-state machine that accepts a tree rather than a sequence, 
and for this reason it is termed a tree automaton. Like the rules in 
Tables 1 and 2, tree automatons are generalized by merging states 
that share similar transitions. Oddly enough, one motivation for 
using tree automatons is that they are less likely to introduce 
extraneous loops, the opposite of the problem with the original FSM 
merging rules in Table 1. It is not clear how to map the sequence of 
tokens in the user's notes into a tree structure, but the less 
sequential nature of the tree automaton may help alleviate sequencing 
problems in rendering the custom user interface (see Section 9, 
Observations/Limitations).

3.3	Parsing

To use the finite-state machine for prediction, the software needs 
a strategy for dealing with novel tokens. For example, when the user 
takes a note about a PowerBook computer with a new memory 
configuration, the FSM will not have a transition for the first 
token. If the software is to prompt the user, then it must have a 
means for deciding where novel tokens lie in a note's syntax-which 
state to predict from. Without such a mechanism, no meaningful 
prediction can be generated after novel tokens.

A state may not have a transition for the next token. In general, 
this is a single symptom with three possible causes: (1) a novel 
token has been inserted, (2) a suitable token has been omitted and 
the next token would be accepted by a subsequent state, or (3) a 
token has been simply replaced by another in the syntax. For example, 
in the sequence of tokens {:NULL, "12288", "K", "PB"}, "12288" is a 
novel token, a familiar memory size has been omitted, and "PowerBook" 
has been replaced by "PB".

An optimal solution would identify the state requiring a minimum 
number of insertions, omissions, and replacements necessary to parse 
the new sequence. An efficient, heuristic approximation does a greedy 
search using a special marker. Each time the marked state in the FSM 
has a transition for the next token written by the user, the marker 
is moved forward, and a prediction is generated from that state. When 
there is no transition for the next token, a greedy search is 
conducted for some state (including the marked one and those 
reachable from it) that has a transition for some token (including 
the next one and those following). If such a state is found, the 
marker is moved forward to that state, tokens for the transitions of 
skipped states are assumed omitted, and novel tokens are assumed 
inserted. If no state past the marker has a transition for any of 
the remaining tokens, the remaining tokens are assumed to be 
replacements for the same number of the most likely transitions; the 
marker is not moved. If the user writes a subsequent token for which 
some state has a transition, the marker is moved as described above, 
and the syntax of the user's note is realigned with the learned 
syntax. Continuing with the simple PowerBook example, the marker is 
moved to State 1 of the FSM in Figure 7 because the initial state 
had a transition for the first token :NULL. Because State 1 doesn't 
have a transition for the next token "12288", a greedy search is 
conducted to find a nearby state that accepts either "12288", "K", 
or "PB". The state just before State 2 accepts "K", so the marker is 
moved to that state. Another greedy search is started to find a state 
that accepts "PB". Because one cannot be found, the heuristic parsing 
assumes that it should skip to the next transition. In this case the 
one labeled "PowerBook". Consequently, the system generates a 
prediction from State 2 to prompt the user.

3.4	Multiple Finite-State Machines

If the user decides to take notes about multiple domains, it may be 
necessary to learn a separate syntax for each domain. For example, 
a single syntax generalized over both the PowerBook and fabric 
pattern notes is likely to yield confusing predictions and an 
unnatural user interface. Maintenance of multiple finite-state 
machines is an instance of the clustering problem-deciding which 
notes should be clustered together to share a FSM. As Fisher (1987) 
discusses, this involves a trade-off between maximizing similarity 
within a cluster and minimizing similarity between clusters. Without 
the first criteria, all notes would be put into a single cluster. 
Without the second criteria, each note would be put into its own 
cluster.

One obvious approach would be to require the user to prepend each 
note with a unique token to identify each note's domain. This 
simplifies the clustering computation. All notes sharing the first 
token would share a FSM. However, with this scheme, the user would 
have to remember the identifying token or name for each domain. An 
interface could provide a pop-up list of all previously used domain 
identifiers. This is not satisfactory because it requires overhead 
not needed when taking notes on paper.

An alternative approach doesn't require any extra effort on the part 
of the user. A new note is grouped with the FSM that skips the fewest 
of its tokens. This heuristic encourages within cluster similarity 
because a FSM will accept new token sequences similar to those it 
summarizes. To inhibit the formation of single-note FSMs, a new FSM 
is constructed only if all other FSMs skip more than half of the new 
note's tokens. This is a parametrized solution to encourage between-
cluster dissimilarity.

4.	Learning Embedded Classifiers

Finite-state machines are useful representations for capturing the 
syntax of a user's notes, and they are easy to learn. When predicting 
a note's completion, it is essential that a prediction be made from 
the correct state in the FSM (as discussed above). It is also 
necessary to decide whether to terminate (indicating acceptance of 
the note) or continue prediction, and, in the later case, which 
transition to predict. To facilitate these decisions, the FSM can 
maintain a count of how many times parsing terminated and how many 
times each transition was taken. Prediction can then return the 
option with the maximum frequency.

Figure 10 depicts a FSM for which this method will prove 
insufficient. There is only one state, an accepting state, and the 
transition corresponding to the token "X" is optional. (This 
corresponds to a check box interface item.) There are two problems 
with a frequency-based prediction. First, the FSM does not indicate 
that the transition is to be taken at most once, yet this is quite 
clear from the user interface. Second, simple frequency-based 
prediction would always recommend termination and never the 
transition. The FSM accepts whether the box is checked or not, thus 
the frequency of termination is greater than or equal to the 
frequency of the transition. This problem arises whenever there is 
a loop.

Embedding general classifiers in a FSM can alleviate some of the 
FSM's representational shortcomings. For example, in the FSM 
depicted in Figure 10, a decision tree embedded in this state easily 
tests whether the transition has already been taken and can advise 
against repeating it. Moreover, a classifier can predict based on 
previous transitions rather than just the frequency of the current 
state's transitions. Therefore, a decision tree embedded in the state 
of Figure 10 can predict when the transition should be taken as a 
function of other, earlier tokens in the sequence. Table 3 lists 
sample decision trees embedded in states of the FSM depicted in 
Figure 7. The first tree tests which token was parsed by a distant 
state, in effect augmenting the FSM representation. It relates memory 
size to hard disk capacity (small amounts of memory correlate with 
a small hard disk). The second tree prevents an optional loop from 
being taken a second time by testing to see if the state has yet been 
visited during a parse of the note. After processing additional 
notes, this second decision tree becomes more complex as the system 
tries to predict which PowerBooks have FAX modems and which do not.

A classifier is trained for each state in the FSM which: (a) has more 
than one transition, or (b) is marked as a terminal state but also 
has a transition. The classifiers are updated incrementally after 
the user finishes each note. The classifier's training data are token 
sequences parsed at this state. The class value of the data is the 
transition taken from, or termination at, this state by the token 
sequences. Only those classifiers whose states are used in a parse 
are updated. The attributes of the data are the names of states prior 
to this one, and the values of the attributes are the transitions 
taken from those states. A distinct attribute is defined each time 
a state is visited during a given parse, so when a loop transition 
is taken a specific attribute reflects this fact. For any of the 
attributes, if the corresponding state was not visited while parsing 
the token sequence, the attribute has a special, empty value.

Consider the PowerBook FSM shown in Figure 7. A classifier would be 
embedded at States 1, 2, 3, 4, 5, 6, 7. A training example 
corresponding to the note in Example 1 for the classifier at State 6 
would be:

Attributes:				Values:

S1			=	"4096"

S2			=	" 170"

S3			=	NIL

S4			=	" 40"

S5			=	" Drives"

S6			=	", 2400/9600"

S7			=	" FAX"

S7-1			=	" Modem"

Class:			=	:TERMINATE .

Note that there is no value for State 3, denoting that it wasn't 
visited during the parse of Example 1. Also there are two attributes 
for State 7 denoting that it has been visited twice. 

The classifier gives informed advice about which transition to take 
or whether to terminate. The FSM in turn gives the classifier a 
specific context for operation. If only a single classifier were used 
to predict the next token, it would be hard pressed to represent the 
different predictions required. The domain is naturally narrowed by 
the FSM and therefore reduces the representational demands on the 
classifier. Later, we present empirical results comparing a single 
classifier to a set of classifiers embedded in a FSM. The findings 
there show that the latter outperforms the former, confirming the 
intuition that learning is more effective if situated within a narrow 
context.

>From the classifier's point of view, the learning task is non-
stationary. The concept to be learned is changing over time because 
the structure of the FSM is changing. When two states are merged, 
one of the two classifiers is discarded. The other is now embedded 
in a different position in the FSM, and it sees different training 
data. Similarly, when other states are merged, the attributes of the 
training data also change. To help mitigate this effect, the new 
state takes the oldest identifier assigned to the two merged states. 
Empirical results in Table 14 illustrate that the FSM does not have 
to be fixed before the classifier can learn useful information.

5.	Contextual Prompting

In the prompting mode, the software continuously predicts a likely 
completion as the user writes out a note. It presents this as a 
default next to the completion button. The button's saturation ranges 
>from white to green in proportion to the confidence of the 
prediction. If the user taps the completion button, the prompt text 
is inserted at the end of the current note.

A completion is generated by parsing the tokens already written by 
the user, finding the last state visited in the FSM, and predicting 
the next most likely transition (or termination). This process is 
repeated until a stopping criterion is satisfied, which is discussed 
below. If the last token written by the user is incomplete, matching 
only a prefix of a state's transition, then the remainder of that 
transition is predicted. If the last token matches more than one 
transition, a generalized string is predicted using special 
characters to indicate the type and number of characters expected. 
If a digit is expected, a "#" is included; if a letter, an "a" is 
included; if either are possible, a "?" is included; and if some 
transition's tokens are longer than others, a "..." is appended to the 
end. For example, if the user has written "4096K PowerBook 1", the 
possible values for PowerBook models of "100", "140", "160C", and 
"170" are generalized, and the prompt is "#0...".

A simple calculation is used to compute the confidence of the 
prediction and set the button's color saturation. It is the simple 
ratio

where  is the frequency of the predicted arc (or terminate) [i.e., 
the number of times this choice was taken while parsing previously 
observed notes],  is the total frequency of all arcs (and terminate), 
and  is the number of tokens skipped during heuristic parsing (cf. 
Section 3.3, Parsing). Confidence is directly proportional to the 
simple likelihood of the prediction and is degraded in proportion to 
the number of tokens the FSM had to skip to get to this point. This 
information is used in a simple way, so it is unclear if more 
sophisticated measures are needed.

The stopping criterion is used to determine how much of a prompt to 
offer the user. At one extreme, only a single token can be predicted. 
This gives the user little context and may not provide much 
assistance. At the other extreme, a sequence of tokens that completes 
the note can be predicted. This may be too lengthy, and the user 
would have to edit the prompt if selected. The stopping criterion in 
Table 4 balances these two extremes and attempts to limit prompts to 
a consistent set of tokens. In particular, Condition 3 stops 
expanding the prompt upon reaching a syntactic boundary (leading 
punctuation) or upon reaching a semantic boundary (falling 
confidence).

6.	Constructing a Button-Box Interface

In the button-box mode, the software presents an interactive 
graphical interface. Instead of writing out the note, the user may 
select note fragments by tapping buttons. To switch from contextual 
mode to button-box mode, a green radio button indicator is displayed 
below the completion button when the software is confident about the 
user's syntax. If the user taps this indicator, the existing text is 
removed, and the corresponding buttons in the button-box interface 
are selected. As the user selects additional buttons, the interface 
dynamically expands to reveal additional choices. Because the 
interface reflects an improving syntactic representation, it also 
improves with successive notes.

The button-box interface is a direct presentation of a finite-state 
machine. After the user has written out a token or so of the note, 
the software finds the FSM that best parses these tokens. The mode 
switch is presented if the syntax is sufficiently mature-if the 
average number of times each state has been used to parse earlier 
notes is greater than 2. If the user selects this indicator, the FSM 
is incrementally rendered as a set of radio buttons and check boxes.

The two user interface item types correspond to optional choices 
(check boxes) and exclusive choices (radio buttons). Mapping a FSM 
into these two item types proceeds one state at a time. Given a 
particular state to be rendered, any transition that starts a path 
that does not branch and eventually returns back to the state is 
rendered as a check box (a loop). The loop corresponds to 
syntactically optional information. The label for the check box 
consists of each of the transition labels along the looping path. 
Other non-looping transitions are rendered as buttons in a single 
radio button panel along with an extra, unlabeled button. They 
correspond to syntactically exclusive information. The label for 
each radio button consists of each transition label up to the point 
of a subsequent branch or termination. For example, compare the FSM 
depicted in Figure 7 and the corresponding button-box interface in 
Figure 3.

Because the transitions for different radio buttons lead to different 
parts of the FSM, it may confuse the user to render the entire FSM 
at once. So, each branching state is rendered as it is visited. 
Initially, the first state in the FSM is rendered. Then, when a radio 
button is selected, the branching state at the end of its transition 
path is rendered. Note that check boxes do not trigger additional 
rendering because the branching state at the end of their loop has 
already been rendered. This interactive process is repeated as long 
as the user selects radio buttons that lead to branching states.

7.	Empirical Results

We tested the interactive note taking software on notes drawn from 
a variety of domains. Tables 5 through 11 list sample notes from 
seven domains (in addition to the PowerBook and fabric pattern sample 
notes listed above).

CVA-62 8/6/63 to 3/4/64 Mediterranean A-5A AG 60X

CVA-61 8/5/64 to 5/6/65 Vietnam RA-5C NG 10X

Table 5: Sample notes from the airwing domain. Listed above are 2 of 
the 78 notes about airwing assignments aboard aircraft carriers 
collected from (Grove & Miller, 1989).

B, 81, 5, 151 (2.5), Cyl. 4, 2-bbl., Pontiac

C, 82, X, 173 (2.8), Cyl. 6, 2-bbl., Chevrolet

Table 6: Sample notes from the engine code domain. Listed above are 
2 of the 20 notes about the meaning of engine codes stamped on 
automobile identification plates collected from Chilton's Repair & 
Tune-Up Guide (1985).

90, Mazda MPV, 40K MI, 7 Pass, V6, Auto

ABS, PL/PW, Cruise, Dual Air

87, Grand Caravan, 35K MI, 7 Pass, V6, Auto

Cruise, Air, Tilt, Tinting

Table 7: Sample notes from the minivan domain. Listed above are 2 of 
the 22 notes about minivan automobiles collected by the first author.

Lorus Disney Oversize Mickey Mouse Watch.

Genuine leather strap.

Seiko Disney Ladies' Minnie Mouse Watch.

Leather strap.

Table 8: Sample notes from the watch domain. Listed above are 2 of 
the 89 notes about personal watches collected from the Best catalog 
(a department store).

azatadine maleate

Blood: thrombocytopenia.

CNS: disturbed coordination, dizziness, drowsiness, sedation, 
vertigo.

CV: palpitations, hypotension.

GI: anorexia, dry mouth and throat, nausea, vomiting.

GU: Urinary retention.

Skin: rash, urticaria.

Other: chills, thickening of bronchial secretions.

brompheniramine maleate

Blood: aganulocytosis, thrombocytopenia.

CNS: dizziness, insomnia, irritability, tremors.

CV: hypotension, palpitations.

GI: anorexia, dry mouth and throat, nausea, vomiting.

GU: urinary retention.

Skin: rash, urticaria.

After parenteral administration:

 local reaction, sweating, syncope may occur.

Table 9: Sample notes from the antihistamine domain. Listed above 
are 2 of the 17 notes on the side effects of antihistamines collected 
>from the Nurses Guide to Drugs (1979).

Canon FD f/1.8, 6oz., f/22, 13in.,

good sharpness, poor freedom from flare,

better freedom from distortion,

focal length marked on sides as well as

on front of lens

Chinon f/1.7, 6oz., f/22, 9in.,

poor sharpness, good freedom from flare,

good freedom from distortion,

cannot be locked in program mode, which

is only a problem, of course, when lens is

used on program-mode cameras

Table 10: Sample notes from the lens domain. Listed above are 2 of 
the 31 notes about 35mm SLR camera normal lenses collected from the 
Consumer Reports (1988).

22in. W. 48in.

A very large falcon. Three color phases occur:

blackish, white, and gray-brown. All

are more uniformly colored than the

Peregrine Falcon, which has dark

mustaches and hood.

16-24in. W. 42in.

Long-winged, long-tailed hawk with a

white rump, usually seen soaring

unsteadily over marshes with its wings

held in a shallow 'V'. Male has a pale

gray back, head, and breast. Female

and young are brown above, streaked

below, young birds with a rusty tone.

Table 11: Sample notes from the raptor domain. Listed above are 2 of 
the 21 notes about North American birds of prey collected from (Bull 
& Farrand, 1977).

Summary characteristics of the nine domains are listed in Table 12 
together with some simple measures to indicate prediction 
difficulty. For instance, Column 1 shows the number of notes in the 
domain. With a larger number of notes, the easier it should be to 
accurately train a predictive method. Column 4 shows the standard 
deviation (STD) of the length of all notes in each domain. It is more 
likely that a well-behaved FSM can be discovered when STD is low. In 
this and successive tables, the domains are ranked by STD. Column 5 
presents the percentage of unique tokens in the notes. The fewer 
novel tokens a note has, the more likely that successive tokens can 
be predicted. This measure places an upper bound on predictive 
accuracy. Column 6 shows the percentage of constant tokens, ones that 
always appear in a fixed position. It is easier to predict these 
constant tokens. Finally, Column 7 indicates the percentage of 
repeated tokens. When fewer tokens are repeated verbatim within a 
note, the more likely that the predictive method will not become 
confused about its locale within a note during prediction.

The first six domains are natural for the interactive note taking 
task because they exhibit a regular syntax. The last three domains 
are included to test the software's ability on less suitable domains. 
Notes from the Antihistamine, Lens, and Raptor domains contain 
highly-variable lists of terms or natural language sentences. 
Learned FSMs for notes in these domains are unlikely to converge, 
and, in the experiments reported here, only the FSM for the Lens data 
exceeded the maturity threshold (average state usage greater than 2).

7.1	Contextual Prediction Accuracy

Column 7 of Table 13 lists the accuracy of next-token predictions 
made by the software in prompting mode. The first nine rows list 
predictive accuracy over all tokens as notes from each of the nine 
domains are independently processed in the order they were collected. 
The last row lists predictive accuracy over all tokens as notes from 
all nine domains are collectively processed. This simulates a user 
taking notes about several domains simultaneously. 

To put these results in context, the table also lists predictive 
accuracies for several other methods. Column 1 lists the accuracy 
for a lower bound method. It assumes that each note shares a fixed 
sequence of tokens. Termed common, this method initializes its 
structure to the first note. It then removes each token in this 
sequential structure that cannot be found in order in other notes. 
At best, this method can only predict the constant, delimiter-like 
tokens that may appear regularly in notes. Its performance is limited 
by the percentage of constant tokens reported in Column 6 of 
Table 12. It performs best for the PowerBook notes where it learns 
the following note syntax:

(Example 3)* :NULL * "K" * " PowerBook" * "MB" * "MB" * " Int." * .

(The asterisks indicate Kleene star notation.) This reads as some 
sequence of zero or more tokens then the token :NULL, followed by 
zero or more tokens then "K", followed by zero or more tokens then 
"PowerBook", and so on. It is less successful for the minivan notes 
where it learns a simpler syntax:

(Example 4)* :NULL * "K" * " MI" * " Pass" * .

Columns 2 and 3 of Table 13 list the accuracy of using a classifier 
to directly predict the next token without explicitly learning a 
syntax. In this paradigm, examples are prefixes of token sequences. 
Attributes are the last token in the sequence, the second to last 
token, the third to last token, and so on. Class values are the next 
token in the sequence-the one to be predicted. Column 2 lists the 
performance of a simple Bayes classifier, and Column 3 lists the 
performance of an incremental variant of ID3 (Schlimmer & Fisher, 
1986). Perhaps surprisingly, these methods perform considerably 
worse than the simple conjunctive method. Without the benefit of a 
narrow context provided by the FSM, these methods must implicitly 
construct representations to detect differences between similar 
situations that arise within a single note. For example, in the 
PowerBook notes, a classifier-only approach must learn to 
discriminate between the first and second occurrence of the "MB" 
token.

Column 4 of Table 13 lists the accuracy of a more viable prediction 
mechanism. Based on simple ideas of memorization and termed digram, 
the method maintains a list of tokens that have immediately followed 
each observed token. For example, in the fabric pattern domain, this 
method retains the list of tokens {"8-10-12", "10", "11/12", "12"} 
as those that follow the token "Size". Each list of follow tokens 
are kept in order from most to least frequent. To predict the next 
token, the system looks for the last token written and predicts the 
most frequent follow token. This method is nearly as effective as 
any other in Table 13, especially on the combined task when notes 
>from each domain are entered in random order. Laird (1992) describes 
an efficient algorithm for maintaining higher-dimensional n-grams, 
in effect increasing the context of each prediction and effectively 
memorizing longer sequences of tokens. Laird's algorithm builds a 
Markov tree and incorporates heuristics that keep the size of the 
tree from growing excessively large. Regrettably, these methods are 
unsuitable for the interactive note-taking software because of the 
difficulty of using them to construct a custom user interface. It is 
plausible to construct a panel of exclusive choices based directly 
on the set of follow tokens, but it is unclear how to identify 
optional choices corresponding to loops in finite-state machines. 
Moreover, if notes are drawn from different domains, and those 
domains share even a single token, then some follow set will include 
tokens from different domains. Using these follow sets to construct 
a user interface will unnecessarily confuse the user by introducing 
options from more than one domain at a time.

Column 5 of Table 13 lists the accuracy of prediction based solely 
on the learned FSMs. Without an embedded classifier, this method must 
rely on prediction of the most common transition (or termination) 
>from each state. Because the prediction is based on simple counts 
(as noted in Section 4, Learning Embedded Classifiers), this method 
never predicts optional transitions.

Columns 6 and 7 of Table 13 list the accuracy of predicting using 
FSMs and embedded classifiers. The classifiers used are simple Bayes 
and the incremental ID3, respectively. The latter outperforms either 
the FSM alone or the FSM with embedded Bayes classifiers. If the 
system only makes predictions when its confidence measure is greater 
than 0.25, the accuracy is significantly different for the Engine 
Code, Minivan, Lens, and Raptor domains, ranging between 10 and 22 
percentage points of improvement.

Column 8 of Table 13 lists an estimate of the upper-bound on 
predictive accuracy. This was calculated by assuming that prediction 
errors were only made the first time each distinct token was written.

7.2	Design Decisions

The note taking software embodies a number of design decisions. 
Table 14 lists the effects of these decisions on predictive accuracy 
by comparing versions of the software with and without each design 
feature. The first column lists the predictive accuracy for the 
software's nominal configuration. Column 2 lists the accuracy data 
for a slightly different generic tokenizer. Accuracy is higher for 
some domains, lower for others. A custom-built tokenizer is one way 
to incorporate knowledge about the domain. Columns 3 and 4 show the 
accuracy for the system using only the original two FSM merging rules 
(cf. Table 1) and all but the last merging rule (cf. Table 2), 
respectively. The decreased structural generality tends to lower 
predictive accuracy, but the embedded classifiers help compensate 
for the reduced accuracy. Column 5 lists the accuracy for when the 
FSM does not heuristically continue parsing upon encountering a token 
for which there is no immediate transition. As expected, accuracy 
suffers considerably in some domains because a novel token in a 
sequence completely foils any subsequent prediction. Columns 6 and 7 
list accuracy for different values of the free parameter controlling 
the clustering of notes together into a FSM. There is little effect 
on predictive accuracy in this case. Column 8 shows the accuracy for 
when embedded classifiers do not use information about repeated 
states in the FSM. Without this information, the classifiers cannot 
predict that a loop transition should be taken exactly once. 
Surprisingly, elimination of this feature has little effect on 
accuracy. Column 9 lists the accuracy for when the embedded 
classifiers associated with a pair of FSM states are discarded when 
the states are merged. Finally, Column 10 lists the accuracy for when 
a new FSM state is assigned a unique ID rather than the ID of the 
oldest of the two merged states.

7.3	Sample Button-Box Interfaces

In addition to Figure 3, Figures 11 through 15 depict button-box 
interfaces for the five other well-behaved note taking domains listed 
at the top of Table 12. These interfaces are visual and offer the 
user an organized view of their notes, presenting options in a 
natural way. However, whenever unique tokens are involved, the 
current software makes no attempt to explicitly generalize tokens. 
This effect is reflected in the tour dates for the Airwing notes in 
Figure 11. Note that the radio button panel consists of a long series 
of dates, none of which is likely to be selected for a new note.

8.	Related Work

Self-customizing software agents have several subjective dimensions 
on which they can be evaluated and compared:

*	Anticipation-Does the system present alternatives without 
the user having to request them?

*	User interface-Is the system graphical, or is it command-
line oriented?

*	User control-Can the user override or choose to ignore 
predictive actions?

*	Modality-If the system has a number of working modes, can 
the user work in any mode without explicitly selecting one of them?

*	Learning update-Is learning incremental, continuous and/
or real-time?

*	User adjustable-Can the user tune the system parameters 
manually?

Here we describe related systems that exhibit properties in each of 
these agent dimensions.

Our note taking software utilizes the anticipation user interface 
technique pioneered by Eager (Cypher, 1991). Eager is a non-intrusive 
system that learns to perform iterative procedures by watching the 
user. As such, it is a learning apprentice, a software agent, and an 
example of programming by example or demonstration. Situated within 
the HyperCard environment, it continuously watches a user's actions. 
When it detects the second cycle of an iteration, it presents an 
execute icon for the user's notice. It also visually indicates the 
anticipated next action by highlighting the appropriate button, menu 
item, or text selection in green. As the user performs their task, 
they can verify that Eager has learned the correct procedure by 
comparing its anticipations to their actions. When the user is 
confident enough, they can click on the execution icon, and Eager 
will run the iterative procedure to completion. Eager is highly 
anticipatory, uses a graphical interface, is non-obtrusive, non-
modal, and learns in real-time, but is not user adjustable.

CAP is an apprenticeship system that learns to predict default values 
(Dent, et al., 1992). Its domain of operation is calendar management, 
and it learns preferences as a knowledgable secretary might. For 
example, a professor may prefer to hold a regular group meeting in 
a particular room at a particular time of day for a particular 
duration-information that a secretary would know from experience. 
CAP collects information as the user manages their calendar, learns 
>from previous meetings, and uses the regularities it learns to offer 
default values for meeting location, time, and duration. The learning 
system is re-run each night on the most recent meeting data, and the 
learned rules are applied for prediction the following day. CAP is 
also designed to utilize an extensible knowledge base that contains 
calendar information and a database of personnel information. The 
system continues to be used to manage individual faculty calendars. 
Though offering some intelligence, CAP's user interface is line-
oriented and is based on the Emacs editor. Questions asked of the 
user about meetings are presented using a command-line dialog, and 
the default predictions are displayed one-at-a-time. CAP can be 
characterized as anticipatory, command-line oriented and modal with 
user control (but not user adjustable), where learning is done in 
batch.

Another related system addresses the task of learning to fill out a 
form (Hermens & Schlimmer, 1993). The system recreates a paper form 
as an on-screen facsimile, allowing the user to view all of the 
pertinent information at a glance. Input typed by the user into the 
electronic form is processed by a central form-filling module. When 
the user completes a form copy, it is printed, and each field value 
on the form is forwarded to a learning module (a decision tree 
learning method). The learned representations predict default values 
for each field on the form by referring to values observed on other 
fields and on the previous form copy. From the user's point of view, 
it is as if spreadsheet functions have been learned for each field 
of the form. Empirical studies indicate that this system reduced the 
number of keystrokes required of the user by 87% on 269 forms 
processed over the 8 month period in which it was actually used by 
office personnel. This system is unobtrusive, non-modal and 
anticipatory, uses a graphical interface, and updates learning in 
real-time.

Maes and Kozierok (1993) are addressing the problem of self-
customizing software at a much more task-independent level. They 
identify three learning opportunities for a software agent: 
observing the user's actions and imitating them, receiving user 
feedback upon error, and incorporating explicit training by the user. 
To illustrate the generality of their framework, they demonstrate 
simple learning apprentices that help sort the user's electronic mail 
and schedule meetings. Their initial systems use an instance-based 
(case- or memory-based) approach primarily because it allows 
efficient update and because it naturally generates a confidence in 
each of its predictions. User's may set thresholds on these 
predictions, corresponding to a minimum confidence for when the agent 
should prompt the user (a "tell-me" threshold) and a higher minimum 
confidence for the agent to act immediately on behalf of the user (a 
"do-it" threshold). The framework for learning in this case is 
anticipatory, utilizes a graphical user interface, is devoted to user 
control, is non-modal, learns in real-time, and is user adjustable.

A system developed for Macintosh Common Lisp (MCL) provides a word-
completion mechanism for word prefixes typed by the user in any 
window. J. Salem and A. Ruttenberg (unpublished) have devised MCL 
methods to display a word completion in the status bar of the each 
window. If the user desires to add the completion to the window, they 
simply press the CLEAR key. This word completion mechanism is similar 
to file-name completion in EMACS and the C-shell in UNIX systems, 
except that the word is displayed for the user before it is added. 
This system is anticipatory (unlike the UNIX file completion), is 
command line oriented (but displays the default completion in a 
graphical window), can be fully controlled by the user, is non-modal, 
learns in real time, is not intended to be user adjustable (though 
knowledgeable MCL programmers could easily make changes to the code).

The interactive note taking software we have devised does not require 
any user programming. It only receives implicit user feedback when 
the user chooses to complete a note in a different way than prompted. 
It does not have any mechanisms for direct user instruction or 
threshold tuning. In a system designed to be as easy to use as paper, 
such explicit adjustment may be inappropriate. We characterize our 
system as anticipatory, graphically-oriented, and modal (due to the 
switching that takes place when a user wishes to display the button-
box interface). It allows the user to override default prompts and 
predictions, and it learns in real-time. We have not included 
features that allow the user to configure the performance of the 
agent.

9.	Observations/Limitations

The interactive note-taking software is designed to help users 
capture information digitally, both to speed entry and improve 
accuracy, and to support the longer term goal of efficient retrieval. 
The software incorporates two distinctive features. First, it 
actively predicts what the user is going to write. Second, it 
automatically constructs a custom radio-button, check-box user 
interface.

This research explores the extremes of FSM learning and prediction, 
where the system has no explicit a priori knowledge of the note 
domains. We have tried to design the system so that it can learn 
quickly, yet adapt well to semantic and syntactic changes, all 
without a knowledge store from which to draw. It is clear that 
knowledge in the form of a domain-specific tokenizer would aid FSM 
learning by chunking significant phrases and relating similar 
notations and abbreviations. Some preliminary work has shown that, 
after a few notes have been written, users may create abbreviations 
instead of writing out whole words. A domain-specific tokenizer would 
be able to relate an abbreviation and a whole word as being in the 
same class, and therefore allow for more flexibility during note 
taking. For example, a domain-specific tokenizer may recognize that 
"Megabytes", "Meg", "MB", and "M" all represent the same token for 
memory sizes. One could imagine a framework that would allow for 
domain-specific tokenizers to be simply plugged in.

The prototype built to demonstrate these ideas was implemented on a 
conventional, micro computer with keyboard input. As a consequence, 
it was impossible to evaluate user acceptance of the new interface 
or the adaptive agent. With newly available computing devices 
incorporating pen input and handwriting recognition, it should be 
possible to re-engineer the user interface and field test these ideas 
with actual users.

One aspect of note learning, related to tokenization and the button-
box user interface display, is the difficulty of generalizing numeric 
strings or unique tokens. The cardinality of the range of model 
numbers, telephone numbers, quantities, sizes, other numeric values, 
and even proper names is very large in some note domains. The finite-
state machine learning method presented here is incapable of 
generalizing over transitions from a particular state, and, as a 
consequence, the current system has the problem of displaying a very 
lengthy button-box interface list. (A button is displayed for each 
value encountered in the syntax of notes, and there may be many 
choices.) For example, a large variety of pattern numbers may be 
available in the fabric pattern note domain. An appropriate mechanism 
is desired to determine when the list of numeric choices is too large 
to be useful as a button-box interface. The system can then 
generalize the expected number, indicating the number of digits to 
prompt the user: ####, for example. This may be helpful to remind 
the user that a number is expected without presenting an overbearing 
list of possibilities.

Another limitation of the current effort lies in the choice of 
finite-state machines to represent the syntax of the user's notes. 
Notes may not be regular expressions with the consequence that the 
FSMs may become too large as the learning method attempts to acquire 
a syntax. This may place an unreasonable demand on memory and lead 
to reduced prompting effectiveness.

The choice of finite-state machines also apparently constraints the 
custom user interface. Because FSMs branch in unpredicable ways, 
button-box interfaces must be rendered incrementally. After the user 
indicates a particular transition (by selecting a button), the system 
can render states reachable from that transition for the user. 
Ideally, the user should be able to select buttons corresponding to 
note fragments in any order, allowing them to write down the size 
before the pattern number, for example. To construct a non-modal user 
interface, a more flexible syntactic representation is needed.

Several of the low-level design decisions employed in this system 
are crude responses to technical issues. For instance, the decision 
to render a syntax as a button-box interface only if the average 
number of times each state has been used to parse notes is greater 
than 2. This ignores the fact that some parts of the state machine 
have been used frequently for parsing notes while other parts have 
rarely been used. Similarly, the particular measure for estimating 
prompting confidence (and setting the saturation of the completion 
button) is simplistic and would benefit from a more sound statistical 
basis.

Acknowledgments

Anonymous reviewers suggested an additional example in Section 3, 
offered some refinements to the user interface, graciously 
identified some limitations of the work listed in Section 9, and 
pointed out some additional related work. Mike Kibler, Karl Hakimian, 
and the EECS staff provided a consistent and reliable computing 
environment. Apple Cambridge developed and supports the Macintosh 
Common Lisp programming environment. Allen Cypher provided the 
tokenizer code. This work was supported in part by the National 
Science Foundation under grant number 92-1290 and by a grant from 
Digital Equipment Corporation.

References

Angluin, D. (1982). Inference of reversible languages. Journal of 
the Association for Computing Machinery, 29, 741-765.

Berwick, R. C., & Pilato, S. (1987). Learning syntax by automata 
induction. Machine Learning, 2, 9-38.

Bull, J., & Farrand, J., Jr. (1977). The Audubon Society Field Guide 
to North American Birds (Eastern Edition). NY: Alfred A. Knopf (pp. 
401-682).

Chilton's Repair & Tune-Up Guide: GM X-Body 1980-1985 (1985). Randor, 
PA: Chilton Book (p. 7).

Cohen, W. W. (1988). Generalizing number and learning from multiple 
examples in explanation based learning. Proceedings of the Fifth 
International Conference on Machine Learning (pp. 256-269). Ann 
Arbor, MI: Morgan Kaufmann.

Consumer Reports (1988), 53 (12), 302-303. Mount Vernon, NY: 
Consumers Union.

Cypher, A. (1991). Eager: Programming repetitive tasks by example. 
Proceedings of CHI (pp. 33-39). New Orleans, LA: ACM.

Dent, L., Boticario, J., McDermott, J., Mitchell, T., & Zabowski, D. 
(1992). A personal learning apprentice. Proceedings of the Tenth 
National Conference on Artificial Intelligence (pp. 96-103). San 
Jose, CA: AAAI Press.

Fisher, D. H. (1987). Knowledge acquisition via incremental 
conceptual clustering. Machine Learning, 2, 139-172.

Grove, M., & Miller, J. (1989). North American Rockwell A3J/A-5 
Vigilante. Arlington, TX: Aerofax (pp. 13-15).

Hermens, L. A., & Schlimmer, J. C. (1993). A machine-learning 
apprentice for the completion of repetitive forms. Proceedings of 
the Ninth IEEE Conference on Artificial Intelligence for 
Applications. Orlando, FL.

Laird, P. (1992). Discrete sequence prediction and its applications. 
Proceedings of the Tenth National Conference on Artificial 
Intelligence (pp. 135-140). San Jose, CA: AAAI Press.

Maes, P., & Kozierok, R. (1993). Learning interface agents. 
Proceedings of the Eleventh National Conference on Artificial 
Intelligence (pp. 459-465). Washington, D. C.: AAAI Press.

Nurse's Guide to Drugs (1979). Horsham, PA: Intermed Communications 
(pp. 454-462).

Schlimmer, J. C., & Fisher, D. H. (1986). A case study of incremental 
concept induction. Proceedings of the Fifth National Conference on 
Artificial Intelligence (pp. 496-501). Philadelphia, PA: AAAI Press.


Table 13: Percentage of tokens correctly predicted as a function of 
the learning method.

Figure 6: Sample finite-state machine after processing three fabric 
pattern notes.


Merge any two states if any of the following are true:

1.	Another state transitions to both states on the same 
token; or

	(This enforces determinism.)

2'.	Both states have a common 0-leader and

	a.	Both states are accepting states, or

	b.	Both states transition to a common state via the 
same token; or

3.	Both states have a common 1-leader and

	a.	Both states transition to a common state via any 
token, or

	b.	One transitions to the other via any token.

Table 2: Extended FSM state merging rules.


Decision tree embedded in State 3:

	If State 1 exited with "2048"

		Then predict " 20"

	Else if with "4096"

		Then predict " 40"	

	Else if with "6144"

		Then predict " 40"

	Else if with "8192"

		Then predict " 40" .

Decision tree embedded in State 7:

	If State 7 has not been visited

		Then predict " FAX"

	Else if State 7 exited with " FAX"

	Then predict " Modem" .

Table 3: Sample decision trees embedded in the finite-state machine 
depicted in Figure 7.


Table 14: Percentage of tokens correctly predicted as a function of 
design variations.


Table 12: Quantitative properties of the nine domains used to test 
alternative methods.

Figure 2: Screen snapshot of the note-taking software in contextual 
prompting mode for a PowerBook note. The two triangles in the lower 
left are scroller buttons.

Figure 3: Screen snapshot of the note-taking software in button-box 
mode for a PowerBook note.

Figure 7: Zero-reversible FSM characterizing PowerBook notes (cf. 
Example 1).

Figure 8: Zero-reversible finite-state machine characterizing fabric 
pattern notes learned using merging rules listed in Table 1.

Figure 9: Finite-state machine characterizing fabric pattern notes 
learned using extended rules in Table 2. Compare to zero-reversible 
finite-state machine for the same domain in Figure 8.

Figure 10: Simple finite-state machine with one state.

Figure 11: Screen snapshot of the note-taking software in button-box 
mode for an airwing note.

Figure 12: Screen snapshot of the note-taking software in button-box 
mode for a fabric pattern note.

Figure 13: Screen snapshot of the note-taking software in button-box 
mode for an engine code note.

Figure 14: Screen snapshot of the note-taking software in button-box 
mode for a minivan note.

Figure 15: Screen snapshot of the note-taking software in button-box 
mode for a watch note.

Figure 4: (a) Degenerate finite-state machine after processing a 
single fabric pattern note, and (b) prefix tree finite-state machine 
after adding a second fabric pattern note (cf. Example 2).

Figure 1: Continuum of software development depicting the 
traditional trade-off between the development cost per user and the 
amount of user customization required. Self-customizing software 
eliminates the need for user customization by starting with 
partially-specified software and applying machine learning methods 
to complete any remaining customization.


Stop expanding the prompt if any of the following are true:

1.	The next prediction is to terminate; or

2.	The next prediction is a generalized string; or

3.	At least one token has already been predicted and

	a.	The prediction starts with punctuation, or

	b.	The confidence of the prediction is lower; or

4.	The next prediction is the same as the last prediction; or

5.	More than 10 tokens have already been predicted.

Table 4: Stopping criterion for contextual prompting.

Figure 5: (a) Finite-state machine after processing two fabric 
pattern notes and applying state merging rules in Table 1, and (b) 
prefix tree finite-state machine after adding a third fabric pattern 
note.


A k-leader is defined as a path of length k that accepts in the given 
state.

Merge any two states if either of the following is true:

1.	Another state transitions to both states on the same 
token; or

	(This enforces determinism.)

2.	Both states have a common k-leader and

	a.	Both states are accepting states, or

	b.	Both states transition to a common state via the 
same token.

Table 1: FSM state merging rules from (Angluin, 1982).