AI Assignment 5: Sample Input

Problem 2

Coins:

We need to determine which coins are valuable for collectors, and we view valuable coins as positive examples; the training set includes six examples:
 
class     rarity  age  wear
positive  rare    new  low
positive  rare    old  low
positive  common  old  low
negative  rare    old  high
negative  common  new  low
negative  common  new  high

We compute the information gain of each attribute, where I(p, n) = -p log2 p - n log2 n denotes the entropy of a set in which a fraction p of the examples is positive and a fraction n is negative:
 
Gain(rarity) = I(3/6, 3/6) - (3/6) * I(2/3, 1/3) - (3/6) * I(1/3, 2/3) = 0.082
Gain(age) = I(3/6, 3/6) - (3/6) * I(2/3, 1/3) - (3/6) * I(1/3, 2/3) = 0.082
Gain(wear) = I(3/6, 3/6) - (4/6) * I(3/4, 1/4) - (2/6) * I(0,1) = 0.459
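
These gain values can be reproduced mechanically. The sketch below is only an illustration; the dictionary representation and the helper names entropy and info_gain are our own choices, not part of the assignment.

    import math
    from collections import Counter

    def entropy(labels):
        """Entropy I(p, n) of a list of class labels, in bits."""
        total = len(labels)
        return -sum((c / total) * math.log2(c / total)
                    for c in Counter(labels).values())

    def info_gain(examples, attribute):
        """Information gain of splitting the examples on the given attribute."""
        gain = entropy([e["class"] for e in examples])
        for value in {e[attribute] for e in examples}:
            subset = [e["class"] for e in examples if e[attribute] == value]
            gain -= (len(subset) / len(examples)) * entropy(subset)
        return gain

    coins = [
        {"class": "positive", "rarity": "rare",   "age": "new", "wear": "low"},
        {"class": "positive", "rarity": "rare",   "age": "old", "wear": "low"},
        {"class": "positive", "rarity": "common", "age": "old", "wear": "low"},
        {"class": "negative", "rarity": "rare",   "age": "old", "wear": "high"},
        {"class": "negative", "rarity": "common", "age": "new", "wear": "low"},
        {"class": "negative", "rarity": "common", "age": "new", "wear": "high"},
    ]

    for attr in ("rarity", "age", "wear"):
        print(attr, round(info_gain(coins, attr), 3))
    # rarity 0.082, age 0.082, wear 0.459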

The "wear" attribute provides the greatest gain, and we use it for the root split.

All training examples with high wear are negative; since both test instances have high wear, they are also classified as negative:
 
rarity  age  wear  class
rare    new  high  negative
common  old  high  negative

The coins with low wear require an additional split, but this split does not affect the classification of the given test instances.
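
This additional split is not determined uniquely: on the low-wear examples, rarity and age both have gain 0.311, so either may be chosen. Purely for illustration, the sketch below (not part of the assignment) writes out one possible complete tree, breaking the tie in favor of rarity, and confirms that both test coins come out negative.

    def classify_coin(coin):
        """One possible coin tree; the tie on the low-wear branch is broken toward rarity."""
        if coin["wear"] == "high":
            return "negative"      # every high-wear training coin is negative
        if coin["rarity"] == "rare":
            return "positive"      # both low-wear rare training coins are positive
        # low-wear common coins: old is positive, new is negative in the training set
        return "positive" if coin["age"] == "old" else "negative"

    tests = [
        {"rarity": "rare",   "age": "new", "wear": "high"},
        {"rarity": "common", "age": "old", "wear": "high"},
    ]
    print([classify_coin(c) for c in tests])   # ['negative', 'negative']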


Books:

We next learn to distinguish between cheap and expensive books, and we consider expensive books as positive examples. The training set includes eight books described by five attributes:
 
class     bind       style     pictures  popularity  length
positive  hardcover  novel     nocolor   popular     long
positive  softcover  textbook  nocolor   popular     long
negative  softcover  novel     nocolor   popular     short
positive  hardcover  textbook  color     popular     short
positive  hardcover  journal   color     unknown     short
negative  softcover  textbook  nocolor   unknown     short
positive  hardcover  journal   color     popular     long
negative  softcover  novel     color     unknown     short

We first determine the information gain of each attribute:
 
Gain(bind) = I(5/8, 3/8) - (4/8) * I(1, 0) - (4/8) * I(1/4, 3/4) = 0.549
Gain(style) = I(5/8, 3/8) - (3/8) * I(1/3, 2/3) - (3/8) * I(2/3, 1/3) - (2/8) * I(1, 0) = 0.266
Gain(pictures) = I(5/8, 3/8) - (4/8) * I(2/4, 2/4) - (4/8) * I(3/4, 1/4) = 0.049
Gain(popularity) = I(5/8, 3/8) - (5/8) * I(4/5, 1/5) - (3/8) * I(1/3, 2/3) = 0.159
Gain(length) = I(5/8, 3/8) - (3/8) * I(1, 0) - (5/8) * I(2/5, 3/5) = 0.348

The "bind" attribute gives the greatest gain, and we use it for the root split. All training examples with "hardcover" are positive, whereas the "softcover" examples require an additional split.

We show the "softcover" examples in the following table.
 
class     bind       style     pictures  popularity  length
positive  softcover  textbook  nocolor   popular     long
negative  softcover  novel     nocolor   popular     short
negative  softcover  textbook  nocolor   unknown     short
negative  softcover  novel     color     unknown     short

We next compute the information gains of the remaining attributes for the "softcover" examples:
 
Gain(style) = I(1/4, 3/4) - (2/4) * I(1/2, 1/2) - (2/4) * I(0, 1) = 0.311
Gain(pictures) = I(1/4, 3/4) - (3/4) * I(1/3, 2/3) - (1/4) * I(0, 1) = 0.123
Gain(popularity) = I(1/4, 3/4) - (2/4) * I(1/2, 1/2) - (2/4) * I(0, 1) = 0.311
Gain(length) = I(1/4, 3/4) - (1/4) * I(1, 0) - (3/4) * I(0, 1) = 0.811

The "length" attribute provides the greatest gain, and we use it for the next split. The resulting tree shows that hardcover books and long softcover books are expensive, whereas short softcover books are cheap.

This tree leads to the following classification of the test instances:
 
bind       style     pictures  popularity  length  class
softcover  journal   color     popular     short   negative
softcover  novel     nocolor   unknown     long    positive
hardcover  textbook  nocolor   unknown     short   positive
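
These classifications can be checked directly against the tree described above. The sketch below is illustrative only (the function name classify_book is our own); it encodes the two learned splits, on bind and then on length, and applies them to the three test books.

    def classify_book(book):
        """Learned tree: hardcover books and long softcover books are expensive."""
        if book["bind"] == "hardcover":
            return "positive"      # every hardcover training book is expensive
        return "positive" if book["length"] == "long" else "negative"

    tests = [
        {"bind": "softcover", "style": "journal",  "pictures": "color",
         "popularity": "popular", "length": "short"},
        {"bind": "softcover", "style": "novel",    "pictures": "nocolor",
         "popularity": "unknown", "length": "long"},
        {"bind": "hardcover", "style": "textbook", "pictures": "nocolor",
         "popularity": "unknown", "length": "short"},
    ]
    print([classify_book(b) for b in tests])   # ['negative', 'positive', 'positive']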

Days:

The task is to distinguish Sunday from Monday; Sunday is positive, and Monday is negative.

Weather:

The attributes are state, month, and city, and the task is to determine whether a traveler needs a sweater when visiting the given city.

Food:

We need to decide whether to buy a hamburger; the attributes include the dollar cost of the hamburger, place (Checkers, Burger King, or McDonald's), and time.
