AI Assignment 5: Sample Input

Problem 3

Sample files

Test files

Coins

The file contains six training examples and two test examples. Each example records whether a coin will be valuable to a collector, and the fields correspond to the columns of the following table, which lists the six training examples:
Classification | How many made? | How old is coin? | How much wear is on the coin?
Positive       | Rare           | New              | Low
Positive       | Rare           | Old              | Low
Positive       | Common         | Old              | Low
Negative       | Rare           | Old              | High
Negative       | Common         | New              | Low
Negative       | Common         | New              | High

From the above table, it should be apparent that the wear on the coin carries the most information for the first branch, as the value for wear is "low" in all of the positive cases and "high" only in negative cases. The actual information calculation looks like this:

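First, I(p/(p+n), n/(p+n)) can be calculated as I(3/6, 3/6) = -0.5*lg(0.5) - 0.5*lg(0.5) = 1, where p and n are the numbers of positive and negative training examples and lg is the base-2 logarithm. For the case of I(1, 0), the term 0*lg(0) is undefined (lg has a singularity at 0), but it is taken to be its limiting value of zero, so I(1, 0) = 0.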
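The information function above can be sketched in a few lines of Python. The name `information` is an illustrative choice, not something the assignment prescribes; the only convention assumed is that zero-probability terms are skipped, which matches treating 0*lg(0) as zero.

```python
import math

def information(*probabilities):
    """Return I(p1, p2, ...) in bits, skipping zero terms so that 0*lg(0) counts as 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(information(3/6, 3/6))  # 1.0
print(information(1, 0))      # 0.0
```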

Next, the gain from each attribute can be computed as:
Gain(Rarity) = 1 - [(3/6)*I(2/3, 1/3) + (3/6)*I(1/3, 2/3)] = 0.0817
Gain(Age) = 1 - [(3/6)*I(2/3, 1/3) + (3/6)*I(1/3, 2/3)] = 0.0817
Gain(Wear) = 1 - [(4/6)*I(3/4,1/4) + (2/6)*I(0,1)] = 0.4591
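As a check on the three gains above, here is a minimal Python sketch that recomputes them from the training table. The tuple layout (classification first, then rarity, age, wear) and the helper names `information` and `gain` are assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict

# Training examples from the Coins table: (classification, rarity, age, wear).
COINS = [
    ("Positive", "Rare", "New", "Low"),
    ("Positive", "Rare", "Old", "Low"),
    ("Positive", "Common", "Old", "Low"),
    ("Negative", "Rare", "Old", "High"),
    ("Negative", "Common", "New", "Low"),
    ("Negative", "Common", "New", "High"),
]

def information(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(rows, column):
    """Information gained by splitting rows on the attribute in the given column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[0])
    remainder = sum(len(g) / len(rows) * information(g) for g in groups.values())
    return information([row[0] for row in rows]) - remainder

for name, column in [("Rarity", 1), ("Age", 2), ("Wear", 3)]:
    print(f"Gain({name}) = {gain(COINS, column):.4f}")
# Gain(Rarity) = 0.0817
# Gain(Age) = 0.0817
# Gain(Wear) = 0.4591
```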

As expected, we choose Wear to be the first branch of our decision tree. Since the test data both have high wear, they will be classified as Negative, meaning that they do not have much collector's value. The tree should branch again to handle any test cases that have a low wear attribute (using one of the remaining attributes); however, since every possible combination of attribute values already appears in either the training set or the test set, this will not be shown here. The next example will show that next step.

Books

This file contains eight training examples and three test examples, and the classification indicates whether a book will be expensive at the local bookstore.
Classification | Bind type | Style of book | Color pictures? | Is the book well known? | Length of book
Positive       | Hardcover | Novel         | Nocolor         | Popular                 | Long
Positive       | Softcover | Textbook      | Nocolor         | Popular                 | Long
Negative       | Softcover | Novel         | Nocolor         | Popular                 | Short
Positive       | Hardcover | Textbook      | Color           | Popular                 | Short
Positive       | Hardcover | Photojournal  | Color           | Unknown                 | Short
Negative       | Softcover | Textbook      | Nocolor         | Unknown                 | Short
Positive       | Hardcover | Photojournal  | Color           | Popular                 | Long
Negative       | Softcover | Novel         | Color           | Unknown                 | Short

The calculation of the gain from each attribute is just as it was above. First, with five positive and three negative examples, I(p/(p+n), n/(p+n)) = I(5/8, 3/8) can be computed as -(5/8)*lg(5/8) - (3/8)*lg(3/8) = 0.954434.

Next, the gain from each attribute can be computed:
Gain(Bind) = 0.954434 - [(4/8)*I(1, 0) + (4/8)*I(1/4, 3/4)] = 0.548795
Gain(Style) = 0.954434 - [(3/8)*I(1/3, 2/3) + (3/8)*I(2/3, 1/3) + (2/8)*I(1, 0)] = 0.265712
Gain(Color) = 0.954434 - [(4/8)*I(2/4, 2/4) + (4/8)*I(3/4, 1/4)] = 0.048795
Gain(Popularity) = 0.954434 - [(5/8)*I(4/5, 1/5) + (3/8)*I(1/3, 2/3)] = 0.158868
Gain(Length) = 0.954434 - [(3/8)*I(1, 0) + (5/8)*I(2/5, 3/5)] = 0.347590

Based on these computed gains for each attribute, the binding type is chosen as the first branch in the decision tree. Looking at the training data, a hardcover binding always leads to a Positive classification. To deal with the remaining softcover cases, we must eliminate the hardcover cases and repeat the calculation on the four softcover examples:
Classification | Bind type | Style of book | Color pictures? | Is the book well known? | Length of book
Positive       | Softcover | Textbook      | Nocolor         | Popular                 | Long
Negative       | Softcover | Novel         | Nocolor         | Popular                 | Short
Negative       | Softcover | Textbook      | Nocolor         | Unknown                 | Short
Negative       | Softcover | Novel         | Color           | Unknown                 | Short

Here, I(p/(p+n), n/(p+n)) = -(1/4)*lg(1/4) - (3/4)*lg(3/4) = 0.8112781

As above, the gains for the remaining attributes are computed:
Gain(Style) = 0.8112781 - [(2/4)*I(1/2, 1/2) + (2/4)*I(0, 1)] = 0.3112781
Gain(Color) = 0.8112781 - [(3/4)*I(1/3, 2/3) + (1/4)*I(0, 1)] = 0.1225562
Gain(Popularity) = 0.8112781 - [(2/4)*I(1/2, 1/2) + (2/4)*I(0, 1)] = 0.3112781
Gain(Length) = 0.8112781 - [(1/4)*I(1, 0) + (3/4)*I(0, 1)] = 0.8112781
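The restart on the softcover subset can be sketched in Python as follows: drop the hardcover rows, then recompute the gain of each remaining attribute. The tuple layout, the `COLUMNS` mapping, and the helper names are assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict

# Training examples from the Books table:
# (classification, bind, style, color, popularity, length).
BOOKS = [
    ("Positive", "Hardcover", "Novel", "Nocolor", "Popular", "Long"),
    ("Positive", "Softcover", "Textbook", "Nocolor", "Popular", "Long"),
    ("Negative", "Softcover", "Novel", "Nocolor", "Popular", "Short"),
    ("Positive", "Hardcover", "Textbook", "Color", "Popular", "Short"),
    ("Positive", "Hardcover", "Photojournal", "Color", "Unknown", "Short"),
    ("Negative", "Softcover", "Textbook", "Nocolor", "Unknown", "Short"),
    ("Positive", "Hardcover", "Photojournal", "Color", "Popular", "Long"),
    ("Negative", "Softcover", "Novel", "Color", "Unknown", "Short"),
]
COLUMNS = {"Bind": 1, "Style": 2, "Color": 3, "Popularity": 4, "Length": 5}

def information(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(rows, column):
    """Information gained by splitting rows on the attribute in the given column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[0])
    remainder = sum(len(g) / len(rows) * information(g) for g in groups.values())
    return information([row[0] for row in rows]) - remainder

# Hardcover is already a pure Positive branch, so only softcover rows remain.
softcover = [row for row in BOOKS if row[COLUMNS["Bind"]] == "Softcover"]

for name in ["Style", "Color", "Popularity", "Length"]:
    print(f"Gain({name}) = {gain(softcover, COLUMNS[name]):.7f}")
# Gain(Style) = 0.3112781
# Gain(Color) = 0.1225562
# Gain(Popularity) = 0.3112781
# Gain(Length) = 0.8112781
```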

From these new gains, it should be obvious that the length of the book is the next branch in our tree. As it turns out, among the softcover training data, short books get a negative classification and long books get a positive classification. Since all data are classified at this level, no further decision branches are needed. If there were any cases that were not categorized by this tree, then this process would repeat.
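The whole procedure (pick the attribute with the highest gain, split, and repeat on any branch that is still mixed) can be sketched as a short recursive function. This is only one possible arrangement: the tuple-and-dictionary tree representation, the helper names, and the majority-vote fallback when attributes run out are assumptions for this illustration, not requirements of the assignment.

```python
import math
from collections import Counter, defaultdict

# Training examples from the Books table:
# (classification, bind, style, color, popularity, length).
ATTRIBUTES = ["Bind", "Style", "Color", "Popularity", "Length"]
BOOKS = [
    ("Positive", "Hardcover", "Novel", "Nocolor", "Popular", "Long"),
    ("Positive", "Softcover", "Textbook", "Nocolor", "Popular", "Long"),
    ("Negative", "Softcover", "Novel", "Nocolor", "Popular", "Short"),
    ("Positive", "Hardcover", "Textbook", "Color", "Popular", "Short"),
    ("Positive", "Hardcover", "Photojournal", "Color", "Unknown", "Short"),
    ("Negative", "Softcover", "Textbook", "Nocolor", "Unknown", "Short"),
    ("Positive", "Hardcover", "Photojournal", "Color", "Popular", "Long"),
    ("Negative", "Softcover", "Novel", "Color", "Unknown", "Short"),
]

def information(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def partition(rows, column):
    """Group rows by the value they hold in the given column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return groups

def gain(rows, column):
    remainder = sum(len(g) / len(rows) * information([r[0] for r in g])
                    for g in partition(rows, column).values())
    return information([r[0] for r in rows]) - remainder

def build_tree(rows, columns):
    """Return a label (leaf) or a pair (attribute name, {value: subtree})."""
    labels = [row[0] for row in rows]
    if len(set(labels)) == 1:   # every example agrees, so this branch is done
        return labels[0]
    if not columns:             # assumption: fall back to the majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(columns, key=lambda c: gain(rows, c))
    rest = [c for c in columns if c != best]
    branches = {value: build_tree(group, rest)
                for value, group in partition(rows, best).items()}
    return (ATTRIBUTES[best - 1], branches)

tree = build_tree(BOOKS, [1, 2, 3, 4, 5])
print(tree)
```

On the eight training examples this yields a tree that branches on Bind first and then on Length under Softcover, matching the hand calculation.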

Now, consider the test data:
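Bind type | Style of book | Color pictures? | Is the book well known? | Length of book
Softcover | Photojournal  | Color           | Popular                 | Short
Hardcover | Novel         | Nocolor         | Unknown                 | Long
Hardcover | Textbook      | Nocolor         | Unknown                 | Long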

The first book, as a softcover, will be categorized by its length. Since it is short, its classification will be negative. The second and third books, as hardcovers, will have positive classifications.
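Classifying the test data amounts to walking the finished tree from the root until a label is reached. A minimal sketch, assuming the tree is written out by hand as nested pairs of (attribute name, branches) and each test book is a dictionary keyed by attribute name; both representations are illustrative choices.

```python
# Decision tree derived above: branch on Bind, then on Length for softcovers.
TREE = ("Bind", {
    "Hardcover": "Positive",
    "Softcover": ("Length", {"Long": "Positive", "Short": "Negative"}),
})

TEST_BOOKS = [
    {"Bind": "Softcover", "Style": "Photojournal", "Color": "Color",
     "Popularity": "Popular", "Length": "Short"},
    {"Bind": "Hardcover", "Style": "Novel", "Color": "Nocolor",
     "Popularity": "Unknown", "Length": "Long"},
    {"Bind": "Hardcover", "Style": "Textbook", "Color": "Nocolor",
     "Popularity": "Unknown", "Length": "Long"},
]

def classify(tree, example):
    """Follow branches until a leaf (a plain string label) is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

for book in TEST_BOOKS:
    print(classify(TREE, book))
# Negative
# Positive
# Positive
```

Note that this sketch raises a KeyError if a test example carries an attribute value that never appeared on the corresponding branch during training; how to handle that case is left open here.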

Days

The task is to distinguish Sunday from Monday; Sunday is positive, and Monday is negative.

Weather

The attributes are a state, month, and city, and the task is to determine whether a traveler needs a sweater when visiting this city.

Food

We need to decide whether to buy a hamburger; the attributes include the dollar cost of the hamburger, place (Checkers, Burger King, or McDonald's), and time.
