| Classification | How many made? (Rarity) | How old is the coin? (Age) | How much wear is on the coin? (Wear) |
|---|---|---|---|
| Positive | Rare | New | Low |
| Positive | Rare | Old | Low |
| Positive | Common | Old | Low |
| Negative | Rare | Old | High |
| Negative | Common | New | Low |
| Negative | Common | New | High |
From the above table, it should be apparent that the wear on the coin carries the most information for this classification, as the value for wear is "Low" in all of the positive cases and "High" appears only in negative ones. The actual information calculation looks like this:
First, I(p/(p+n), n/(p+n)) can be calculated as I(3/6, 3/6) = -0.5*lg(0.5) - 0.5*lg(0.5) = 1. For the case of I(1, 0), the 0*lg(0) term would hit the logarithmic singularity at lg(0); by convention the term is taken as its limit, zero, so I(1, 0) = 0.
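That zero-by-convention rule is easy to make concrete. Below is a minimal Python sketch of the I() measure; the function name and its two-fraction signature simply mirror the notation above and are not from any particular library:

```python
from math import log2

def I(x, y):
    """I(x, y) = -x*lg(x) - y*lg(y): the information, in bits, of a
    Boolean split with positive fraction x and negative fraction y."""
    def term(t):
        # t*lg(t) -> 0 as t -> 0, so the lg(0) singularity is
        # defined away by taking the term to be zero.
        return 0.0 if t == 0 else -t * log2(t)
    return term(x) + term(y)

print(I(3/6, 3/6))  # 1.0
print(I(1, 0))      # 0.0
```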
Next, the gain from each attribute can be computed as:
Gain(Rarity) = 1 - [(3/6)*I(2/3, 1/3) + (3/6)*I(1/3, 2/3)] = 0.0817
Gain(Age) = 1 - [(3/6)*I(2/3, 1/3) + (3/6)*I(1/3, 2/3)] = 0.0817
Gain(Wear) = 1 - [(4/6)*I(3/4, 1/4) + (2/6)*I(0, 1)] = 0.4591
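These gains can be reproduced mechanically from the (positive, negative) counts for each attribute value, read straight off the table. A short self-contained Python sketch, with hypothetical helper names info and gain_from_counts:

```python
from math import log2

def info(p, n):
    """I(p/(p+n), n/(p+n)) in bits, with the 0*lg(0) term taken as 0."""
    total = p + n
    return sum(-x / total * log2(x / total) for x in (p, n) if x)

def gain_from_counts(total_info, splits, total):
    """Information gain: the starting information minus the weighted
    information remaining after the split. `splits` holds one
    (positive, negative) count pair per attribute value."""
    remainder = sum((p + n) / total * info(p, n) for p, n in splits)
    return total_info - remainder

start = info(3, 3)  # 1.0 for the coin table
print(gain_from_counts(start, [(2, 1), (1, 2)], 6))  # Rarity -> 0.0817
print(gain_from_counts(start, [(2, 1), (1, 2)], 6))  # Age    -> 0.0817
print(gain_from_counts(start, [(3, 1), (0, 2)], 6))  # Wear   -> 0.4591
```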
As expected, we choose Wear to be the first branch of our decision tree. Since both test coins have high wear, they will be classified as Negative, meaning that they do not have much collector's value. The tree would branch again (using one of the remaining attributes) to handle any test cases with a low wear attribute; however, since there are no possible cases beyond those already in the training and test sets, that step is not shown here. The next example demonstrates it.
| Classification | Bind type (Bind) | Style of book (Style) | Color pictures? (Color) | Is the book well known? (Popularity) | Length of book (Length) |
|---|---|---|---|---|---|
| Positive | Hardcover | Novel | Nocolor | Popular | Long |
| Positive | Softcover | Textbook | Nocolor | Popular | Long |
| Negative | Softcover | Novel | Nocolor | Popular | Short |
| Positive | Hardcover | Textbook | Color | Popular | Short |
| Positive | Hardcover | Photojournal | Color | Unknown | Short |
| Negative | Softcover | Textbook | Nocolor | Unknown | Short |
| Positive | Hardcover | Photojournal | Color | Popular | Long |
| Negative | Softcover | Novel | Color | Unknown | Short |
The calculation of the gain from each attribute is just as it was above. First, I(p/(p+n), n/(p+n)) can be computed as I(5/8, 3/8) = -(5/8)*lg(5/8) - (3/8)*lg(3/8) = 0.954434.
Next, the gain from each attribute can be computed:
Gain(Bind) = 0.954434 - [(4/8)*I(1, 0) + (4/8)*I(1/4, 3/4)] = 0.548795
Gain(Style) = 0.954434 - [(3/8)*I(1/3, 2/3) + (3/8)*I(2/3, 1/3) + (2/8)*I(1, 0)] = 0.265712
Gain(Color) = 0.954434 - [(4/8)*I(2/4, 2/4) + (4/8)*I(3/4, 1/4)] = 0.048795
Gain(Popularity) = 0.954434 - [(5/8)*I(4/5, 1/5) + (3/8)*I(1/3, 2/3)] = 0.158868
Gain(Length) = 0.954434 - [(3/8)*I(1, 0) + (5/8)*I(2/5, 3/5)] = 0.347590
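With eight records and five attributes, it is less error-prone to compute the gains directly from the table rows rather than from hand-tallied counts. A self-contained Python sketch; the record layout and the names BOOKS, ATTRS, info_of, and gain are our own choices for illustration:

```python
from math import log2

# Training records from the book table above.
BOOKS = [
    # (classification, bind, style, color, popularity, length)
    ("Positive", "Hardcover", "Novel",        "Nocolor", "Popular", "Long"),
    ("Positive", "Softcover", "Textbook",     "Nocolor", "Popular", "Long"),
    ("Negative", "Softcover", "Novel",        "Nocolor", "Popular", "Short"),
    ("Positive", "Hardcover", "Textbook",     "Color",   "Popular", "Short"),
    ("Positive", "Hardcover", "Photojournal", "Color",   "Unknown", "Short"),
    ("Negative", "Softcover", "Textbook",     "Nocolor", "Unknown", "Short"),
    ("Positive", "Hardcover", "Photojournal", "Color",   "Popular", "Long"),
    ("Negative", "Softcover", "Novel",        "Color",   "Unknown", "Short"),
]
# Column index of each attribute within a record.
ATTRS = {"Bind": 1, "Style": 2, "Color": 3, "Popularity": 4, "Length": 5}

def info_of(records):
    """I(p/(p+n), n/(p+n)) for a set of records, in bits."""
    total = len(records)
    counts = [sum(1 for r in records if r[0] == c)
              for c in ("Positive", "Negative")]
    return sum(-k / total * log2(k / total) for k in counts if k)

def gain(records, attr):
    """Information gain from splitting `records` on `attr`."""
    idx, total = ATTRS[attr], len(records)
    remainder = 0.0
    for value in {r[idx] for r in records}:
        subset = [r for r in records if r[idx] == value]
        remainder += len(subset) / total * info_of(subset)
    return info_of(records) - remainder

for attr in ATTRS:
    print(f"Gain({attr}) = {gain(BOOKS, attr):.6f}")
# Gain(Bind) = 0.548795, Gain(Style) = 0.265712, Gain(Color) = 0.048795,
# Gain(Popularity) = 0.158868, Gain(Length) = 0.347590
```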
Based on these computed gains, Bind is chosen as the first branch in the decision tree. Looking at the training data, a hardcover binding always leads to a Positive classification. To deal with the remaining softcover cases, we eliminate the hardcover rows and start over:
| Classification | Bind type (Bind) | Style of book (Style) | Color pictures? (Color) | Is the book well known? (Popularity) | Length of book (Length) |
|---|---|---|---|---|---|
| Positive | Softcover | Textbook | Nocolor | Popular | Long |
| Negative | Softcover | Novel | Nocolor | Popular | Short |
| Negative | Softcover | Textbook | Nocolor | Unknown | Short |
| Negative | Softcover | Novel | Color | Unknown | Short |
Here, I(p/(p+n), n/(p+n)) = I(1/4, 3/4) = -(1/4)*lg(1/4) - (3/4)*lg(3/4) = 0.8112781.
As above, the gains for the remaining attributes are computed:
Gain(Style) = 0.8112781 - [(2/4)*I(1/2, 1/2) + (2/4)*I(0, 1)] = 0.3112781
Gain(Color) = 0.8112781 - [(3/4)*I(1/3, 2/3) + (1/4)*I(0, 1)] = 0.1225562
Gain(Popularity) = 0.8112781 - [(2/4)*I(1/2, 1/2) + (2/4)*I(0, 1)] = 0.3112781
Gain(Length) = 0.8112781 - [(1/4)*I(1, 0) + (3/4)*I(0, 1)] = 0.8112781
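Reusing the gain helper from the sketch above on just the four softcover rows reproduces these numbers:

```python
# Restrict to the softcover rows and recompute the remaining gains.
softcover = [r for r in BOOKS if r[1] == "Softcover"]
for attr in ("Style", "Color", "Popularity", "Length"):
    print(f"Gain({attr}) = {gain(softcover, attr):.6f}")
# Gain(Style) = 0.311278, Gain(Color) = 0.122556,
# Gain(Popularity) = 0.311278, Gain(Length) = 0.811278
```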
From these new gains, the length of the book is clearly the next branch in our tree. As it turns out, within this softcover subset, short books get a Negative classification and long books get a Positive classification. Since all data are classified at this level, no further decision branches are needed. If any cases remained uncategorized by the tree, this process would simply repeat on each remaining subset, as the sketch below shows.
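That repetition is ordinary recursion on each impure subset. Continuing from the book-data sketch above, a simplified ID3-style builder (it assumes, as holds in this example, that no subset comes up empty and no branch exhausts its attributes) reconstructs exactly the tree described:

```python
def build_tree(records, attrs):
    """Recursively split on the highest-gain attribute until every
    subset is uniformly classified (a simplified ID3 sketch)."""
    classes = {r[0] for r in records}
    if len(classes) == 1:                      # pure subset -> leaf
        return classes.pop()
    best = max(attrs, key=lambda a: gain(records, a))
    idx = ATTRS[best]
    rest = [a for a in attrs if a != best]
    return (best, {v: build_tree([r for r in records if r[idx] == v], rest)
                   for v in {r[idx] for r in records}})

def classify(tree, record):
    """Walk the tree for an unlabeled record laid out like a BOOKS row
    without its leading classification (so each index shifts by one)."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[record[ATTRS[attr] - 1]]
    return tree

tree = build_tree(BOOKS, list(ATTRS))
print(tree)
# Structure (branch order may vary):
# ('Bind', {'Hardcover': 'Positive',
#           'Softcover': ('Length', {'Long': 'Positive',
#                                    'Short': 'Negative'})})
```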
Now, consider the test data:

| Bind | Style | Color | Popularity | Length |
|---|---|---|---|---|
| Softcover | Photojournal | Color | Popular | Short |
| Hardcover | Novel | Nocolor | Unknown | Long |
| Hardcover | Textbook | Nocolor | Unknown | Long |
The first book, being a softcover, is categorized by its length; since it is short, it is classified as Negative. The second and third books, being hardcovers, are classified as Positive.
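Continuing once more from the sketch above, walking the constructed tree reproduces these classifications:

```python
# The three unlabeled test records, laid out
# (bind, style, color, popularity, length).
tests = [
    ("Softcover", "Photojournal", "Color",   "Popular", "Short"),
    ("Hardcover", "Novel",        "Nocolor", "Unknown", "Long"),
    ("Hardcover", "Textbook",     "Nocolor", "Unknown", "Long"),
]
for record in tests:
    print(record[0], record[1], "->", classify(tree, record))
# Softcover Photojournal -> Negative
# Hardcover Novel -> Positive
# Hardcover Textbook -> Positive
```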