Introduction |
In this lecture we will continue our study of self-referential classes
by examining trees.
Like linked lists, trees contain nodes: these nodes are objects instantiated
from a class that contains instance variables that refer to other nodes from
this same class.
Whereas references in the linked list class indicate a "follows" relationship
(and in the case of doubly-linked lists also a "precedes" relationship),
references in tree classes indicate an inclusion relationship (where a
parent node includes all its children nodes): these relationships are much
more interesting in terms of the kinds of information that they can
represent.
Although we will first examine general tree structures, we will focus most of our attention in this lecture and the next on defining and processing binary trees. Within this category we will soon see examples of ordered (search) trees and structure (expression) trees. We will use ordered search trees primarily to store collections of values that can be searched quickly (bringing $O(\log_2 N)$ searching to self-referential classes, just as we did for arrays). In the final tree lecture we will examine another kind of ordered tree (a heap) and its relation to implementing a priority queue with "fast" enqueue/dequeue operations, as well as other special kinds of trees (N-ary trees, structure/expression trees, and digital trees). Again, in 15-200 we are just scratching the surface of the topic of trees. 15-211 provides a more extensive study of this topic, which is very important in Computer Science. |
Terminology |
All kinds of trees illustrate one important relationship: inclusion between
parts and a whole; another way to describe this relationship is that between
a parent node that includes child nodes.
Every child node has a unique parent; every parent node can have any number of
children (including none).
As in trees used in genealogy, we will write each parent node directly above
its child(ren) node(s).
In fact, we will use other genealogical terms, like ancestor and descendant,
when describing nodes in a tree.
We draw lines between parent/child nodes to illustrate their direct
relationship.
There is one unique node in every tree: this node has no parent and is called the root of the tree; because all other nodes in the tree are its descendants, we write the root node at the top of the tree. A mutually exclusive way to classify tree nodes is as internal or leaf. An internal node has one or more children; a leaf node has no children. So, any node that is a parent is an internal node; a node that is only a child (not a parent to another child) is a leaf node. Finally, we define the size of a tree as the number of nodes that it contains (similar to the length of a linear linked list); we define the height of a tree as the length of the longest path (each line counts as one step) from the root to one of its descendants. Alternatively, we can define the depth of a node as the number of ancestors it has, and then define the height of a tree as the largest depth of any of its nodes. Note that the root is at depth 0, because it has no ancestors; a tree consisting solely of a root also has a height of 0. The concepts of size and height for trees generalize the length of a linear linked list. We have already used trees to represent inheritance hierarchies: the relationship between classes (parents) and subclasses (children). In the bouncing ball program, we used the following tree to illustrate the inheritance hierarchy of most of its model classes. |
  |
Let's state some facts about this tree using some of the terminology defined
above.
|
A Class for Defining Binary Trees |
In this section we will begin our detailed study of trees by examining binary
trees.
Each node in a binary tree has at most two children (a node with 0 children
is a leaf node; a node with 1 or 2 children is an internal node).
We can define a class to construct objects/nodes for such trees as
  public class TN {
    public int value;
    public TN  left, right;

    public TN (int i, TN l, TN r)
    {value = i; left = l; right = r;}
  }
In the standard definition of a binary tree, a parent node refers to each of its two (left and right) subtrees (which can be null or refer to child nodes that themselves are trees). Of course, the null reference denotes an "empty" tree (one with no nodes), just as it denotes an empty list. As in doubly-linked lists, we can extend such a class to also include a "parent" reference instance variable. But such references are often not worth the trouble to implement and maintain, and we will do without them (just as we did without "previous" references in our study of linked lists). In classes that implement collections via trees, we typically declare an instance variable named root that stores null or a reference to the root of a tree (and use it just as we used front when storing collections in a linked list). |
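As a small illustration of how the TN constructor nests, the sketch below builds a four-node tree by hand. The class name BuildTreeDemo, the method sample, and the stored values are hypothetical, chosen only for this example; TN is repeated (as a nested class) so that the example compiles on its own.

```java
// A minimal sketch: TN mirrors the node class defined in this section.
// BuildTreeDemo, sample(), and the stored values are hypothetical.
class BuildTreeDemo {
    static class TN {
        int value;
        TN  left, right;
        TN (int i, TN l, TN r) {value = i; left = l; right = r;}
    }

    // Builds the tree
    //        5
    //       / \
    //      3   8
    //       \
    //        4
    static TN sample() {
        TN four  = new TN(4, null, null);   // leaf: both subtrees empty
        TN three = new TN(3, null, four);   // internal node with only a right child
        TN eight = new TN(8, null, null);   // leaf
        return new TN(5, three, eight);     // root refers to both subtrees
    }

    public static void main(String[] args) {
        TN root = sample();
        System.out.println(root.value);               // prints 5
        System.out.println(root.left.right.value);    // prints 4
        System.out.println(root.right.left == null);  // prints true (empty subtree)
    }
}
```

Because children must exist before the parent that refers to them, such trees are naturally built from the leaves up to the root.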
Recursive Methods for Computing Size and Height |
In this section we start relating some terminological concepts that we learned
to recursive methods that operate on the binary trees defined in the
previous section.
We can write a very simple recursive method for computing the size of a binary
tree; it is similar to (and generalizes) the recursive method that we studied
to compute the length of a linked list.
  public int size (TN t)
  {
    if (t == null)
      return 0;
    else
      return 1 + size(t.left) + size(t.right);
  }
Note that here (and in many other recursive methods operating on binary trees) we write two recursive calls: one to compute the size of the left subtree and one to compute the size of the right subtree. We can prove that this method is correct as follows.
Here is an iterative method that uses a stack to compute the size of a tree.
  public int size (TN t)
  {
    AbstractStack s = new ArrayStack();
    int size = 0;
    s.add(t);
    while (!s.isEmpty()) {
      TN next = (TN)s.remove();
      if (next != null) {       //empty subtrees contribute nothing
        size++;
        s.add(next.left);
        s.add(next.right);
      }
    }
    return size;
  }
We can also write a recursive method to compute the height of a tree. First, we will do so in an intuitive manner; then we will write a smaller and simpler method using a bit more sophistication. Note that the height of a (sub)tree that is a leaf node is just 0. Also note that the height of an internal node is 1 more than the biggest height of its subtrees. Using these facts we can write the following recursive method to compute the height of any non-empty tree.
  public int height (TN t)
  {
    if (t.left == null && t.right == null)   //leaf check
      return 0;
    else if (t.left == null)
      return 1 + height(t.right);
    else if (t.right == null)
      return 1 + height(t.left);
    else
      return 1 + Math.max(height(t.left),height(t.right));
  }
This method deals with all the necessary cases: a leaf node, an internal node with only a left (or only a right) subtree, and an internal node with both left and right subtrees. This method does not work on empty trees, which have no directly defined height from the previous definition. Now, let us simplify this code by defining the height of an empty tree to be -1. In one sense this seems very strange, but in another it seems obvious: an empty tree should have a height that is one less than a leaf node (whose height is 0). By using this definition (and no others), we can simplify the height method (as well as defining it for all possible trees, even empty ones) into the elegant method below.
  public int height (TN t)
  {
    if (t == null)
      return -1;
    else
      return 1 + Math.max(height(t.left),height(t.right));
  }
Again, if t is a leaf node, then its left and right subtrees are empty, so this method would perform the recursion and return 1 + Math.max(-1,-1), which is 0 (the correct answer for a leaf node). So, using this generalization of height, our code is simpler and always works (no matter whether an empty or non-empty tree is passed as a parameter; in the earlier method, passing an empty tree as a parameter would cause Java to throw a NullPointerException when it tried to determine if the node was a leaf). Mathematicians generalize definitions such as this one all the time. You may or may not know that for a non-zero a, $a^0$ is defined as 1. There are many ways to justify this definition (some quite complicated); the simplest way is to note the algebraic law $a^x a^y = a^{x+y}$. By this law (a quite useful one to have) $a^0 a^x = a^{0+x} = a^x$, which means that $a^0$ must be equal to 1 for this identity to hold. |
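The size and height methods of this section can be tried together on a small tree. The sketch below is one possible arrangement, not the course's own code: the class name SizeHeightDemo and the method names sample and sizeIterative are hypothetical, and java.util.ArrayDeque stands in for the course's ArrayStack (an assumption made only so the example runs with the standard library). Since ArrayDeque rejects null elements, this version tests for null before pushing rather than after popping.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A minimal sketch; SizeHeightDemo, sample(), and sizeIterative() are
// hypothetical names, and ArrayDeque is a stand-in for the course's stack.
class SizeHeightDemo {
    static class TN {
        int value;
        TN  left, right;
        TN (int i, TN l, TN r) {value = i; left = l; right = r;}
    }

    // Recursive size: an empty tree has 0 nodes.
    static int size(TN t) {
        if (t == null) return 0;
        return 1 + size(t.left) + size(t.right);
    }

    // Stack-based size: every node is pushed once and counted when popped.
    static int sizeIterative(TN t) {
        Deque<TN> stack = new ArrayDeque<>();
        int count = 0;
        if (t != null) stack.push(t);   // ArrayDeque does not accept null
        while (!stack.isEmpty()) {
            TN next = stack.pop();
            count++;
            if (next.left  != null) stack.push(next.left);
            if (next.right != null) stack.push(next.right);
        }
        return count;
    }

    // Height with the convention that an empty tree has height -1.
    static int height(TN t) {
        if (t == null) return -1;
        return 1 + Math.max(height(t.left), height(t.right));
    }

    // Root 5 with children 3 and 8; node 3 has a single right child 4.
    static TN sample() {
        return new TN(5, new TN(3, null, new TN(4, null, null)),
                         new TN(8, null, null));
    }

    public static void main(String[] args) {
        TN t = sample();
        System.out.println(size(t));           // prints 4
        System.out.println(sizeIterative(t));  // prints 4
        System.out.println(height(t));         // prints 2
        System.out.println(height(null));      // prints -1 (empty tree)
    }
}
```

The longest path in the sample tree runs from 5 through 3 to 4, which is two steps, so the height is 2 while the size is 4.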
Mathematics Size/Height Relationships |
We can use the structure of binary trees to derive some mathematical relationships between their sizes and heights. First, we should reiterate that the "inclusion" relationship modeled by trees is much more interesting than the "follows" relationship that is modeled by linear linked lists. One way to illustrate the difference in "interestingness" is by examining all structurally different (different looking) linked lists containing 4 nodes, independent of the values they store: there is only one. |
  | In contrast, here is a listing of all the structurally different binary trees containing 4 nodes (i.e., of size 4). |
  |
In a more mathematically advanced class, we could deduce a formula that
computes the number of structurally different trees containing N nodes (this
is similar to computing the number of isomers of a chemical molecule).
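Even without a closed formula, the count can be computed by a simple recurrence: a non-empty tree with k nodes is a root plus a left subtree with some number of nodes and a right subtree with the rest. The sketch below is one way to code that idea; the class name TreeShapeCount and the method count are hypothetical.

```java
// A minimal sketch of counting tree shapes by recurrence;
// TreeShapeCount and count() are hypothetical names.
class TreeShapeCount {
    // Returns the number of structurally different binary trees with n nodes.
    // Uses long arithmetic, so it is only meant for small n.
    static long count(int n) {
        long[] c = new long[n + 1];
        c[0] = 1;                               // exactly one empty tree
        for (int k = 1; k <= n; k++)
            for (int left = 0; left < k; left++)
                // root + left subtree of 'left' nodes + right subtree of the rest
                c[k] += c[left] * c[k - 1 - left];
        return c[n];
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 5; n++)
            System.out.println(n + " nodes: " + count(n) + " shapes");
        // prints 1, 1, 2, 5, 14, 42 shapes for 0 through 5 nodes
    }
}
```

For 4 nodes the recurrence gives 14 shapes, compared with the single shape of a 4-node linear linked list.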
We define a pathological tree as one with only one node at each depth (all the ones on the bottom). In all pathological trees, we have height = size-1. At the other end of the spectrum is a perfect tree, in which every depth is filled with as many nodes as possible (none of the trees above satisfies this criterion). The picture below shows perfect trees of height 0, 1, 2, and 3. |
  |
If we tabulate this data, we have the following information characterizing
the height and size of perfect trees.
  Height   Size
    0        1
    1        3
    2        7
    3       15
If we study and extend this table, we can guess a simple but interesting relationship between the height of a perfect tree and its size: $\text{size} = 2^{\text{height}+1} - 1$. First, verify that this formula is correct for the heights/sizes shown. Now, let's prove it by induction.
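One standard way the induction can go is sketched below, writing $S(h)$ for the size of a perfect tree of height $h$ (the symbol $S$ is introduced here only for this sketch). It relies on the observation that a perfect tree of height $h+1$ is a root whose left and right subtrees are both perfect trees of height $h$.

```latex
\begin{align*}
\text{Base case } (h = 0):\quad
  & S(0) = 1 = 2^{0+1} - 1. \\
\text{Inductive hypothesis:}\quad
  & S(h) = 2^{h+1} - 1 \text{ for some } h \ge 0. \\
\text{Inductive step:}\quad
  & S(h+1) = 1 + 2\,S(h) \\
  & \phantom{S(h+1)} = 1 + 2\,(2^{h+1} - 1) \\
  & \phantom{S(h+1)} = 2^{h+2} - 1 \\
  & \phantom{S(h+1)} = 2^{(h+1)+1} - 1.
\end{align*}
```

So the formula holds for height 0, and whenever it holds for height $h$ it also holds for height $h+1$; hence it holds for every perfect tree.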
Rewriting this equality to express height as a function of size, we have $\text{height} = \log_2(\text{size}+1) - 1$. Now, we can also write the original formula as $\text{size} = 2 \cdot 2^{\text{height}} - 1$; removing the multiplicative and additive constants, we have size is $O(2^{\text{height}})$. Or, solving for height, we have height is $O(\log_2 \text{size})$. In the next lecture we will learn that the complexity class for searching an ordered binary tree is related to its height; for perfect trees the complexity class is $O(\log_2 \text{size})$. If we can keep our binary trees reasonably full, we will be able to search them in the same complexity class as searching sorted arrays (the same holds for adding and removing elements while keeping the ordered property, which was not true for ordered lists). |
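The two extremes discussed in this section can also be checked by construction. The sketch below builds perfect trees of a given height and pathological trees of a given size, then compares their measured sizes and heights with the formulas above. The class name PerfectTreeDemo and the methods perfect and pathological are hypothetical; the stored values are irrelevant to the shape and are chosen arbitrarily.

```java
// A minimal sketch; PerfectTreeDemo, perfect(), and pathological() are
// hypothetical names introduced for this example.
class PerfectTreeDemo {
    static class TN {
        int value;
        TN  left, right;
        TN (int i, TN l, TN r) {value = i; left = l; right = r;}
    }

    static int size(TN t) {
        return (t == null) ? 0 : 1 + size(t.left) + size(t.right);
    }

    static int height(TN t) {
        return (t == null) ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // A perfect tree of height h: a root over two perfect trees of height h-1.
    // Height -1 is the empty tree.
    static TN perfect(int h) {
        if (h < 0) return null;
        return new TN(0, perfect(h - 1), perfect(h - 1));
    }

    // A pathological tree with n nodes: each node has only a left child.
    static TN pathological(int n) {
        TN t = null;
        for (int i = 0; i < n; i++)
            t = new TN(i, t, null);
        return t;
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 3; h++) {
            int expected = (1 << (h + 1)) - 1;      // 2^(h+1) - 1
            System.out.println("height " + h + ": size " + size(perfect(h))
                               + ", formula " + expected);
        }
        TN chain = pathological(4);
        System.out.println(size(chain) + " nodes, height " + height(chain)); // 4 nodes, height 3
    }
}
```

With the same 15 nodes, a perfect tree has height 3 while a pathological tree has height 14, which is the gap between logarithmic and linear search paths.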
Problem Set |
To ensure that you understand all the material in this lecture, please solve
the announced problems after you read the lecture.
If you get stumped on any problem, go back and read the relevant part of the lecture. If you still have questions, please get help from the Instructor, a CA, or any other student.
|