Program 9

Advanced Programming/Practicum
15-200

This programming assignment is designed to exercise your skills in writing tree processing methods (often using recursion). There are three parts in this assignment. I will briefly describe each part now, and describe each part in more detail below.

Write a class named HeapPriorityQueue that implements the OrderedCollection interface using priority queue semantics (any-in/highest priority out). The underlying data structure is a heap, which is efficient in time -its add/remove operatons are both O(Log₂ N)- and which uses an array to represent a tree compactly -parent/child references are implicit (calculated) from the nodes array index.
Write a program that given a size N, computes statistics about the average height of a binary search tree containing that many nodes. Using information collected from running this program, guess at a formula for the average height of binary search tree of size N.
Write solutions to all the binary search problems that may appear on the second in-class programming exam. You will test these methods with various driver programs (a different one for testing each method), each works by reading its associated data files, building a tree, and running the method on it; I have provided some data files; you can create your own data files as well.

We are already familiar with priority queues (we have implemented them with both arrays -in two different ways- and with linked lists). In this implementation we will instead use a heap, which has better (more balanced) complexity classes for enqueuing and dequeuing methods. For simple array and linked list implementations, one of these methods is always in the complexity class O(1) and the other is in O(N); for heaps both operations are in the complexity class O(Log₂N).

In the program for computing average tree heights, the user will specify N, the size of the tree, and M, the number of times to test it (e.g., 1000 node trees tested 1,000,000 times: still a small fraction of the 1000! possible permutations of these values). Each test will permute the the values 1-N and sequentially insert them into a binary tree and collect its statistics. Eventually, a variety of information is printed, including a histogram of all the tree heights (within the computed minimum and maximum bounds).

By writing solutions to the problems that may be on the second in-class programming exam now, you will be ahead of the curve studying for them.

There are no executables for this assignment: HeapPriorityQueue behaves just like any implementation of a priority queue (use the standard driver to test it); in the details of the tree height program, we will see all necessary output; for the second in-class programming exam programs, you should be able to inspect visually the results to determine whether they are correct.

Instead of starting with empty project folders, download and unzip the Start Project Folder which contains project folders for the last two parts of this assignments. Also, download the Ordered Collections Demonstration, as you did in the previous program, and write your HeapPriorityQueue within it (the driver now has an option T for testing N adds followed by N removes, reporting the elapsed time. Put your project folders in a folder whose name combines your names (when programming in pairs) and the program number (e.g., pattis-stehlik-9). Then zip this folder and dropoff that single zip file. If you are programming in a pair you should submit only one project: either partner can submit it.

Heap Priority Queue

The HeapPriorityQueue class must implement the OrderedCollection interface using standard priority queue semantics (any-in/highest priority out). You have already implemented this behavior using simple arrays and linked lists. Here, the underlying data structure is a heap, which uses an array to represent a tree compactly. In this representation, the index of a child node can be computed from the index of its parent node, and vice versa.

Use instance variables to store the array (holding the heap; double it when it must accomodate more values), how much of that array is used (or the location of the last value), and an object that knows how to prioritize/compare values. Note that this last object is supplied to the constructor; the object must come from a class implementing the Comparator interface.

The iterator must return values from the highest to the lowest priority (e.g., in decreasing priority order). For the iterator, you can make a copy of the heap being iterated over, and then use the copy when calling hasNext and next. This approach works with a copy, because remove is unsupported. Be careful though, that you don't try to use the iterator when constructing the iterator.

Test your methods with the standard driver (you must again modify it a bit) that tests all classes implementing the OrderedCollection interface.

Average Height

In this program, the user will be prompted to specify N, the size of the tree, and M, the number of times to test it. Each test will permute the the values 1-N and sequentially insert them into a binary tree; hint: use a List for this and the Collections.shuffle method. Then it will collect statistics for this tree. Finally, a variety of statistics are printed, including a histogram of tree heights (within the computed minimum and maximum bounds). Here is an example of me running my program (yours should provide an identical interaction).

Enter size of tree to create : 1000 Enter # of random trees to create: 100000 Enter # trees per heartbeat : 10000 Collecting Statistics: 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 Height | #Trees | ----------------------------------------------------------------------- 16 | 38 | * 17 | 903 | *** 18 | 5,750 | ************** 19 | 14,434 | ********************************** 20 | 21,388 | *************************************************** 21 | 21,158 | ************************************************** 22 | 15,900 | ************************************** 23 | 9,945 | ************************ 24 | 5,614 | ************** 25 | 2,714 | ******* 26 | 1,267 | *** 27 | 547 | ** 28 | 213 | * 29 | 94 | * 30 | 22 | * 31 | 10 | * 32 | 2 | * 33 | 0 | * 34 | 1 | * Minimum Height Possible = 8.967226258835993 Minimum Height = 16 Average Height = 21.04373 Maximum Height = 34 Maximum Height Possible = 999 Average Depth for Search = 11.9 Time = 71.094 seconds; processing speed = 1406 trees/second

Here I specified trees containing 1,000 nodes. For this run, I generated 100,000 of them. I printed a heartbeat for every 10,000 trees generated and analyzed.

The histogram goes from 18 (the minimum height tree generated) to 45 (the maximum height tree generated). The program also prints these numbers, along with the Mininum Height possible as well as the Maximum Height possible. In addition the program prints the Average Height and the Average Depth for Search. The former is just the average of all the heights of the trees generated; the later is a weighted average: all nodes at dept 0 (the root) require 1 comparision in a search; all nodes at depth 1 require two comparisions in a search; all nodes at depth 2 require three comparisions in a search; etc. In a pathological 3 node tree, the average search depth is (1*1 + 1*2 + 1*3)/3 = 2; in a perfect 3 node tree, the average search depth is (1*1 + 2*2)/3 = 5/3 < 2 (so searching for a random value in a perfect tree, as we expect, is faster on average).

Start timing the process right after you print Collecting Statistics:; stop timing right before printing the histogram. This stars on the histogram should be scaled so that the largest count prints as 50 stars. I printed a minimum of one star for every height that generated. See the DecimalFormat class to print numbers in the # of Trees column nicely (you do not have to use this class, in which case you can print these numbers a bit more ackwardly). Or, use the C-style formating in Java 1.5 to solve this problem more easily. You should be able to collect statistics using tree sizes in the millions (generating thousands of random trees) and tree sizes in the thousands (generating millions of random tree): this might entail using BigInteger and/or BigRational: think about what numbers might exceed a few billion.

IMPORTANT: Using information collected from running this program on various sized trees, guess an empirical formula for the average height of binary search tree of size N and for the average depth for search. Put these formulas at then end of the header comment in your program.

Problems for Binary Search Trees

Write solutions to all the binary search problems that may be on the second in-class programming exam. You will test these methods with various driver programs (a different one for testing each method), each works by reading its associated data files, building a tree, and running the method on it; I have provided some data files; you can create your own data files as well. Each as a short implementation that uses a private helper method (which is often a small recursive method). Explore different solutions in search of simple ones.

Test your methods with the driver, which already has about a dozen input files. You can write your own input files as well: just write any file in the input directory (in the right format) and the driver program will automatically find it and allow you to run it.

Program 9

Advanced Programming/Practicum 15-200

Advanced Programming/Practicum
15-200