Homework 4: Binary Search Trees



Due Monday, November 4 @ 8:00pm on Gradescope

1. Introduction

In this assignment, you’ll be writing the code for a binary search tree. It will be based on the tree that we construct and deal with in class. (So, it is ok to use code from our class demo code in your assignment.)

Download a pre-prepared Eclipse Project hw4.zip and import it into Eclipse using these instructions. You will make all of your changes inside the Java files in this project.

For this assignment you have 5 submissions. You should write your own testcases while you work so that you don’t waste submissions. After you have used your submissions, you may continue to submit, but your submission will be penalized 4 points for every extra submission. (So, your 6th submission receives a -4, your 7th submission a -8, etc.)

2. Your Tasks

You have two main tasks.

  1. Finish the KeyValueBinarySearchTree implementation with all of the requested methods.

  2. Write the AnagramTree class that can be used to find anagrams from a dictionary.

2.1. Finishing KeyValueBinarySearchTree

As the first step in this project, you will be building a key-value binary search tree. This is like a standard binary search tree, except that the order of the items in the tree is determined by a key associated with each item, instead of being determined by the item itself. Each node in the tree will store both the data for the node as well as the key associated with that data.

You will need to write the following methods in KeyValueBinarySearchTree.java.

public KeyValueBinarySearchTree()

The constructor.


public void add(KeyType key, DataType data)

Add an item to the tree. You should determine the appropriate node to put the item based on the key, and at the node store both the key and the data.


public int size()

Determines and returns the size of the tree (total number of nodes).


public boolean isEmpty()

Determine whether or not the tree is empty.


public int height()

Finds and returns the length of the longest path (number of edges) from the root to a leaf. Recall that the height of a 1-node tree (just the root) is 0 (as is the height of an empty tree) and the height of a node is the length of the longest path from that node to a leaf.


public boolean isBalanced()

Returns true if this tree is balanced, meaning that the height of the left subtree and the height of the right subtree of each node (not just the root) differ by no more than 1; returns false otherwise. An empty tree is (trivially) balanced.


public DataType find(KeyType key)

Searches the tree for the node represented by key and returns the data stored at the node.


public KeyValueBinarySearchTree<KeyType, DataType> clone()

Returns a new BST that is a shallow clone of the original, i.e., contains nodes with the same keys and data in exactly the same locations. The original tree should not be modified.


public ArrayList<DataType> asList()

Returns an ArrayList containing all of the data elements of the tree, sorted according to their keys.


private class TreeNode

The TreeNode class is a private inner class used (only) by this class


public String toString()

returns a String that prints tree top to bottom, right to left in a 90-degree rotated level view. Do not remove these toString methods, the autograder will make use of them.


2.2. Building the AnagramTree

Next, you are going to write an anagram finder, AnagramTree. (If you don’t know what an anagram is, try Wikipedia on the subject.)

The AnagramTree will make use of the KeyValueBinarySearchTree class to store lists of anagrams. At a high level, it will read in a file of words (one per line) and store them in a binary tree using their sorted letters as the search key. What does this mean? When you read a word from the file (a String), you must sort it (by creating another, sorted, String that has all the letters of the original word in sorted order) and then insert the original word into the tree using the sorted word as the key. The sorted word (a String) will be the search key for the binary search tree, and all the words that have the same sorted form (like “rats” and “tars” and “arts”) will all be stored in an ArrayList at the node with the sorted word key (in this case, with the key “arst”).

The AnagramTree method loadWords takes a file name and maximum word size and builds a tree with all the words that are less than or equal to the length specified. To do this, you read words from a file one line at a time, and for each word, if it is the correct length then construct its sorted form and add the original word into your tree, using the sorted form as the key. (Remember: The key finds the node, and the node contains an ArrayList of words that are anagrams of the sorted key.)

If you use the small file and a maximum word size of 7, you should have 16 words in 9 nodes in the tree. If you use the big file with a maximum word size of 7, you should have 51913 words in 41121 nodes. (Or, with a maximum word size of 8, 80314 words in 66538 nodes.)

We have provided some simple driver code in AnagramTester.java that should use your AnagramTree class. Your code must be compatible with this tester as provided.

You will need to write the following methods in AnagramTree.java.

public AnagramTree()

The constructor.


public void loadWords(String filename, int maxLen)

Reads in words of length <= maxLen and stores them in ArrayLists in the tree, indexed by the sorted form of the word.

This method does not have an explicit efficiency requirement, but if you are too inefficient then the autograder will timeout.


public boolean isEmpty()

Returns whether or not the tree is empty (has no nodes)


public int size()

Determines and returns the size of the tree (number of nodes) storing the anagrams.


public int numWords()

Return the total number of words that have been added to the tree.


public ArrayList<String> findMatches(String word)

Searches the tree given a word and returns a list of all the words that are anagrams of it.


3. Grading and Submission

There are multiple parts of the grading of this assignment:

  1. For the first 90 points, your submission will be auto-graded on Gradescope.

  2. For the next 5 points, your submission will be manually graded to check for good implementation methodologies. (Did you use a good approach to solving the problems?)

  3. For the next 5 points, your submission will be manually graded to check for good testcases that you include in the main method. (Do you have 2-3 basic testcases for each method, and do they all execute automatically?)

  4. Your code will also be checked for style. The parts of style that can be checked automatically (things like spacing, indentation, the use of CamelCase, etc.) are automatically checked by the autograder. Other parts of style, such as choosing good variable names, will be checked manually. Autograded style guide violations can result in, at most, -10 points. Manually checked style guide violations can result in, at most, -5 points.

You will submit your program to Gradescope. Log in to the system and you will see the homework. Once there, you need to submit a zip file containing your code. Lucky for you, however, Eclipse can create this zip file for you. Check out these instructions for exporting. On Gradescope, you’ll submit that exported zip file. On the page that follows your submission, you will see your live score (out of 90). If you receive a lower score than you expect, you can click on the score to see the feedback from the autograder.

3.1. Testcases

In this homework, we are not providing any testcases.

For KeyValueBinarySearchTree you need to provide at least two of your own testcases for each of the new methods. At least one of those should be non-basic. All of your testcases should be in the KeyValueBinarySearchTree class included with the skeleton code.

For AnagramTree you need to provide at least three of your own complete testcases that demonstrate overall functionality. (And, of course, we expect that you will also be testing manually using the AnagramTester).

You must follow the model of our testcases from previous assignments (meaning you print when you start, print the results (pass/fail) when you finish, etc.) Additionally, you must comment each testcase with a note describing what it tests.

When grading, in addition to counting testcases we will also look at the quality of what you are testing.

4. Restrictions

You may only use the following Java libraries in this assignment:

java.util.ArrayList
java.util.Scanner
java.io.FileNotFoundException
java.io.FileReader
java.util.Arrays

5. Important Notes