15-121 Homework 6: BSTs and Anagrams - Due 11/19 at midnight
Download and unzip hw6-code.zip, which contains
all the files you will need for this assignment. You will be writing code in
a number of files, including creating one class from the ground up.
The goals of this assignment are:
- to give you additional practice with recursion and BST operations
- to modify the existing BST class in an interesting way
- to write a class from scratch
- to build a class you might actually use in word games
Note: You will be graded in part on your coding style. Your code
should be easy to read, well organized, and concise. You should avoid
duplicate code.
Background: The Assignment
This assignment has two parts. The first is to write (and
test) three Binary Search Tree methods: height,
isBalanced, and mirrorTree. These will be added to the
BST.java file. In addition, you're going to write an anagram generator
as was demo'd in class. In order to do this, you're going to have to augment
the same BST class to enable it to handle values with duplicate keys as
well as write a class called AnagramTree that constructs a tree of
words that can be searched for anagrams.
Part I — The BST Methods
There are three new methods that you will be writing for the BST class
in this homework assignment. You should test your methods by calling them
from the main method in the BST class. You will only receive
credit for a method if it works completely correctly, so test
thoroughly!
You are going to need recursion and helper methods to do this homework!! :-)
The three new BST methods are as follows:
public int height()
public boolean isBalanced()
BST<AnyType> mirrorTree()
Specifications for what each method is supposed to do are given below. Again,
you are strongly encouraged to test your BST methods before moving on to the
second part of this homework.
int height()
Returns the length of the longest path (number of edges) from the root to a
leaf. Recall that the height of a 1-node tree (just the root) is 0 (as
is the height of an empty tree) and the height of a node is the length of the
longest path from that node to a leaf.
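To make the height convention concrete, here is a minimal sketch of the recursion on a tiny stand-in node class. The names HeightSketch and Node are hypothetical and are not the ones in BST.java; your real method will work with the private TreeNode class through a helper method.

```java
// Hypothetical stand-in for the BST's node class; the real TreeNode in
// BST.java may use different field names and a generic element type.
class HeightSketch {
    static class Node {
        int value;
        Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Height counted in edges, using this assignment's convention that an
    // empty tree and a single-node tree both have height 0.
    static int height(Node node) {
        if (node == null || (node.left == null && node.right == null)) {
            return 0;
        }
        return 1 + Math.max(height(node.left), height(node.right));
    }

    public static void main(String[] args) {
        Node leaf = new Node(1, null, null);
        Node chain = new Node(3, new Node(2, leaf, null), null);
        System.out.println(height(null));  // prints 0 (empty tree)
        System.out.println(height(leaf));  // prints 0 (just a root)
        System.out.println(height(chain)); // prints 2 (two edges, 3 -> 2 -> 1)
    }
}
```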
boolean isBalanced()
Returns true if this BST is balanced, meaning that the height of the
left subtree and the height of the right subtree of each node (not just the
root) differ by no more than 1; returns false otherwise. An empty tree is
(trivially) balanced.
BST<AnyType> mirrorTree()
Returns a new BST that is the mirror of the original, i.e., all the
nodes that were initially in the left subtree of a node in the original tree
are now in the right subtree of that node in the mirrorTree. Note that this
effectively "flips" the sort order of the BST since all the smaller
nodes (which were on the left) will now be on the right. So the mirrorTree
will be sorted in descending order (which you can test by doing an inorder
traversal of the mirrorTree).
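One way to check a mirror is to compare inorder traversals of the original and the copy. The sketch below does this on a hypothetical Node class (not the TreeNode from BST.java) and returns a node rather than a BST<AnyType>, so treat it as an illustration of the idea only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node-level sketch; the assignment's mirrorTree() returns a
// new BST<AnyType> instead of a bare node.
class MirrorSketch {
    static class Node {
        int value;
        Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Builds a new tree in which every node's left and right subtrees are swapped.
    static Node mirror(Node node) {
        if (node == null) {
            return null;
        }
        return new Node(node.value, mirror(node.right), mirror(node.left));
    }

    // Appends the values of the tree rooted at node to out, in inorder.
    static void inorder(Node node, List<Integer> out) {
        if (node == null) {
            return;
        }
        inorder(node.left, out);
        out.add(node.value);
        inorder(node.right, out);
    }

    public static void main(String[] args) {
        Node tree = new Node(2, new Node(1, null, null), new Node(3, null, null));
        List<Integer> original = new ArrayList<>();
        List<Integer> mirrored = new ArrayList<>();
        inorder(tree, original);
        inorder(mirror(tree), mirrored);
        System.out.println(original); // prints [1, 2, 3]
        System.out.println(mirrored); // prints [3, 2, 1]
    }
}
```

The original tree is left untouched because mirror allocates a new node at every step.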
Part II — The Anagram Generator
You're going to write an anagram generator, AnagramTree, as was demo'd
in class. The AnagramTree will make use of the BST class
to read in a file of words (one per line) and store them in a binary tree
using their sorted letters as the search key. What does this mean? When you
read a word from the file (a String), you must sort it (by creating another,
sorted, String that has all the letters of the original word in sorted order)
and then insert both the sorted word and the original word in the tree. The
sorted word (a String) will be the search key for the binary search tree, and
all the words that have the same sorted form (like "rats" and "tars" and
"arts", with key "arst") will be stored in a list in the same node. The
reason for doing this is that anagrams are words that have the same letters,
just rearranged. So you'll take a word, put it in a standard
or canonical form (that would be the sorted form), and any two words
that have the same canonical form must be anagrams! Then all you have to do
is print the list of words that have the same canonical form.
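The canonical form can be computed in a few lines. This is a minimal sketch; the class and method names (CanonicalSketch, canonical) are made up for illustration and are not required by the assignment.

```java
import java.util.Arrays;

class CanonicalSketch {
    // Returns the letters of word in sorted order, e.g. "rats" -> "arst".
    static String canonical(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    public static void main(String[] args) {
        // All three anagrams share one canonical form.
        System.out.println(canonical("rats")); // prints arst
        System.out.println(canonical("tars")); // prints arst
        System.out.println(canonical("arts")); // prints arst
    }
}
```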
Process
BST.java
You will need to add two more methods (in addition to those specified in Part
I) to BST.java as well as modify the private TreeNode class to
both declare and create a list of AnyType to hold the words with the
same canonical form. The two new methods will need to interact with this
list. One is an overloaded add method that takes the sorted word
(AnyType) and the original word (AnyType) as parameters and
inserts the sorted word in the tree (as the key value) and the original word
in the list (of AnyType) that belongs to that key. Of course, as you
write the helper function for this overloaded version of add, you
will have to handle the case where the key is already in the tree a little
differently than was done in lecture since it is no longer an error!
The second method to add to BST.java is a find method that
takes a sorted word (AnyType), determines if it is in the tree as
a key and, if it is, returns the list (of AnyType, but really of
Strings that are original words) that map to that key so you can print it
out.
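The sketch below shows the general shape of a tree whose nodes each carry a list of values. It is a simplified illustration, not the BST.java you were given: it uses String instead of AnyType, the class name DuplicateKeyTreeSketch is hypothetical, and returning null from find when the key is missing is an assumption (the spec above does not say what to return in that case).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, String-only stand-in for a BST that allows duplicate keys.
class DuplicateKeyTreeSketch {
    private static class Node {
        final String key;
        final List<String> words = new ArrayList<>(); // created with the node
        Node left, right;

        Node(String key) {
            this.key = key;
        }
    }

    private Node root;

    // Inserts key if needed and appends word to that key's list.
    void add(String key, String word) {
        root = add(root, key, word);
    }

    private Node add(Node node, String key, String word) {
        if (node == null) {
            node = new Node(key);
            node.words.add(word);
            return node;
        }
        int cmp = key.compareTo(node.key);
        if (cmp < 0) {
            node.left = add(node.left, key, word);
        } else if (cmp > 0) {
            node.right = add(node.right, key, word);
        } else {
            node.words.add(word); // key already present: not an error here
        }
        return node;
    }

    // Returns the list stored with key, or null if the key is absent
    // (the null return is an assumption of this sketch).
    List<String> find(String key) {
        Node node = root;
        while (node != null) {
            int cmp = key.compareTo(node.key);
            if (cmp == 0) {
                return node.words;
            }
            node = (cmp < 0) ? node.left : node.right;
        }
        return null;
    }

    public static void main(String[] args) {
        DuplicateKeyTreeSketch tree = new DuplicateKeyTreeSketch();
        tree.add("arst", "rats");
        tree.add("arst", "tars");
        tree.add("art", "tar");
        System.out.println(tree.find("arst")); // prints [rats, tars]
        System.out.println(tree.find("zzz"));  // prints null
    }
}
```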
AnagramTree.java
Once you've got the BST modifications done, it's time to get to work
creating anagrams. You can have your user interface behave any way you want
(I will provide the output of my program below: your results have to match,
but your program's interaction with the user does not have to mimic mine
line-by-line).
You want to have all the words with the same letters be stored in the same
location. So you will use the sorted form of a word as the key, and associate
that key with all the words that have that same sorted form. Thus, the key
"art" will be associated with "tar, art, rat" (in any order).
As you can see from the driver code, you will build the tree by asking the
user what file they want to read from and the maximum word size that they
want to consider and then construct an AnagramTree by opening,
reading, and inserting all the words from that file. You should only store
the words that are less than or equal to the maximum length provided by the
user. To help in debugging, I have provided you with two dictionary
files: small-words-qatar.txt and words.txt. The files are quite
differently sized: small-words-qatar.txt contains 25 words,
while words.txt contains over 172,000 words! You should test your code
on the small file before going to the big one!
The AnagramTree constructor takes a file name and maximum word size and
builds a tree with all the words that are less than or equal to the length
specified. To do this, you read a string and if it's the right length,
construct its sorted form, and then add the sorted form and original word into
your BST. If, when you're done reading the small file, you read 26
words, and with a maximum word size of 7, you inserted 16 of them with 9 nodes
in the tree, you appear to be on the right track. If you then run on the big
file and get 51913 words inserted and 41121 nodes with a max word size of 7
and 80314 words and 66538 nodes with a max of 8, you have built a correct
tree.
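The read, filter, sort, and insert loop can be sketched as follows. To keep the example self-contained it uses java.util.TreeMap as a stand-in for your modified BST, and the names LoaderSketch, wordsRead, wordsInserted, and nodeCount are all hypothetical. Skipping blank lines is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class LoaderSketch {
    // Stand-in for the modified BST: sorted form -> original words.
    final Map<String, List<String>> index = new TreeMap<>();
    int wordsRead = 0;
    int wordsInserted = 0;

    // Reads one word per line and keeps those of length <= maxLength.
    LoaderSketch(Scanner in, int maxLength) {
        while (in.hasNextLine()) {
            String word = in.nextLine().trim();
            if (word.isEmpty()) {
                continue; // assumption: blank lines are not counted as words
            }
            wordsRead++;
            if (word.length() <= maxLength) {
                index.computeIfAbsent(sortLetters(word), k -> new ArrayList<>()).add(word);
                wordsInserted++;
            }
        }
    }

    static String sortLetters(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Number of distinct keys, which corresponds to nodes in the tree.
    int nodeCount() {
        return index.size();
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("rat\ntar\nart\nstar\nelephants\n");
        LoaderSketch loaded = new LoaderSketch(in, 4);
        System.out.println(loaded.wordsRead);     // prints 5
        System.out.println(loaded.wordsInserted); // prints 4
        System.out.println(loaded.nodeCount());   // prints 2
    }
}
```

To read from a real file, you could construct the Scanner with new Scanner(new File(fileName)), which requires importing java.io.File and handling FileNotFoundException.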
Once you've got your tree built, the driver code asks the user for a word
and, if its length is less than or equal to your max length, searches for it
in the tree (by calling findMatches, using what as the key?) and, if
found, prints all the anagrams of that word (which will be found in the list
stored with that key). If the word is not found in
the tree, you should tell the user that it was not found. The user should be
allowed to search for as many individual words as they want until they enter a
sentinel value, at which point the program should end. As an example, if you
are using the small file and the user enters "tar", your program should print
"tar rat art" (in any order) as your output. Once you can do that, you're
done!
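For reference, a sentinel-controlled search loop of the kind described above might look like the sketch below. The provided AnagramTester.java already contains the real loop and may differ; this version uses a plain Map in place of the tree, and the names SearchLoopSketch and searchLoop are hypothetical.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

class SearchLoopSketch {
    // Prompts for words until the sentinel "#" (or end of input) is seen.
    static void searchLoop(Scanner in, Map<String, List<String>> index,
                           int maxLength, PrintStream out) {
        while (true) {
            out.print("string to search [#] to stop: ");
            if (!in.hasNextLine()) {
                break;
            }
            String word = in.nextLine().trim();
            if (word.equals("#")) {
                break;
            }
            // Words longer than maxLength were never stored, so skip the lookup.
            List<String> matches =
                word.length() <= maxLength ? index.get(sortLetters(word)) : null;
            if (matches == null) {
                out.println("NO words match!");
            } else {
                out.println("Words that match: " + matches);
            }
        }
    }

    static String sortLetters(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        index.put("art", Arrays.asList("tar", "rat", "art"));
        // Simulated user input: one hit, one miss, then the sentinel.
        searchLoop(new Scanner("tar\nfoon\n#\n"), index, 7, System.out);
    }
}
```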
AnagramTester.java
One final note: do NOT make any changes to AnagramTester.java.
Expected Output
The following was the output of my reference solution, using 7 as the maximum
word size on the large dictionary (recall that your words can appear in any
order but there can be no duplicate words printed):
Enter name of dictionary file: words.txt
Max word length: 7
Total number of words read: 172715
Number of words inserted (length <= 7): 51913
Number of nodes in the tree: 41121
string to search [#] to stop: chart
Words that match: [ratch, chart]
string to search [#] to stop: star
Words that match: [tsar, rats, arts, star, tars]
string to search [#] to stop: hoser
Words that match: [heros, hoers, shoer, shore, horse]
string to search [#] to stop: strand
Words that match: [strand]
string to search [#] to stop: foon
NO words match!
string to search [#] to stop: #
Submitting Your Work
When you have completed the assignment and tested your code thoroughly,
create a .zip file with your work (including AnagramTree.java and
BST.java). Name the zip file "your-andrew-id".zip and email it to me at
mjs @ cmu.edu.
Make sure to keep a copy of your work just in case!