Collection Classes
Reading/Using/Writing Generic Classes
and Special Iterators

Advanced Programming/Practicum
15-200


Introduction We have now covered collection classes as they exist in Java 1.4. In this lecture we will cover generic collection classes, which arrived with Java 1.5. At a basic level, generics allow us to parameterize the type of a collection class. For example, with generics we can declare that a List x must contain only String values by writing List<String> x;.

Then the Java compiler will ensure that all values added to the list x are Strings; it will reject the statement x.add(new Integer(0));, which is perfectly acceptable in Java 1.4. It will also automatically cast any value gotten from this list to be a String, as in String s = x.get(0);, which in Java 1.4 must be written as String s = (String)x.get(0);. In Java 1.5, the implicit cast cannot throw an exception when the program runs: it is guaranteed to cast correctly.

To achieve this extra compile-time checking, Java 1.5 includes a complicated mechanism for writing and using generic interfaces, classes, and methods. In its full generality, this mechanism is very powerful and sometimes difficult to understand and use. We will focus on understanding this mechanism well enough to effectively use the generic collection classes that already exist in the Java library, which is a much simpler course of study. We will interleave the study of both below, but eventually stop short of explaining all the possibilities. If you are interested in a fuller treatment of Java generics, you can read the paper Java Generics, which is a pdf tutorial on the subject. Much of the discussion in this lecture follows this material. Soon there will be an entire book published on this topic: Java Generics and Collections.


Simple Generic Collections The real List interface in Java 1.5 is defined as follows. I've elided some methods because they are unimportant or use more complex generic features: I'll discuss these latter methods later in this lecture.
public interface List<E> extends Collection<E> {
  ...
  boolean     add(E o);
  E           get(int index);
  ...
  boolean     contains(Object o);
  Iterator<E> iterator();
  ...
}
In this interface, we specify a type parameter E in angle brackets. Notice that the add method uses E to specify the type of object that can be added to this List. The get method uses E to specify the type of values that can be gotten from the list. The contains method still uses Object, so you can ask if any object is in the list. The iterator method returns an object that implements an iterator that returns values (via next) of type E; more on generic iterators soon.

Likewise, the ArrayList class specifies

public class ArrayList<E> extends AbstractList<E> implements List<E>

So, together we can declare List<String> x = new ArrayList<String>(); Here we use String to instantiate the generic type parameter E in the List interface and ArrayList class. Based on the declarations in the interface, calling add requires a String argument (or the compiler reports an error) and calling get returns a String value (requiring no special casting by the programmer, and no possible throwing of an exception by the runtime system).

Note that writing List<Object> x = new ArrayList<Object>(); in Java 1.5 is essentially the same as writing List x = new ArrayList(); in Java 1.4. With these declarations, no objects are disallowed in add and no automatic casting is performed via get. In fact, you can still write List x = new ArrayList(); in Java 1.5, although Eclipse will report a warning. Using the previous method of declaration, there will be no warnings. We will discuss these warnings later in this lecture.
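To make the contrast concrete, here is a minimal sketch of the two styles side by side. The class name RawListDemo and its method names are made up for illustration; Integer.valueOf(0) stands in for new Integer(0), which newer Java versions deprecate. The raw (1.4-style) list accepts any object and fails only when the explicit cast runs; the generic list needs no cast at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names here are illustrative only.
class RawListDemo {
  // Java 1.4 style: a raw list. The compiler warns but accepts this code.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static boolean rawCastFails() {
    List x = new ArrayList();
    x.add(Integer.valueOf(0));        // allowed: a raw list accepts any Object
    try {
      String s = (String) x.get(0);   // explicit cast, checked only at run time
      return false;                   // not reached: the cast above throws
    } catch (ClassCastException e) {
      return true;
    }
  }

  // Java 1.5 style: the compiler checks add and inserts the cast for get.
  static String genericGet() {
    List<String> x = new ArrayList<String>();
    x.add("safe");
    // x.add(Integer.valueOf(0));     // would be rejected by the compiler
    return x.get(0);                  // no cast needed
  }

  public static void main(String[] args) {
    System.out.println(rawCastFails());
    System.out.println(genericGet());
  }
}
```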

In the pursuit of effectively using generic collection classes (which is our main pursuit), we now know almost everything that we need to know. Let's explore some more complicated uses of the same principles.

The generic Map interface (and map implementations) are specified with two type parameters, as in Map<K,V>. K is the type of each key; V is the type of a key's associated value. So, to store a map specifying how often each word appears in a text file, we would declare Map<String,Integer>. If we wanted a map specifying the lines on which each word appears in a text file, we would declare Map<String,List<Integer>>. Note that we can compose (or nest) instantiations of generic types.
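As a rough sketch of both declarations in use, the code below builds a word-count map and a line-index map. The class WordMapsDemo and its methods are hypothetical, and for simplicity the "text file" is just an array of Strings split on single spaces. The code relies on Java 1.5 autoboxing to move between int and Integer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; input is an array of Strings rather than a real file.
class WordMapsDemo {
  // Maps each word to the number of times it appears.
  static Map<String,Integer> countWords(String[] words) {
    Map<String,Integer> counts = new HashMap<String,Integer>();
    for (String w : words) {
      Integer old = counts.get(w);              // null if w is not yet a key
      counts.put(w, old == null ? 1 : old + 1); // autoboxing int -> Integer
    }
    return counts;
  }

  // Maps each word to the (1-based) line numbers on which it appears.
  // A word appearing twice on one line has that line recorded twice.
  static Map<String,List<Integer>> indexLines(String[] lines) {
    Map<String,List<Integer>> index = new HashMap<String,List<Integer>>();
    for (int i = 0; i < lines.length; i++)
      for (String w : lines[i].split(" ")) {
        List<Integer> where = index.get(w);
        if (where == null) {
          where = new ArrayList<Integer>();
          index.put(w, where);
        }
        where.add(i + 1);
      }
    return index;
  }

  public static void main(String[] args) {
    System.out.println(countWords(new String[] {"a", "b", "a"}));
    System.out.println(indexLines(new String[] {"a b", "b c"}));
  }
}
```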

Unfortunately, these generics do not work nicely in one case that we have seen. Recall in the Thesaurus example, we wrote the data structure as Map[String] -> List[String,List[String*]]. The problem here is that we use the outer List as a tuple: always storing two values of different types (String and List<String>) so at best we can write this as Map<String,List<Object>>.

What is needed is a generic Pair<F,S> with getFirst and getSecond methods. This type really is equivalent to Map.Entry, but that name isn't suggestive; it is more map-related. While a Triple and Quadruple type might make things easier (via getFirst, getSecond, getThird, etc.), we can always just nest multiple Pairs to achieve the same effect. For example, returning to our Thesaurus example, if we wanted to specify Map[String] -> List[Integer,String,List[String*]] where the Integer specifies the number of syllables in a word (for composing poetry), we could write it using a Pair of a Pair: Map<String,Pair<Integer,Pair<String,List<String>>>>, which is quite a mouthful. Assuming thesaurus is of this type, to get the antonym of (the String associated with) word, we'd write thesaurus.get(word).getSecond().getFirst(), another mouthful.
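Java's library does not supply such a Pair, so the following is only one possible sketch of it, assuming an immutable pair with a two-argument constructor. The ThesaurusDemo class and its sample entry ("happy", 2 syllables, antonym "sad") are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Pair class; not part of the Java library.
class Pair<F,S> {
  private final F first;
  private final S second;

  Pair(F first, S second) {
    this.first  = first;
    this.second = second;
  }

  F getFirst()  {return first;}
  S getSecond() {return second;}
}

// Hypothetical demo: word -> (syllables, (antonym, synonyms)).
class ThesaurusDemo {
  static Map<String,Pair<Integer,Pair<String,List<String>>>> build() {
    Map<String,Pair<Integer,Pair<String,List<String>>>> thesaurus =
      new HashMap<String,Pair<Integer,Pair<String,List<String>>>>();
    List<String> synonyms =
      new ArrayList<String>(Arrays.asList("glad", "cheerful"));
    thesaurus.put("happy",
      new Pair<Integer,Pair<String,List<String>>>(
        2, new Pair<String,List<String>>("sad", synonyms)));
    return thesaurus;
  }

  public static void main(String[] args) {
    // The nested getters need no casts: each call has a known static type.
    String antonym = build().get("happy").getSecond().getFirst();
    System.out.println(antonym);
  }
}
```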


Iterators and Iterable: More Generic Classes and a Special "for" loop Let's briefly examine how generic iterators work. The Iterator interface in Java 1.5 is
public interface Iterator<E> {
  boolean hasNext();
  E       next();
  void    remove();
}
So, E specifies the type of value next returns.

We can see from the prototype of the iterator method in the List interface above that it returns a generic iterator, in this case specialized to an iterator that returns String values (and the Java compiler knows that). If we declare List<String> x; and want to catenate all the String values in that list into one big String, we can write

String answer = "";
for (Iterator<String> i = x.iterator(); i.hasNext(); /*see body*/)
  answer += i.next();  //Note, no cast is necessary 
Because x is declared to be of type List<String>, Java knows its iterator method returns an iterator of type Iterator<String>, which means that Java knows that the next method for this iterator returns a value of type String.

Java 1.5 allows an even shorter way to write this loop, based on the Iterable interface. This interface, although generic, is trivial: basically, a class implements the Iterable interface if it implements an iterator method (meaning the object can return an iterator for the values it stores).

public interface Iterable<T> {
  Iterator<T> iterator();
}
The List interface extends the Collection interface, which extends the Iterable interface (yuch). So any class that implements a list (or any collection, for that matter) defines an iterator method. So, List<String> implements Iterable<String>. Therefore, we can write
String answer = "";
for (String s : x)
  answer += s;
to catenate all the String values together. In general, for the loop for (T v : x) to be legal, x must implement Iterable<T>, which means its iterator returns values of type T. Generally, Java translates the code
for (T v : x) {
  body
}
into
for (Iterator<T> secret = x.iterator(); secret.hasNext(); /*see body*/) {
  T v = secret.next();
  body
}
Note that you cannot use this kind of loop to remove values from a collection, because you don't have access to the iterator secret (it really is a secret variable that the compiler uses but you cannot). To use remove, you'd have to write a for loop explicitly declaring and using the iterator.
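Here is a short sketch of such an explicit loop. The class RemoveDemo and the method removeShort are hypothetical names; the method removes every String shorter than a given length by calling remove on the iterator itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class showing removal through an explicit iterator.
class RemoveDemo {
  // Removes from x every String whose length is less than minLength.
  static void removeShort(List<String> x, int minLength) {
    for (Iterator<String> i = x.iterator(); i.hasNext(); /*see body*/)
      if (i.next().length() < minLength)
        i.remove();   // removes the value most recently returned by next
  }

  public static void main(String[] args) {
    List<String> x = new ArrayList<String>(Arrays.asList("a", "bbb", "cc"));
    removeShort(x, 2);
    System.out.println(x);
  }
}
```

Note that the list passed in must support removal; the fixed-size list returned directly by Arrays.asList does not, which is why the example copies it into an ArrayList.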

Subtyping We already know that if class B extends class A then B is a subclass of A. We can easily extend this concept to interfaces, and generally talk about one type (interface or class) being a subtype of another. The key property of subtypes is that if B is a subtype of A, then we can assign a B object to an A variable (a B can do everything an A can do, so it can do whatever the compiler allows).

Let's look at how subtyping interacts with collection classes. Assume that we write List<String> x = new ArrayList<String>(); Can we write List<Object> o = x;? We CANNOT. The salient point is that although String is a subtype of Object, a List<String> is NOT a subtype of a List<Object>. Let's explore why it cannot be a subtype. Certainly calling o.get(0) returns something that is an Object. But now think about calling o.add(new Timer()); first. A Timer is an Object, so the compiler should allow it; but we would now have put a Timer object in a list that was supposed to store only String values. In fact, if we wrote x.get(0) we (and Java) would be expecting a String to be returned, but if this were allowed, a Timer would be returned. For this reason, the assignment List<Object> o = x; is disallowed and reported as an error by the compiler.

When we think about subtyping, we must remember that a list can be changed. If lists were immutable, then this form of subtyping would work fine.

Again, although String is a subtype of Object, List<String> is NOT a subtype of List<Object>. Therefore the assignment List<Object> o = x; is disallowed. Since passing an argument to a parameter is a form of assignment, it means that we cannot pass an argument whose type is List<Foo> to a parameter whose type is List<Object>. This affects how we must generically parameterize the methods that we write in our code.

All this, while logical, might be a bit surprising because array subtyping in Java is different. We actually can declare String[] s = new String[10]; and then declare Object[] o = s; Now, writing o[0] = new Integer(1); will compile in Java, but at runtime it will throw an ArrayStoreException. So, for arrays, the "subtype" assignment shown here succeeds, but for storing a value in an array, Java performs an extra check at runtime: a check that the type of the object it is storing (which in this case is Integer) is compatible with the element type of the array object (which in this case is String), throwing an exception when it isn't (as happens in this case). Thus, unlike generic collection classes, arrays are not type safe at compile time (but they are type safe at run time).

This approach works with arrays because we (and the Java runtime) can ask an array object what type of array it is. With the way Java generics are implemented (see "erasure" in the last section), this is not possible for generic collections, so the compiler prohibits the analogous first assignment, List<Object> o = x;
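The array behavior is easy to observe directly. In this small sketch (the class ArrayStoreDemo and method storeFails are made-up names, and Integer.valueOf(1) stands in for new Integer(1)), the assignment compiles and the store is rejected only when the program runs.

```java
// Hypothetical demo class showing the runtime check on array stores.
class ArrayStoreDemo {
  // Returns true if storing an Integer into a String[] (viewed as an
  // Object[]) throws ArrayStoreException at run time.
  static boolean storeFails() {
    String[] s = new String[10];
    Object[] o = s;                 // allowed: String[] is a subtype of Object[]
    try {
      o[0] = Integer.valueOf(1);    // compiles, but is checked when executed
      return false;
    } catch (ArrayStoreException e) {
      return true;
    }
    // The generic analog, List<Object> o = x; for a List<String> x,
    // does not compile at all, so it cannot appear in a runnable example.
  }

  public static void main(String[] args) {
    System.out.println(storeFails());
  }
}
```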


Wildcards for Generic Methods Besides writing generic collection classes, we can also use generics to write individual methods (static and non-static). Here the subtyping issue discussed above causes a problem, and Java needs another generic mechanism to solve it. As a first example, suppose that we wanted to write a generic method that prints every value in any collection. In Java 1.4 we would write this as
public void print (Collection c) {
  for (Iterator i = c.iterator(); i.hasNext(); /*see body*/)
    System.out.println(i.next());
}
How can we express this using Java generic collection classes? A first attempt (also using the special for loop that we have learned) might be
public void print (Collection<Object> c) {
  for (Object o : c)
    System.out.println(o);
}
The problem is that this parameter works only for an argument that is a collection of Object. As we saw above, a Collection<String> argument, for example, is not a subtype of Collection<Object>, so the Java compiler would reject such an argument if we tried to pass it to this method. So, we cannot use Collection<Object> as a supertype for all collections. What can we use instead?

To specify such a collection, Java added a special feature called a wildcard type parameter: ?. We can write Collection<?> for a parameter type, which matches arguments constructed from any collection. Thus, we can write this code as

public void print (Collection<?> c) {
  for (Object o : c)
    System.out.println(o);
}
Java knows that if we iterate over such a collection, the values can all be considered references to Object, because we have repeatedly seen that all references are subtypes of Object references. But, inside such a method, we cannot write c.add(..anything..) because the compiler cannot know what kinds of values are in such a collection, and thus not know what kinds of values can be safely added to the collection.
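As a small illustration, the sketch below uses the same Collection<?> parameter but returns a String instead of printing, so one method can be applied to collections of different element types. The class WildcardDemo and the method join are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical demo class for an unbounded wildcard parameter.
class WildcardDemo {
  // Joins the values of any collection, separated by single spaces.
  static String join(Collection<?> c) {
    StringBuilder sb = new StringBuilder();
    for (Object o : c)              // every value can be treated as an Object
      sb.append(o).append(' ');
    // c.add(new Object());         // rejected: element type is unknown
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    System.out.println(join(Arrays.asList("a", "b")));  // a Collection<String>
    System.out.println(join(Arrays.asList(1, 2, 3)));   // a Collection<Integer>
  }
}
```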

Now let's look at two variants of the wildcard parameter type. Recall our example of the Shape (abstract) class, which has concrete subclasses Circle and Rectangle; recall too that the getArea method works on any object constructed from a class descended from Shape.

Now, suppose that we wanted to write a method that takes some collection (List, Set, whatever) of Shapes as a parameter, and returns the sum of all the areas. Note, the actual collection might be Collection<Shape> or Collection<Circle> (a collection containing only the Circle shape), or Collection<Rectangle> (a collection containing only the Rectangle shape). Certainly this method should work on any of these three collection types. A reasonable attempt to write this method might be

public double areaOfCollection (Collection<Shape> c) {
  double sum = 0.0;
  for (Shape s : c)
    sum += s.getArea();
  return sum;
}
While this method is correct for an argument of Collection<Shape>, it does not work for an argument of Collection<Circle> or Collection<Rectangle>, because these two collections are not a subtype of Collection<Shape>. Remember, even though Circle is a subtype of Shape, Collection<Circle> is not a subtype of Collection<Shape>, so the Java compiler would reject this method call. Obviously, the "subtype of a collection" problem is really important, because it invalidates many "obvious" solutions to problems.

Generalizing to Collection<?> seems like overkill, and causes problems because there is no guarantee that the collection contains objects descended from the Shape class, on which we can call getArea. To solve this problem, Java allows us to generalize wildcard type parameters by specifying a constraint on them. These are called bounded wildcards: an example is Collection<? extends Shape>, which we can use to write our areaOfCollection method. A parameter of this type can match Collection<Shape>, Collection<Circle>, and Collection<Rectangle>. So, we can write this method correctly as

public double areaOfCollection (Collection<? extends Shape> c) {
  double sum = 0.0;
  for (Shape s : c)
    sum += s.getArea();
  return sum;
}
Here, for iteration purposes, Java knows that all values in the collection are a subtype of Shape. Note that we still could NOT add anything to such a parameter inside the method: writing c.add(new Circle(...)) would be a mistake if the argument collection was of type Collection<Rectangle>, so the Java compiler would report any add as an error.
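To try the bounded wildcard end to end, here is a self-contained sketch. The Shape, Circle, and Rectangle classes below are minimal stand-ins written just for this example (their constructors and fields are assumptions, not the course's actual classes); areaOfCollection is written as static so it can be called without an object.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Minimal stand-in classes; the real Shape hierarchy may differ.
abstract class Shape {
  abstract double getArea();
}

class Circle extends Shape {
  private final double radius;
  Circle(double radius) {this.radius = radius;}
  double getArea() {return Math.PI * radius * radius;}
}

class Rectangle extends Shape {
  private final double width, height;
  Rectangle(double width, double height) {this.width = width; this.height = height;}
  double getArea() {return width * height;}
}

class AreaDemo {
  // Accepts Collection<Shape>, Collection<Circle>, Collection<Rectangle>, ...
  static double areaOfCollection(Collection<? extends Shape> c) {
    double sum = 0.0;
    for (Shape s : c)
      sum += s.getArea();
    return sum;
  }

  public static void main(String[] args) {
    List<Rectangle> rs = new ArrayList<Rectangle>();
    rs.add(new Rectangle(2.0, 3.0));
    rs.add(new Rectangle(1.0, 1.0));
    System.out.println(areaOfCollection(rs));   // a List<Rectangle> is accepted

    List<Shape> mixed = new ArrayList<Shape>();
    mixed.add(new Circle(1.0));
    mixed.add(new Rectangle(2.0, 2.0));
    System.out.println(areaOfCollection(mixed)); // so is a List<Shape>
  }
}
```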

Another example of bounded wildcards occurs in the header of the addAll method in the Collection interface. This interface (like the List interface that extends it, shown above), has a type parameter of E. The addAll method is defined by

boolean addAll(Collection<? extends E> c);
which specifies that if we are to add to this collection every value from collection c, then every value in collection c must be a subtype of E. More concretely, if we have a Collection<Shape>, then we can add to it collections with the type Collection<Shape>, Collection<Circle>, and Collection<Rectangle>. If we have a Collection<Circle>, then we can add to it only values from a Collection<Circle> (or from a collection of some subclass of Circle, if it has any).
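The same rule can be seen with library types alone. In this sketch (AddAllDemo and merge are hypothetical names), Integer and Double are both subtypes of Number, so lists of either can be added to a List<Number>; the reverse direction would not compile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class for addAll(Collection<? extends E>).
class AddAllDemo {
  // Collects the values of both lists into one List<Number>.
  static List<Number> merge(List<Integer> ints, List<Double> doubles) {
    List<Number> all = new ArrayList<Number>();
    all.addAll(ints);       // List<Integer> matches Collection<? extends Number>
    all.addAll(doubles);    // List<Double> matches it too
    // ints.addAll(all);    // rejected: Number is not a subtype of Integer
    return all;
  }

  public static void main(String[] args) {
    System.out.println(merge(Arrays.asList(1, 2), Arrays.asList(2.5)));
  }
}
```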

Contrast this with the header for the containsAll method

boolean containsAll(Collection<?> c);
which allows any kind of collection. The equals method it uses will discard the "wrong" type of values, but we don't put any syntactic restriction on this parameter.
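A brief sketch of what this permissiveness looks like in practice (ContainsAllDemo and its method are made-up names): asking a list of Strings whether it contains all the values of a collection of Integers compiles without complaint and just answers false.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical demo class for containsAll(Collection<?>).
class ContainsAllDemo {
  static boolean hasAll(List<String> words, Collection<?> candidates) {
    return words.containsAll(candidates);   // any element type is accepted
  }

  public static void main(String[] args) {
    List<String> words = Arrays.asList("a", "b");
    System.out.println(hasAll(words, Arrays.asList("a")));  // same element type
    System.out.println(hasAll(words, Arrays.asList(1)));    // "wrong" element type
  }
}
```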

Finally, the Comparator interface is also generic, and is defined in Java 1.5 as

public interface Comparator<T> {
  int compare(T o1, T o2);
  boolean equals(Object o);
}
Thus, we can write comparators that are specific to the classes we are using, and not have to use casting inside them. If we declare List<String> x = new ArrayList<String>(); and want to sort it by length, we can call Collections.sort(x, new CompareByLength()) where we define
public class CompareByLength implements Comparator<String> {
  public int compare(String o1, String o2)
  {return o1.length() - o2.length();}
  //Inherit boolean equals(Object o); from Object class; don't use it!
}
Note the String parameters and no casting. One last hack. The designers of generic Java wanted to allow previously written Comparators to still be used with the new generic library. For example, we still want to be able to use
public class CompareByLength implements Comparator {
  public int compare(Object o1, Object o2)
  {return ((String)o1).length() - ((String)o2).length();}
  //Inherit boolean equals(Object o); from Object class; don't use it!
}
with Collections.sort. This is equivalent to using Object as the type parameter for Comparator. When specifying Collections.sort we can use the parameter type Comparator<? super T> to indicate that for sorting values of type T we can also use a Comparator that operates on Object (which will of course do casting, which might throw exceptions if we make a mistake). The actual definition of Collections.sort is
public static <T> void sort(List<T> list, Comparator<? super T> c)
This uses a <T> before the parameter list to establish a type for which a special relationship holds among the parameters: to sort a list of T we must use a Comparator whose compare method expects parameters of some supertype of T (like Object). Beyond here there be dragons. We will not pursue this issue further, nor talk about other issues relating to generics; see the references in the Introduction to this lecture note for more details.
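For a concrete feel of the ? super T bound, the sketch below sorts a List<String> twice: once with a Comparator<String> and once with a Comparator<Object>. The class names CompareByText and SortDemo are invented for this example, and CompareByText compares toString results rather than casting, so it is safe for any objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class CompareByLength implements Comparator<String> {
  public int compare(String o1, String o2)
  {return o1.length() - o2.length();}
}

// Hypothetical comparator on Object: orders values by their toString text.
class CompareByText implements Comparator<Object> {
  public int compare(Object o1, Object o2)
  {return o1.toString().compareTo(o2.toString());}
}

class SortDemo {
  // Both methods sort a copy, leaving the argument list unchanged.
  static List<String> sortedByLength(List<String> x) {
    List<String> copy = new ArrayList<String>(x);
    Collections.sort(copy, new CompareByLength());  // Comparator<String>
    return copy;
  }

  static List<String> sortedByText(List<String> x) {
    List<String> copy = new ArrayList<String>(x);
    Collections.sort(copy, new CompareByText());    // Comparator<Object>: Object is a supertype of String
    return copy;
  }

  public static void main(String[] args) {
    List<String> x = Arrays.asList("ccc", "a", "bb");
    System.out.println(sortedByLength(x));
    System.out.println(sortedByText(x));
  }
}
```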

Final Words This has been a whirlwind tour through the most important parts of Java generics. Here, "most important" means "most important for programmers who use generics". Our goal is to understand enough about generics to use generics written by other programmers. Fundamentally, this means that we need to know enough about generics to get the compiler to accept our program with no warnings or errors. That is, if Java accepts our program, its type safety should be ensured by the generic mechanism.

Generics in Java 1.5 were designed so that if you use UNgeneric collections (as was the case before Java 1.5) the Java compiler will indicate that your program has no syntactic errors, but it will have warnings. Your code can still work, according to the old Java rules. But, if you fully use generic collections correctly, there will be no warnings. So, that is your goal. Recall that if your code has any syntactic errors, it will not run.

So, our goal is NOT to be able to write new and interesting generic interfaces, classes, or methods, although we should be able to write generic classes similar to others that we have seen. For example, we should be able to create new implementations of generic collection classes (like a priority queue using a heap), which we will do later in the course.

Finally, a true understanding of Java generics requires familiarity with the concept of erasure, by which the compiler remembers type information but "erases" it from the Java classes in our code, until it reaches typeless classes (the ones used in 1.4). Thus, the compiler checks and reports any type errors. If there are none, it is "safe" to use even the UNgeneric classes, just as they appear in Java 1.4. Again, see the references for more details on erasure.
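One visible consequence of erasure can be checked in a few lines. In this sketch (ErasureDemo and sameRuntimeClass are hypothetical names), a list of Strings and a list of Integers turn out to be objects of the very same runtime class, because the type parameter is not recorded in the object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class showing one observable effect of erasure.
class ErasureDemo {
  // Returns true if both lists have the same runtime class.
  static boolean sameRuntimeClass() {
    List<String>  strings = new ArrayList<String>();
    List<Integer> ints    = new ArrayList<Integer>();
    return strings.getClass() == ints.getClass();  // both are just ArrayList
  }

  public static void main(String[] args) {
    System.out.println(sameRuntimeClass());
  }
}
```

Contrast this with arrays, where a String[] and an Integer[] are objects of different runtime classes.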

Finally, the next programming assignment will ensure you know how to use UNgeneric and generic collection classes.


Problem Set To ensure that you understand all the material in this lecture, please solve the announced problems after you read the lecture.

If you get stumped on any problem, go back and read the relevant part of the lecture. If you still have questions, please get help from the Instructor, a CA, or any other student.

The programming assignment will thoroughly test your ability to use all the collection classes.

  1. None yet. You'll do plenty of programming with generic classes.