Benchmarking the System
There are three performance characteristics of Java that we wish to understand: the effectiveness of just-in-time compilation, the cost of portability, and the overall system speed. We therefore benchmarked four different implementations of VCODE:
- JDK: The Java interpreter from Sun's Java Development Kit (JDK), using the VcodeEmulation class described in Section 3.
- JIT: A just-in-time (JIT) Java compiler, also using the VcodeEmulation class. The just-in-time compiler compiles Java bytecodes into machine code as it interprets them for the first time, and then stores the machine code for future reuse.
- Native: The JDK interpreter, using the VcodeNative class. This class is a replacement for the VcodeEmulation class, and uses native C functions similar to those in CVL to implement the vector methods. However, it still uses Java for the vector stack code and for array allocation, because we want these objects to be accessible to Java's garbage collector.
- Vinterp: The existing VCODE interpreter vinterp, written in C and linked against a serial version of the CVL library. This combination has been tuned for asymptotic performance on large vectors, with hand-unrolled loops and a memory management mechanism designed specifically for VCODE.
Comparing the performance of the JDK interpreter and JIT compiler implementations helps us understand the effectiveness of JIT compiler technology. Comparing the JDK and JIT implementations with the native-methods implementation, which uses compiled C code, lets us study the price of portability. Finally, benchmarking the existing VCODE interpreter enables full end-to-end performance comparisons.
We used three different NESL benchmarks to compare the different implementations of VCODE:
- Least-squares line-fit. This function finds the best fit to a sequence of points. It is simple straight-line code with no conditionals or loops.
- Selection (generalized median-finding). This function uses a recursive randomized algorithm to find the element in a vector that would be at a specified position if the vector were sorted.
- Sparse matrix-vector multiplication. This function multiplies a sparse matrix stored in compressed row format by a dense vector, using a nested data-parallel algorithm.
We give the source code and test data for the benchmarks in Appendix A. Timings for supercomputer platforms have previously been reported [7] [11]. All three benchmarks have asymptotic running times that are linear in the size of the problem.