How Java Works

This page is optional reading.

Some students wish to know a little more about the Java programming language, and this page is the simplest explanation that I can formulate. The textbook doesn't provide any similar explanation.

If you initially find the terminology forbidding, you can stop reading.  Read the page again later in the term.


To understand how Java works, you will first have to learn some of the terminology associated with earlier programming languages.

A Brief History of How Programming Languages Work

Compilers
Most computer languages follow the "compile-link-execute" sequence as I describe in the software development cycle. In all cases, you start with a user-readable program written in a high-level (also called third-generation) language.  The compiler for the language converts the high-level language into a low-level language (something that the computer can read more easily).

In most languages (like BASIC, C/C++, and Pascal), compiling a user-readable program (called source code) creates an object file. Linking connects the object files that you have created to other object files to form an executable, which is the low-level binary file that the computer can understand (i.e., the operating system can load the executable into RAM and run it); another term for an executable is "(relocatable) machine code".

By itself, compiling (without linking) generally cannot create an executable because most programs rely on other pre-compiled programs provided by the language.  For example, if your program takes the square root of a number, your program will rely on the mathematical program (provided by the math library of the language) that actually determines how to compute a square root. When you start writing bigger programs, you will likely link to other programs that you, yourself, have written.
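The square-root example can be sketched in Java as follows. This is only an illustration: the class name SquareRootDemo and the method hypotenuse are made up for this page, while Math.sqrt is the routine supplied by the standard Java library. The program never says how a square root is computed; it relies on code that has already been compiled for it.

```java
public class SquareRootDemo {
    // hypotenuse is a hypothetical helper written for this example.
    // The actual square-root computation lives in the library routine
    // Math.sqrt, which was compiled long before this program was written.
    static double hypotenuse(double a, double b) {
        return Math.sqrt(a * a + b * b);
    }

    public static void main(String[] args) {
        // A 3-4-5 right triangle.
        System.out.println(hypotenuse(3.0, 4.0)); // prints 5.0
    }
}
```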

The job of the linker is to associate the program you've compiled with other programs that have already been compiled.  Object code isn't easily read by humans, and, on its own, it cannot be run by a computer either.  The linker takes your object code, along with any other object code files that your program requires, and links them together to create an executable.  You shouldn't expect to find link errors until you're writing larger programs that have multiple parts; link errors occur when the object files for your program don't completely agree with one another.

Interpreters
There are a few other languages (such as Lisp or Scheme) that avoid the "compile-link-execute" sequence and instead try to do the job "on-the-fly" or "as needed".  In other words, the language takes over the task of loading each high-level statement into RAM, compiling it, linking that result (if need be), and then executing the result (before the next high-level statement is even looked at). We call such languages interpreted languages (as opposed to compiled languages).  As an analogy to foreign languages, a compiler acts as a translator (say, someone who translates a book) and an interpreter acts like, well, an interpreter.
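To make the "one statement at a time" idea concrete, here is a toy interpreter written in Java. It is a sketch only: the class name TinyInterpreter and the three-statement language (PUSH, ADD, MUL) are invented for this page and are far simpler than Lisp or Scheme. The point is the loop: each statement is read, decoded, and executed before the next statement is even looked at.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyInterpreter {
    // Runs the statements in order and returns the value left on top of the stack.
    // The statement names PUSH, ADD, and MUL are hypothetical.
    static int run(String[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String statement : program) {
            // Decode this one statement...
            String[] parts = statement.trim().split("\\s+");
            // ...and execute it immediately, before reading the next one.
            switch (parts[0]) {
                case "PUSH":
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "ADD":
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "MUL":
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("Unknown statement: " + statement);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        String[] program = { "PUSH 2", "PUSH 3", "ADD", "PUSH 4", "MUL" };
        System.out.println(run(program)); // (2 + 3) * 4, so this prints 20
    }
}
```

Notice that a mistake in the last statement would not be discovered until every earlier statement had already run; a compiler would typically have rejected the whole program before running any of it.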

When debugging programs, there isn't much difference between compilers and interpreters because the executable file needs to be regenerated whenever the source code changes.  However, once debugging is completed, an executable created by the compiler will run much faster than a similar piece of source code that always has to run through its interpreter (using the analogy, reading a translation of a poem will always be "faster" than having to interpret the poem on the fly every time you read it).

There are some advantages to interpreted languages. In artificial intelligence, interpreted languages are preferred since programs may have to adapt to new stimuli. Also, it is generally easier to build a prototype program using an interpreter. Many interpreted languages also provide a "compile mode" to create executables which will run about as fast as an executable created by a compiler.


Java is the first substantial language which is neither truly interpreted nor compiled. Generally, both techniques are used in combination.

In other languages, the executable code generated is only executable on the target machine for which it was compiled. For example, if you compile a C++ program on a Windows machine, the executable will run on Windows machines but not on a Mac or a Linux machine. Getting a program to work on multiple platforms (a platform being determined by the target machine and its operating system) was generally difficult work.  Compilers had to be individualized for the platform; a Mac compiler generated a particular type of object file.  Often, the only feasible means to get your program to work on multiple platforms was to create a separate version of your source code for each platform.  Further, a given platform may have more than one possible compiler that it can use, and you may not get exactly the same behavior from the same source code on different compilers (even if the platform is identical).

Java has succeeded in largely eliminating the platform issue because it has reorganized the compile-link-execute sequence at an underlying level of the compiler (which would only make sense to people who've taken a course on compiler design). Essentially, the Java designers isolated the parts of a program which are dependent on the platform.  When you compile a Java program, you don't create an object file, but rather a bytecode file, which is like an object file for a virtual machine.  The same bytecode file can be used on any platform (i.e., it is platform-independent).  The program that runs bytecode is called the Java VM (for virtual machine); it is separate from the Java compiler (javac), although the two are often confused. Bytecode is also the innovation that allows applets to exist: bytecode can be transferred over the Internet and then interpreted on the downloader's machine.

Thus, compiling a Java program doesn't create an object file but a bytecode file instead. When you try to execute a bytecode file, all of the platform-specific issues can no longer be put off.  Thus, when you execute Java bytecode, you actually start an interpreter which tries to interpret each bytecode statement, and, when platform-specific operations are required, the interpreter links in appropriate code for the specific platform. The collection of language-defined object code that used to be associated with linking is called the Java API (Application Programming Interface) and is essentially language-defined bytecode.
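As a small illustration of platform independence, consider the program below. The class name WhereAmI and the method describe are made up for this page; System.getProperty, "os.name", and "java.vm.name" are part of the standard Java API. Compiling with "javac WhereAmI.java" produces the bytecode file WhereAmI.class, and that one file can be copied, unchanged, to a Windows, Mac, or Linux machine and started with "java WhereAmI".

```java
public class WhereAmI {
    // Asks the virtual machine which platform it is running on.
    // The bytecode for this method is identical on every platform;
    // only the answers supplied by the VM differ.
    static String describe() {
        return "Running on " + System.getProperty("os.name")
                + " using " + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The output differs from machine to machine even though the bytecode does not, because the platform-specific details are filled in by the interpreter at execution time.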

In a nutshell, the compile-link-execute cycle for earlier languages would be more closely defined as "compile-link then execute". In Java, the sequence is closer to "compile then link-execute", and the term "just-in-time" is often used to describe how the Java interpreter does its work. As with other interpreted languages, it is possible to get Java programs to run faster by converting bytecode into executables; the disadvantage is that such executables will only work on the platform on which they are created.
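The "link-execute" half of the sequence can be glimpsed from inside a running program. In the sketch below, the class name LateLinking, the method loadAndDescribe, and the library name no.such.Library are invented for this page; Class.forName is a standard Java API call that asks the virtual machine to locate and link a class while the program is already executing. This is meant only to show that linking is postponed until execution, not to describe everything the virtual machine does internally.

```java
public class LateLinking {
    // Asks the running virtual machine to find and link a class by name.
    // Nothing about the named class is checked when this file is compiled.
    static String loadAndDescribe(String className) {
        try {
            Class<?> loaded = Class.forName(className);
            return "linked " + loaded.getName();
        } catch (ClassNotFoundException e) {
            // The Java counterpart of a link error, discovered during execution.
            return "could not find " + className;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAndDescribe("java.util.ArrayList")); // prints linked java.util.ArrayList
        System.out.println(loadAndDescribe("no.such.Library"));     // prints could not find no.such.Library
    }
}
```

In a "compile-link then execute" language, the missing library would have been reported by the linker before the program ever started.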

Microsoft's new language, C#, is the first major language after Java to have a compile then link-execute cycle. However, many of C#'s "innovations" are modeled after Java innovations or else were incorporations of other standard IDE capabilities into a programming language. Of course, Microsoft is not interested in true platform independence. Also, not all of Microsoft's API is free (some packages must be purchased).