This page is optional reading. If you initially find the terminology forbidding, you can stop reading and come back to the page later in the term. Some students wish to know a little more about the Java programming language, and this page is the simplest explanation that I can formulate. The textbook doesn't provide any similar explanation.
Before understanding how Java works, you will first have to learn some of the terminology associated with earlier programming languages.
A Brief History of How Programming Languages Work
All high-level (also called third-generation) programming languages allow you to write programs in a language similar to (although much simpler than) natural language. The high-level program is called the source code. A low-level programming language is something closer to what makes sense to a computer. Details for low-level languages are unimportant in the intro CS courses.
Compilers
As an analogy to foreign languages, a compiler acts as a translator (say, someone who translates a book) and an interpreter acts like, well, an interpreter.
Most computer languages use the "compile-link-execute" format. You start with source code and the compiler converts this program into a low-level program. In most compiled languages, the file containing the resulting low-level code is called an object file. A collection of object files is linked together to create an executable file (i.e., the operating system can load the executable into RAM to run the program). Another term for an executable is "(relocatable) machine code".
An object file isn't easily read by a human, and it may not be runnable on a computer either. For example, if your program takes a square root of a number, your program will rely on the mathematical program (provided by the math library of the language) that actually determines how to compute a square root. The object file for the program will refer to the square root but will not have the code explaining how the square root computation works. Similarly, when you start solving bigger problems, you will likely divide your project into multiple programs that communicate.
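The square root situation can be sketched in Java, used here only because it is the language of this course. This is a minimal illustration rather than a description of any particular compiler: the class name SquareRootDemo and the method hypotenuse are made up for the example, while Math.sqrt is the square root routine supplied by the standard Java library.

```java
// SquareRootDemo.java (class and method names are illustrative only)
class SquareRootDemo {
    // Computes the hypotenuse of a right triangle.
    // The code for Math.sqrt is not written anywhere in this file;
    // this program only refers to it, and the library supplies it.
    static double hypotenuse(double a, double b) {
        return Math.sqrt(a * a + b * b);
    }

    public static void main(String[] args) {
        System.out.println(hypotenuse(3.0, 4.0)); // prints 5.0
    }
}
```

The program above says what it wants (a square root) but not how to compute it. Something else has to connect the reference to the library code that does the actual work.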
The process of linking connects the object files that you have created with other pre-existing object files to form an executable file. The linker does this job. You shouldn't expect to find link errors until you're writing larger programs that have multiple parts; link errors occur when the object files for your program don't fit together properly (for example, when one object file refers to something that no other object file provides).
Interpreters
A smaller number of languages (Lisp and Scheme are the most famous; CMU uses ML in 15-212) avoid the "compile-link-execute" sequence and instead try to do the conversion "on-the-fly" (also called "as needed"). In other words, an interpreted language takes each high-level statement, determines its low-level version and executes the result (linking if need be). This is done for each statement in succession (before the next high-level statement is even looked at).
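The statement-at-a-time idea can be sketched with a toy interpreter written in Java. Everything here is invented for illustration: the three-statement language (set, add, print), the class name TinyInterpreter and the method run are assumptions of this sketch and do not correspond to Lisp, Scheme, ML or any real interpreter.

```java
import java.util.ArrayList;
import java.util.List;

// TinyInterpreter.java: a toy interpreter for a made-up language
// with one integer variable and three statements: set N, add N, print.
class TinyInterpreter {
    // Executes the statements one at a time, in order.
    // Returns the lines that were printed so the result can be inspected.
    static List<String> run(String[] program) {
        List<String> output = new ArrayList<>();
        int value = 0; // the single variable of the toy language
        for (String statement : program) {
            // Decode this one statement...
            String[] parts = statement.trim().split("\\s+");
            // ...and execute it before the next statement is looked at.
            switch (parts[0]) {
                case "set":
                    value = Integer.parseInt(parts[1]);
                    break;
                case "add":
                    value += Integer.parseInt(parts[1]);
                    break;
                case "print":
                    System.out.println(value);
                    output.add(String.valueOf(value));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown statement: " + statement);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        String[] program = { "set 2", "add 3", "print", "add 10", "print" };
        run(program); // prints 5, then 15
    }
}
```

Because each statement is decoded only when it is reached, a misspelled statement at the end of the program would not be noticed until all of the earlier statements had already been executed.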
While debugging programs, you wouldn't notice much of a difference between compilers and interpreters because the executable file needs to be regenerated whenever the source code changes. However, once debugging is completed, an executable created by a compiler will run much faster than a similar piece of source code that always has to run through its interpreter. Using the analogy, reading a translation of a poem will always be "faster" than having to interpret the poem on the fly every time you read it.
However, there are advantages to interpreted languages. In artificial intelligence, interpreted languages are preferred since programs may have to adapt to new stimuli. Also, it is generally easier to build a prototype program using an interpreter. Many interpreted languages also provide a "compile mode" to create executables which will run about as fast as an executable created by a compiler.
How Java Works
Java is the first substantial language which is neither truly interpreted nor compiled; instead, a combination of the two forms is used. This method has advantages which were not present in earlier languages.
Platform-Independence
To understand the primary advantage of Java, you'll have to learn about platforms. In most programming languages, a compiler (or interpreter) generates code that can execute on a specific target machine. For example, if you compile a C++ program on a Windows machine, the executable file can be copied to any other machine, but it will only run on other Windows machines and never on a different kind of machine (e.g., a Mac or a Linux machine). A platform is determined by the target machine (along with its operating system). For earlier languages, language designers needed to create a specialized version of the compiler (or interpreter) for every platform. If you wrote a program that you wanted to make available on multiple platforms, you, as the programmer, would have to do quite a bit of additional work. You would have to create a separate version of your source code for each platform.
Java succeeded in eliminating the platform issue for high-level programmers (such as you) because it has reorganized the compile-link-execute sequence at an underlying level of the compiler. Details are complicated but, essentially, the designers of the Java language isolated those programming issues which are dependent on the platform and developed low-level means to abstractly refer to these issues. Consequently, the Java compiler doesn't create an object file, but instead it creates a bytecode file which is, essentially, an object file for a virtual machine. In fact, the Java compiler is often called the JVM compiler (for Java Virtual Machine).
Consequently, you can write a Java program (on any platform) and use the JVM compiler (called javac) to generate a bytecode file (bytecode files use the extension .class). This bytecode file can be used on any platform that has Java installed. However, bytecode is not an executable file. To execute a bytecode file, you actually need to invoke a Java interpreter (called java). Every platform has its own Java interpreter which will automatically address the platform-specific issues that can no longer be put off. When platform-specific operations are required by the bytecode, the Java interpreter links in appropriate code specific to the platform.
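The two steps can be tried with a small program. This is only a sketch: the file name HelloPlatform.java, the class name and the greeting method are made up for the example. The commands javac and java are the ones named above, and os.name is a standard system property that the Java interpreter fills in for the machine it is running on.

```java
// HelloPlatform.java (hypothetical file name)
//
// Step 1, compile to bytecode:   javac HelloPlatform.java
//         (this creates the file HelloPlatform.class)
// Step 2, run the bytecode:      java HelloPlatform
//         (note that the .class extension is not typed)
class HelloPlatform {
    // Builds the message; kept separate from main so it is easy to check.
    static String greeting(String osName) {
        return "Hello from " + osName;
    }

    public static void main(String[] args) {
        // The interpreter on each platform supplies its own value here,
        // even though the bytecode file is identical everywhere.
        System.out.println(greeting(System.getProperty("os.name")));
    }
}
```

If the same HelloPlatform.class file is copied to a machine with a different operating system and run there with java, the greeting should name that operating system, without the program having been compiled again.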
To summarize how Java works (to achieve platform independence), think about the compile-link-execute cycle. In earlier programming languages, the cycle is more closely defined as "compile-link then execute". In Java, the cycle is closer to "compile then link-execute". As with interpreted languages, it is possible to get Java programs to run faster by compiling the bytecode into an executable; the disadvantage is that such executables will only work on the platform on which they were created.
Other Advantages of Java
Most of the other features of Java had previously existed in various other programming languages (but never all at once). Most explanations of these advantages (e.g., distributed programming, multi-threading, security) are well beyond the scope of this course. However, there are two features that I will briefly address.
One feature that was introduced with the Java language is the ability to write special Java programs (called applets) that are designed to run on the World Wide Web. You could write a Java applet and put the bytecode on a web page; if anyone with a Java-enabled web browser goes to your web page, that applet bytecode will be downloaded to the browsing computer and executed within the web browser. Of course, this would not be possible without platform independence. This feature had a major effect on the rapid proliferation of Java and many instructors (including myself) taught programming using applet-based programs. However, for a variety of reasons, I have decided to postpone a discussion of applets (and more generally, graphical user interfaces) to the end of the course.
Java is also one of the first languages to be "library-based" in that the designers of the language have included a large number of pre-existing programs. Programmers can connect their programs to these general-purpose programs as needed. This frees up the programmer's time since there is less code to write. We'll start exploring this library, called the API (application programming interface), after we learn a substantive amount of the Java language itself.
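As a small taste of what "library-based" means, here is a sketch that sorts a list of names without containing any sorting code of its own. The class name LibraryDemo and the method sortedCopy are made up for the example; ArrayList, List and Collections.sort are real parts of the standard java.util library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// LibraryDemo.java (class and method names are illustrative only)
class LibraryDemo {
    // Returns a sorted copy of the given list, leaving the original unchanged.
    // The sorting itself is done entirely by the library.
    static List<String> sortedCopy(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("Turing");
        names.add("Hopper");
        names.add("Lovelace");
        System.out.println(sortedCopy(names)); // prints [Hopper, Lovelace, Turing]
    }
}
```

The import lines at the top are how the program connects itself to the pre-existing library code.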
Java vs. C#
Microsoft's recent language, C# (a variation of C/C++), was the first major language after Java to have a compile then link-execute cycle. C# has also addressed most of the other issues that were Java advantages and C# has introduced other advantages as well. However, Microsoft does not seem to be interested in true platform independence (for example, there still is no C# bytecode interpreter for Linux). Also, sections of the C# software libraries require additional purchases beyond the basic cost of the language.