Java 911, MIND, September 1998

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

java911@microsoft.com

Jonathan Locke
So long, and thanks for all the bytecodes

This month will be the last installment of the Java 911 column, so I'd like to step back for a moment and review the Java language itself, particularly in contrast with C/C++. Good-bye to all my faithful readers out there. So long, and thanks for all the questions!
The basic syntax of Java code has a lot in common with C++. In fact, the code looks pretty similar from a distance (especially if you squint). Keywords, variable declarations, statements, loop constructs, and conditionals all look about the same in Java as they do in C++. But don't let that fool you! Java is a completely different beast in many ways. Let's take a closer look at some of those differences.

Classes and Objects
One of the first things that I noticed about Java was the requirement that all code be declared within a class declaration. In C++, this is certainly not required. In fact, the entire functional C language subset of C++ can be utilized without ever declaring a single class. For example, in C++ you might write hello world like this:


 // HelloWorld.cpp
 
 #include <stdio.h>
 
 int main(int argc, char** argv)
 {
     printf("Hello, World!\n");
     return 0;
 }

Notice that the C++ function main is declared at the global lexical scope. But in Java, hello world looks like this:


  // HelloWorld.java
 
 /**
  * Hello world class
  * @author Jonathan Locke
  */
 public class HelloWorld
 {    
     /**     
      * Main application entrypoint.     
      * @param arg Command line arguments
      */
     static public void main(String[] arg)
     {
         System.out.println("Hello, World!");
     }
 }

The main program entry point is implemented at the class scope (inside class HelloWorld, in this case) as a static method. While subtle, this distinction is important, and stems from the fact that Java is an almost purely object-oriented language. In fact, the only entities in Java that are not objects (or arrays of objects) are the primitive types (boolean, byte, char, short, int, long, float, and double). Everything else in Java is a "first-class" Object of some kind: an instance of some class that inherits directly or indirectly from java.lang.Object.
      The more structured nature of Java is also visible in the System.out.println statement in HelloWorld. In C++, you can just call a function declared at global scope without a class name. This is not possible in Java. The method call to println must be made using the output stream object System.out. (System is short for java.lang.System, and out is a public static variable declared by java.lang.System.)
      HelloWorld is an especially useful example because you also run into one of the biggest differences between C++ and Java: Java has no pointers. Notice the declaration of main in the C++ version. An integer and a pointer to a pointer to a character are passed in. The integer gives the number of null-terminated C/C++ strings being passed in, and the pointer points to the first entry in an array of pointers to those strings. In C++, there are no safety guarantees about argv. If you pass argv[10] to printf and there aren't 11 elements in the array (0 to 10), the results are undefined (and may in fact crash your application).
      In Java, things are very different. HelloWorld's main method takes an array of (first-class) String objects. The array's length is accessible as a public member variable of the array object itself (in this case, arg.length), so there is no need to pass in an argc parameter. And the arg array itself is bounds-checked at runtime. If you access arg[10] in a five-element array, the results are defined in Java: an ArrayIndexOutOfBoundsException will be thrown. When developing and debugging applications, the difference between the very loose pointer restrictions in C++ (basically none at all) and Java's well-defined array and reference behavior makes Java a far easier and more effective choice.
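The bounds check described above can be sketched in a few lines. This is only an illustration: the class name BoundsDemo and the helper elementOrDefault are made up for this example and are not part of any library.

```java
// BoundsDemo.java (hypothetical example class, not from the original column)
class BoundsDemo
{
    /**
     * Returns element i of the array, or the fallback string
     * if i is outside the array's bounds.
     */
    static String elementOrDefault(String[] arg, int i, String fallback)
    {
        try
        {
            return arg[i];      // the runtime checks i against arg.length
        }
        catch (ArrayIndexOutOfBoundsException e)
        {
            return fallback;    // defined behavior instead of a crash
        }
    }

    public static void main(String[] arg)
    {
        String[] five = { "a", "b", "c", "d", "e" };
        System.out.println(five.length);    // no argc needed
        System.out.println(elementOrDefault(five, 10, "(no such element)"));
    }
}
```

Catching the exception is shown here only to make the behavior visible. In most programs it is better to test the index against arg.length first.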
      The other important thing to know about references in Java is that they are not pointers (in spite of the very confusingly named NullPointerException, which is thrown when you attempt to access something through a null Java reference). References are strongly typed in Java and type checking of casting operations is performed at runtime (when necessary). C++ is completely the opposite. As a programmer, you can cast anything to be anything else at any time, no matter how dangerous the outcome might be. You can make an int into an object pointer or a pointer to a character into a double. C++ just doesn't care what awful unintended thing might happen. Java, on the other hand, does not allow casts between primitive types and Objects. And Object casts must be valid at runtime. If you attempt to cast a reference that actually points to a java.util.Vector to a java.lang.String, it is a 100 percent sure thing that a ClassCastException will be thrown.
      And so, the only cast that is guaranteed to succeed on any object in Java is, of course, (Object)x, since all objects inherit from java.lang.Object. Naturally, casts which are guaranteed to always fail, such as


 String s = (String)System.out;

will be flagged by the compiler with an error like this:

 
    Microsoft® Visual J++ Compiler Version 1.02.7315
    Copyright (C) Microsoft Corp 1996-1997. All rights reserved.
 
    CastFail.java(18,38) : error J0067: Cannot convert 'PrintStream' to 'String'
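
The compiler can only reject casts that could never succeed. When the static type leaves the question open (for example, a reference of type Object), the check happens at runtime. Here is a minimal sketch of that runtime check, using a made-up class CastDemo and helper castToStringFails:

```java
import java.util.Vector;

// CastDemo.java (hypothetical example class)
class CastDemo
{
    /** Returns true if casting o to String throws ClassCastException. */
    static boolean castToStringFails(Object o)
    {
        try
        {
            String s = (String)o;   // legal at compile time, checked at runtime
            return false;
        }
        catch (ClassCastException e)
        {
            return true;
        }
    }

    public static void main(String[] arg)
    {
        System.out.println(castToStringFails(new Vector()));  // a Vector is not a String
        System.out.println(castToStringFails("hello"));       // a String is a String
    }
}
```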

Interfaces
Another important feature in Java is the interface. As a design decision, Java does not include multiple inheritance like C++. Rather, Java uses interfaces that are very similar to the COM concept. An interface in Java is essentially a special kind of class that has no implemented methods. Instead, it specifies a collection of entry points on an object. For example, you might define an interface called Runnable, which declares a method called run(). Any object that then implements the Runnable interface can be cast to a Runnable, allowing you to call the object's run method "through the interface."
Although Java does not support multiple inheritance of implementation, it does support multiple inheritance of interfaces. This means that Object A can be both Runnable and Observable at the same time, and you can cast the same object to either type. This very neatly accomplishes the polymorphism that you get from C++ multiple inheritance without putting any special constraints on implementation. Favoring object composition (aggregation) over inheritance is good, standard object-oriented methodology, and is the reason that COM made the very same choice about interfaces.
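
As a rough sketch of one object seen through two interfaces: Runnable is the real java.lang.Runnable, while Describable, Task, and InterfaceDemo are hypothetical names invented for this example.

```java
// A hypothetical second interface, invented for this example.
interface Describable
{
    String describe();
}

// One class, two interfaces: inheritance of interface, not of implementation.
class Task implements Runnable, Describable
{
    private int runs = 0;

    public void run()
    {
        runs++;
    }

    public String describe()
    {
        return "Task run " + runs + " time(s)";
    }
}

class InterfaceDemo
{
    public static void main(String[] arg)
    {
        Object a = new Task();

        Runnable r = (Runnable)a;        // view the object as a Runnable
        r.run();

        Describable d = (Describable)a;  // view the same object as a Describable
        System.out.println(d.describe());
    }
}
```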

Exceptions
Another big win for Java over C++ is the design of Java's exception handling. In Java, there are two types of exceptions: checked (those exception classes derived from java.lang.Exception, other than java.lang.RuntimeException and its subclasses) and unchecked (those derived from java.lang.RuntimeException or java.lang.Error). Checked exceptions impose a special restriction on programmers: if a method is declared as throwing a checked exception, the caller must either handle the exception with a try/catch block or must itself declare that the exception may be thrown. This allows the compiler to ensure that all checked exceptions are handled in some fashion at compile time. Such compiler assistance is a big advantage over C++, which provides no such checking. In C++, it is necessary for you to know what exceptions are thrown from what pieces of code in order to ensure that they get caught.
With some C++ compilers it is possible to work with the operating system to resume or retry certain exceptions. Microsoft's Win32 Structured Exception Handling (SEH) mechanism is exposed through its compiler in just this way. If such a feature were available in Java (and I'm not necessarily saying that it should be), you could create dynamically growable arrays by catching and resuming an ArrayIndexOutOfBoundsException after resizing the array.
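
The two choices a caller has can be sketched as follows. java.io.IOException is a real checked exception. The class CheckedDemo and its methods are hypothetical.

```java
import java.io.IOException;

// CheckedDemo.java (hypothetical example class)
class CheckedDemo
{
    /** Declares a checked exception, so every caller must deal with it. */
    static void risky(boolean fail) throws IOException
    {
        if (fail)
        {
            throw new IOException("simulated failure");
        }
    }

    /** Choice 1: handle the exception with try/catch. */
    static String handled(boolean fail)
    {
        try
        {
            risky(fail);
            return "ok";
        }
        catch (IOException e)
        {
            return "caught: " + e.getMessage();
        }
    }

    /** Choice 2: declare the exception and let it pass to the caller. */
    static void propagated(boolean fail) throws IOException
    {
        risky(fail);
    }

    public static void main(String[] arg)
    {
        System.out.println(handled(false));
        System.out.println(handled(true));
    }
}
```

Removing both the try/catch and the throws clause from either caller should produce a compile-time error, which is the compiler assistance described above.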
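
Since Java exceptions cannot be resumed, the closest you can get is to catch the exception, grow the array, and retry the operation by hand. The following is only a sketch of that workaround, with a made-up class GrowableInts. It is not a substitute for true resumption.

```java
// GrowableInts.java (hypothetical example class)
class GrowableInts
{
    private int[] data = new int[4];

    /** Stores value at index, growing the array and retrying if needed. */
    void set(int index, int value)
    {
        try
        {
            data[index] = value;
        }
        catch (ArrayIndexOutOfBoundsException e)
        {
            // No resumption in Java: resize, then repeat the store explicitly.
            // A negative index still fails on the retry and propagates.
            int[] bigger = new int[Math.max(index + 1, data.length * 2)];
            System.arraycopy(data, 0, bigger, 0, data.length);
            data = bigger;
            data[index] = value;
        }
    }

    int get(int index)
    {
        return data[index];
    }

    int capacity()
    {
        return data.length;
    }
}
```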

Threading
      Built-in support for threading and synchronization primitives in Java is another major difference between the two languages. In C++, threading is considered the operating system's territory and no attempt is made to make threading portable. The downside of this is that your threading code won't port. The upside is that your operating system will very likely provide a wider array of more efficient synchronization APIs.
      Although guarantees about the precise behavior of threads from platform to platform are very weak in Java, the basic notion of a thread is part of the language. Threads are created by instantiating a java.lang.Thread object and passing its constructor a target object that implements the java.lang.Runnable interface, which specifies the code that the new thread should run. When the thread's start method is later called, a new thread is created and enters the target object's run method via the Runnable interface.
      As various threads run through the system, the Java synchronized keyword can be used to ensure that only one thread can have access to an Object at a given time. It turns out that this simple serialization primitive is also sufficient to construct higher-level synchronization constructs like semaphores and reader/writer guards. Unfortunately, because Java synchronization is essentially an efficient form of polling, higher-level constructs aren't always the most efficient.
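
A small sketch of the synchronized keyword at work: several threads share one counter, and the synchronized methods ensure that only one thread at a time is inside the object. Counter, SyncDemo, and countWithThreads are hypothetical names for this example.

```java
// A counter whose methods lock the object while they run.
class Counter
{
    private int count = 0;

    synchronized void increment()
    {
        count++;    // read-modify-write, safe because of the lock
    }

    synchronized int get()
    {
        return count;
    }
}

class SyncDemo
{
    /** Starts the given number of threads; each increments a shared counter perThread times. */
    static int countWithThreads(int threads, final int perThread)
        throws InterruptedException
    {
        final Counter counter = new Counter();
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(new Runnable()
            {
                public void run()
                {
                    for (int j = 0; j < perThread; j++)
                    {
                        counter.increment();
                    }
                }
            });
            workers[i].start();
        }

        // Wait for every worker to finish before reading the total.
        for (int i = 0; i < threads; i++)
        {
            workers[i].join();
        }
        return counter.get();
    }

    public static void main(String[] arg) throws InterruptedException
    {
        System.out.println(countWithThreads(4, 10000));
    }
}
```

Without the synchronized keyword on increment, updates from different threads could overwrite one another and the total could come up short.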
      Besides efficiency, one other big downside to threading in Java (and I'm serious about this) is that it is so easy to do. Programmers with little or no background in multithreading can create threads in no time flat. In fact, you can create and start a thread with one line:


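 class ThreadMe implements Runnable
 {
     public ThreadMe()
     {
         // The one line: create a thread that targets this object, and start it.
         new Thread(this).start();
     }
 
     // Entered by the new thread once start() has been called.
     public void run()
     {
     }
 }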

Such simplicity can lead to situations where inexperienced people are creating (and probably mismanaging) too many threads. And even those of us with lots of multithreading experience have trouble deciding how much synchronization to apply and where to put it. If you use too little synchronization, you can end up with race conditions; use too much synchronization and you can end up with deadlocks. While it's certainly not a bad idea to make threading easy, it does have a lot of potential to make trouble if it's misused.

Memory Management
      Because of the tight restrictions on pointers in Java (which don't exist in C++), it is possible for the Java Virtual Machine to reclaim unreferenced storage (memory) automatically. This is wonderful because programmers (especially beginners) no longer have to spend so much time trying to find memory leaks in applications. Unfortunately, there are ways of defeating the garbage collector. One problem that can crop up is that the garbage collector falls behind on its duties if a program is creating a lot of garbage. In practice, this often turns out to be fatal for the application. Another, more obvious problem is that objects cannot be garbage collected until there are no references to them. This sometimes makes it necessary to explicitly null out object references to allow the objects to be collected. While this isn't as difficult as making proper use of stack objects and the C++ delete keyword, it does require that the programmer think a little about object usage at times.
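
One common place where nulling out a reference matters is a container that keeps its own array. The sketch below uses a made-up class ObjectStack. The method slotIsCleared exists only to make the effect visible and would not belong in real code.

```java
// ObjectStack.java (hypothetical example class)
class ObjectStack
{
    private Object[] items = new Object[16];
    private int size = 0;

    void push(Object o)
    {
        if (size == items.length)
        {
            Object[] bigger = new Object[size * 2];
            System.arraycopy(items, 0, bigger, 0, size);
            items = bigger;
        }
        items[size++] = o;
    }

    Object pop()
    {
        if (size == 0)
        {
            throw new IllegalStateException("stack is empty");
        }
        Object top = items[--size];
        // Without this line the array would keep referring to the popped
        // object, and the garbage collector could not reclaim it.
        items[size] = null;
        return top;
    }

    /** For illustration only: true if slot i no longer holds a reference. */
    boolean slotIsCleared(int i)
    {
        return items[i] == null;
    }
}
```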
      When an object has no references and is about to be collected, the garbage collector makes a call to the object's finalizer method. This method can be used to reclaim resources associated with the soon-to-be-reclaimed Java object. Unfortunately, there aren't any guarantees about what thread will call this method. Therefore, be very careful about the methods you call in a finalizer, as you can easily wind up in a deadlock situation.
      C++ programmers may be tempted to see finalizer methods as equivalent to C++ destructors, but nothing could be further from the truth. C++ destructors are used to reclaim resources (particularly memory) used by the object being destructed. But in Java, those same resources will eventually be reclaimed by the garbage collector. The only exceptions to this (and the reason that finalizers are part of Java to begin with) are native resources that are entirely outside the realm of the garbage collector. So, generally speaking, only classes with native methods are ever going to need finalizers. And if you are writing in pure Java, chances are that you'll never need to use them.

Templates
C++ has the notion of generic templates, which are, roughly speaking, type-parameterized classes (or functions). Also, C++ has a useful standard library of these templates called the Standard Template Library (STL). Although there has been talk of adding a similar feature to Java, it is not available at the time of this writing. One downside to the nonexistence of parameterized types in Java is that it isn't possible (or at least, isn't convenient) to create type-safe container objects. The Vector class in java.util is a perfect example of this problem. The Vector.addElement method takes a java.lang.Object, and Vector.elementAt returns a java.lang.Object. This means that it is possible to put the wrong type of object in a Vector and also to miscast the Object when you pull it out again. Both are trouble waiting to happen.
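
A short sketch of the Vector problem, using a made-up class VectorDemo. The Vector is deliberately used without a type parameter, as described above, so a current compiler may print warnings about raw or unchecked types.

```java
import java.util.Vector;

// VectorDemo.java (hypothetical example class)
class VectorDemo
{
    /** Returns true if reading element 0 as a String fails at runtime. */
    static boolean readAsStringFails(Vector v)
    {
        try
        {
            String s = (String)v.elementAt(0);  // elementAt returns Object
            return false;
        }
        catch (ClassCastException e)
        {
            return true;
        }
    }

    public static void main(String[] arg)
    {
        Vector names = new Vector();
        names.addElement(Integer.valueOf(42));  // wrong type, but the compiler accepts it
        System.out.println(readAsStringFails(names));
    }
}
```

The mistake is made where the Integer is added, but it is reported only later, where the element is read back.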

Other Issues
      Java is certainly less efficient than C++ for a variety of reasons. First and foremost, C++ compiles straight to native code while Java compiles to bytecode, which is later interpreted on the target machine. Interpreting bytecode takes time. While it is theoretically possible to take advantage of bytecode to perform aggressive runtime optimizations in the interpreter, there are many applications for which Java will probably always be slower than C++.
      Besides the native code versus bytecode issue, C++ has certain speed advantages over Java because of its static nature. Because it is much less dynamic, C++ can make more assumptions about code. It can do a better job inlining and optimizing. Except for explicitly final methods, Java can rarely do any inlining at all.
      Also, other features like C++ macros, templates, and stack variables can speed things up. And at runtime, C++ doesn't have to check type casts or array accesses. There is a price for all these little speed advantages—they all have dangers. Whether it's a binary compatibility issue or a stack/pointer crash, C++ is faster but less safe than Java.
      JavaDoc is a wonderful thing. Classes and members in Java code that are preceded by JavaDoc-style comments (enclosed in /** */) are processed by a tool included with the JDK that extracts and formats API documentation. This helps to keep the state of external documentation in sync with changes to the source code. It is great that this is part of the language, but similar tools are available for C and C++. Autoduck, which originated within Microsoft and is currently available at ftp://ftp.accessone.com/pub/ericartzt/autoduck.zip, is just such a tool and produces results every bit as nice as JavaDoc.
      The lack of a preprocessor in Java is great most of the time. The only part that I really miss is the ability to #ifdef code for various build types (very important in real-world coding environments). Microsoft has remedied this situation in its compiler.
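
In portable Java, a common partial substitute for #ifdef is a static final boolean flag tested with an ordinary if statement. Because the flag is a compile-time constant, a compiler is permitted to leave the guarded code out of the class file. The sketch below assumes hypothetical names (BuildFlags, DEBUG, FlagDemo). It is not the Microsoft compiler feature mentioned above.

```java
// Hypothetical build-type flag: edit and recompile to switch build types.
class BuildFlags
{
    static final boolean DEBUG = false;
}

class FlagDemo
{
    static String process(String input)
    {
        StringBuffer out = new StringBuffer();
        if (BuildFlags.DEBUG)
        {
            // Debug-only code, skipped entirely while DEBUG is false.
            out.append("[debug] processing " + input + "; ");
        }
        out.append(input.toUpperCase());
        return out.toString();
    }

    public static void main(String[] arg)
    {
        System.out.println(process("hello"));
    }
}
```

Unlike #ifdef, the guarded code must still compile in every build type, so this idiom cannot hide code that refers to classes or methods missing from a given build.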

Conclusion
While there are many books and references comparing Java to C++ with all the gory details, I think it's especially important to understand the big picture. I hope this article has helped in that respect, and I wish you the best of luck, whether you choose to program in Java or C++. Cheers!

From the September 1998 issue of Microsoft Interactive Developer.