31 January 1994


Conversion to C++ of the Andrew User Interface System

Wilfred J. Hansen, Director, Andrew Consortium, Carnegie Mellon
Rob Ryan, Andrew Consortium, Carnegie Mellon
Todd Inglett, IBM, Rochester, MN

Abstract: The Andrew User Interface System--formerly called the Andrew Toolkit--has now been converted to C++. This report describes the advantages of AUIS in C++, the conversion process, and what we have learned about C++ in the process. The conversion was possible because AUIS was written in "Class-C," a set of conventions for object-oriented coding in C. Our major problems were in resolving name conflicts, providing for initialization, and supporting dynamic loading.

Keywords: Andrew User Interface System, AUIS, Andrew Toolkit, ATK, C++, object-oriented toolkit, Ness scripting language, recursive embedding, integrated user interface system, compound documents, Object Linking and Embedding, OLE



1. Introduction
The Andrew User Interface System (AUIS) [Palay, 1988] has been one of the leading graphical user interface systems in the Unix (a trademark of Unix Systems Laboratory, Inc.) and X Windows world since its introduction at USENIX in 1988. However, acceptance has been limited somewhat because it was coded in "Class-C," a special-purpose set of conventions for object-oriented programming in C. Consequently, at the behest of its members, the Andrew Consortium spent 1993 converting AUIS to C++. The conversion effort skirted many traps, but succeeded and may even prove worth the effort.

Some readers may recognize AUIS by the name Andrew Toolkit or ATK. The Andrew Toolkit remains a component of AUIS and provides a compound-document architecture and tools for building applications and new objects [Borenstein, 1990; Sherman, 1991; Palay, 1992]. However, AUIS is considerably more: beyond ATK it includes complete editor applications for word processing, source editing, drawings, equations, spreadsheets, fonts, preferences, and more. The most elaborate application is the Andrew Message System, a full-featured, MIME-compatible system for reading, composing, and managing mail and bulletin boards. AUIS is an open system with the source code distributed under the X tape license.

A hallmark of AUIS is its architecture for recursive embedding of objects, which means that an object for one variety of information may be included within an object of another variety. Figure 1, for example, shows several types of object embedded in a spreadsheet which is embedded in turn in a document.

Figure 1. Recursively embedded objects. This screen dump shows part of a document containing a spreadsheet whose cells have been combined in various ways and contain (clockwise from upper left) text, equations, animation, spreadsheet formulas, and a raster image.

Other toolkits are available for Unix and X, but none offers the scope of Andrew. Motif, OpenLook, the Athena widget set, and other widget sets provide "interactors" which each manage a small rectangle and provide callbacks to the application when the user operates the interactor. The newest and most publicized integrated user interface system is the Object Linking and Embedding (OLE) component of Microsoft's Windows version 3.1 [Microsoft, 1992]. This system supports recursive embedding and even provides that embedded objects may execute in a separate process. One user interface system that is already in Unix/X/C++ is the Fresco project [Linton, 1993], which is based on InterViews [Linton, 1989] and also on ideas from AUIS. Fresco is still far from complete. The plan is that applications will not be part of Fresco, but will be created by interested vendors.

2. Objects in the Andrew User Interface System
AUIS utilizes objects in several ways. Some objects in AUIS are "substrates" in that they can contain other nested objects. The principal examples are text, spreadsheets and drawings. Into these substrates it is possible to embed any of these objects as well as the non-substrate objects: rasters, images, equations, line animations, and a host of others. In fact, most AUIS "applications" are themselves objects and can be incorporated in a document or the screen image of another application.

Each visible AUIS object is implemented internally as two objects, one derived from the "dataobject" class and the other from the "view" class. Dataobjects retain the information to be displayed and are responsible for reading/writing the information from/to a datastream. Views are responsible for displaying the information from a dataobject within a rectangle on the screen. They also handle interaction with the user and printing. Splitting visible objects into two internal objects has the advantage that there can be multiple views on a single data object, as can happen when a document is viewed in two windows or when, for instance, a spreadsheet is viewed as both a table and a pie chart.
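The split between data objects and views can be illustrated with a small sketch. The class names below (DataObject, View, PlainView, LengthView) are hypothetical stand-ins invented for this illustration and are not the AUIS classes; the point is only that two different views attached to one data object both refresh when the data changes.

```cpp
#include <cassert>
#include <string>
#include <vector>

class View;

// Hypothetical data object: holds the information and notifies its views.
class DataObject {
public:
    void AddObserver(View *v) { observers.push_back(v); }
    void SetText(const std::string &t);
    const std::string &Text() const { return text; }
private:
    std::string text;
    std::vector<View *> observers;
};

// Hypothetical view: renders the data object it is attached to.
class View {
public:
    explicit View(DataObject *d) : data(d) {
        assert(d != nullptr);
        d->AddObserver(this);
    }
    virtual ~View() {}
    // Called by the data object whenever its contents change.
    void Update() { rendered = Render(); }
    virtual std::string Render() const = 0;
    std::string rendered;
protected:
    DataObject *data;
};

void DataObject::SetText(const std::string &t) {
    text = t;
    for (View *v : observers) v->Update();
}

// Two different presentations of the same data.
class PlainView : public View {
public:
    using View::View;
    std::string Render() const override { return data->Text(); }
};

class LengthView : public View {
public:
    using View::View;
    std::string Render() const override {
        return std::to_string(data->Text().size());
    }
};
```

Attaching a PlainView and a LengthView to the same DataObject and then calling SetText updates both, which is the same effect as a spreadsheet shown at once as a table and a pie chart.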

The heart of ATK is its architecture for recursive embedding of objects. In practice, the embedding is represented by a tree of view objects where the window is the root and each object is the parent of those it contains. The architecture defines methods on views that a parent calls on the child to pass events and other methods which a child calls on a parent to request future events. With these methods the parent and child negotiate the sharing of resources such as screen space, keyboard, mouse, menu, other user input devices, data stream space, printed page space, execution time and memory, extension language interfaces, and so on.
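A minimal sketch of the two directions of negotiation in a view tree follows. It is a simplification made up for this illustration: the ViewNode class, its members, and the use of window coordinates for every rectangle are assumptions, and the real view methods take different arguments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical rectangle, in window coordinates for simplicity.
struct Rect {
    long x, y, w, h;
    bool Contains(long px, long py) const {
        return px >= x && px < x + w && py >= y && py < y + h;
    }
};

class ViewNode {
public:
    ViewNode(const std::string &n, Rect r) : name(n), bounds(r) {}

    void AddChild(ViewNode *child) {
        assert(child != nullptr && child->parent == nullptr);
        child->parent = this;
        children.push_back(child);
    }

    // Parent-to-child: hand a mouse hit to the deepest view containing it.
    ViewNode *Hit(long x, long y) {
        for (ViewNode *c : children)
            if (c->bounds.Contains(x, y)) return c->Hit(x, y);
        return this;
    }

    // Child-to-parent: a request for keyboard focus climbs to the root,
    // which records the requester.
    void WantInputFocus(ViewNode *requester) {
        if (parent) parent->WantInputFocus(requester);
        else focus = requester;
    }

    std::string name;
    Rect bounds;
    ViewNode *parent = nullptr;
    ViewNode *focus = nullptr;   // meaningful only at the root
private:
    std::vector<ViewNode *> children;
};
```

In this sketch an intermediate parent simply forwards the request; in a fuller design each parent could refuse or modify it, which is where the negotiation over shared resources takes place.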

The HelloWorld application typically just displays HelloWorld and quits. The simplest version in AUIS, as shown in Figure 2, does much more: it displays on screen, prints, can be edited, and is available for cut/copy/paste. Only a few more lines would be required to change "Hello" to bold-italic.

/* helloworldapp.H */

#include <application.H>


class helloworldapp : public application {
public:
    virtual ATKregistryEntry *ATKregistry();
    helloworldapp();    // dummy constructor
    boolean Start();    // override a virtual method
};

/* helloworldapp.C */

#include <andrewos.h>
#include <helloworldapp.H>
#include <im.H>
#include <text.H>
#include <textview.H>


ATKdefineRegistry(helloworldapp, application, NULL);
helloworldapp::helloworldapp(){}


boolean helloworldapp::Start(){
    // declare and create objects
    class text *tx = new class text;
    class textview *txv = new class textview;
    class im *im = im::Create(NULL);
    char *hw = "Hello, world!";

    // initialize objects
    tx->InsertCharacters(0, hw, strlen(hw));
    txv->SetDataObject(tx);
    (im)->SetView(txv);
    return TRUE;
}

Figure 2. The helloworld application in the Andrew Toolkit. The operations labeled "initialize objects" establish a text inset containing "Hello, world!" Subsequently, other methods of the base class 'application' will display the image and manage interaction.

Unlike HelloWorld, typical applications display a variety of objects scattered through a substrate widget such as a text, table, or drawing. Figure 3 shows the code to place a single button within an Andrew 'layout' object; each additional object requires similar code.

/* make the NextPage button */
NextPageComp = dobj->CreateComponent();
dobj->FillInComponent("cel", NextPageComp);
cel = (struct cel *)(NextPageComp->data);
cel->SetObjectByName("pushbutton");
cel->SetViewName(NULL, TRUE);
cel->SetRefName("Nextpage");
((struct pushbutton *)cel->GetObject())->SetText("Next Page");
dolay->SetComponentSize(NextPageComp, W-92, 0, 91, 40);

Figure 3. Code to insert a pushbutton into a layout. The text displayed in the button will be "Next Page". W is the layout width, so the button will be in the upper right corner and occupy 91x40 units.

One facility of AUIS is an application called createinset which generates the source code and help files for a new object with a given name. Although this new object is functional, its real role is to be modified to provide some new service. Such modification is usually easier than creation of a new object from scratch. As part of the conversion to C++, createinset was modified so it now produces code for C++ objects.

3. Advantages of C++
The primary advantage of C++ for AUIS is support of object-oriented programming, which is the major programming paradigm for AUIS. In addition, C++ offers a few facilities that seem likely to benefit AUIS. We are already using inline procedures and we may be able to utilize multiple inheritance in the near future.

Inline procedures are the appropriate conversion for many operations that were done with preprocessor macros in C. In general, creating a function from a macro may not be possible. However, many macros in AUIS were simple functions to access or assign to object components; the converter automatically transforms these to the appropriate in-line code. The macros

#define SetName(n) (self->name = (n))
#define GetName() (self->name)
were automatically converted to
inline void SetName(char *n) { (this->name = (n)); }
inline char * GetName() { return (this->name); }

Note that the types were automatically derived from the declaration of name in the object declaration.

Multiple inheritance would have aided in several cases in the original Andrew development. One obvious case is that of views and graphics. Views are derived from a storage management hierarchy while graphics are derived from a hierarchy intended to allow the system to utilize multiple window systems. It was not practical to have either base class derived from the other, and yet we wanted the convenience of doing graphics to view objects. With multiple inheritance, we could have defined a class that inherited from both bases; however, lacking it we devised a macro-kludge wherein each method of a graphic is defined as a macro function offered by views. One consequence is the need to modify both classes for any changes to the graphics method. It may be too late to revise the system to exploit multiple inheritance, but other opportunities to do so will arise.
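As an illustration of what multiple inheritance would have allowed, the sketch below derives one class from two unrelated bases. StorageBase, Graphic, and DrawableView are hypothetical names chosen for this example; they only stand in for the storage management and graphics hierarchies described above.

```cpp
#include <cassert>

// Hypothetical base from the storage management hierarchy.
class StorageBase {
public:
    virtual ~StorageBase() {}
    void Reference() { ++refs; }
    long refs = 1;
};

// Hypothetical base from the window-system graphics hierarchy.
class Graphic {
public:
    virtual ~Graphic() {}
    void MoveTo(long x, long y) { px = x; py = y; }
    void DrawLineTo(long x, long y) {
        assert(x >= 0 && y >= 0);  // sketch assumes non-negative coordinates
        ++segments;
        px = x;
        py = y;
    }
    long px = 0, py = 0;
    int segments = 0;
};

// A view that is both a managed object and a drawing surface, so the
// graphics methods are inherited directly and need no forwarding macros.
class DrawableView : public StorageBase, public Graphic {
public:
    void DrawBox(long w, long h) {
        MoveTo(0, 0);
        DrawLineTo(w, 0);
        DrawLineTo(w, h);
        DrawLineTo(0, h);
        DrawLineTo(0, 0);
    }
};
```

A change to a Graphic method is then visible to every DrawableView without touching the view class, which is the maintenance cost the macro approach incurs.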

4. Converting to C++
In Class-C, header files were preprocessed from a syntax that permitted declarations of methods and classprocedures, corresponding to C++'s virtual member functions and static member functions. The conversion to C++ was accomplished with two scripts written in Ness, the Andrew extension and string processing language. [Hansen, 1990; Hansen, 1992]

In order to determine which function calls in the C code needed to be converted to method calls in C++, the converter must determine which methods are defined in each class in the original system. This information is extracted from the original header files by one script, called C++Index. The main script--called C++Conv--is then invoked with the names of a collection of .c files. These and their corresponding header files are converted to C++ by local syntactic transformations.

Processing of .c files affected primarily function declarations and method calls. It was the convention in the old code that functions were declared with names of the form class__name, that is, the class name followed by two underlines and then the name of the specific function. Such declarations were usually converted by changing the double underline to a double colon: class::name. In addition, the declaration was changed from old C with parameters declared after the right parenthesis to ANSI C with declarations in-line within the header:

class::name (int length, ATK *altobj) .

Method calls were formerly disguised as function calls with the affected object as the first argument:

class_name(obj, 3, NULL)

These were converted to the C++ form with the affected object preceding the operator:
(obj)->name(3, NULL) .

Note that parentheses are always installed around the leading argument. They are usually unnecessary, but having them there lets C++Conv avoid checking to see if they are needed.
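The method-call rewrite can be approximated with a regular expression. The sketch below is written in C++ with std::regex rather than Ness, and the function name ConvertCalls is invented for this illustration. Unlike C++Conv it does not consult an index of known methods, and it assumes the first argument contains no commas or parentheses.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rewrite cls_method(obj, rest...) as (obj)->method(rest...).
// Names with a double underscore (function definitions) are left alone
// because a letter is required right after the single underscore.
std::string ConvertCalls(const std::string &src, const std::string &cls) {
    assert(!cls.empty());
    const std::regex call(
        "\\b" + cls + "_([A-Za-z]\\w*)\\(\\s*([^,()]+?)\\s*(?:,\\s*|(?=\\)))");
    // $1 is the method name, $2 the object expression.
    return std::regex_replace(src, call, "($2)->$1(");
}
```

Always emitting the parentheses around the object expression, as the converter did, keeps the replacement pattern fixed and avoids any precedence analysis.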

A crucial trick in the conversion simplified parsing: a first pass converted every comment and string to a fixed length value containing an index into a table. This meant that pattern match searches during processing would find no spurious matches within comments or strings. After processing, the fixed length values were re-expanded to their original values.
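The placeholder trick can be sketched as a pair of functions. This is an illustration in C++, not the Ness code; the names Protect and Restore and the token format "@000000@" are assumptions. The sketch handles only /* */ comments and double-quoted strings, ignores character literals, and assumes '@' does not otherwise occur in the source.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Replace every comment and string literal with a fixed-width token
// "@NNNNNN@" whose digits index the saved text in 'table'.
std::string Protect(const std::string &src, std::vector<std::string> &table) {
    std::string out;
    size_t i = 0;
    while (i < src.size()) {
        size_t end = std::string::npos;
        if (src.compare(i, 2, "/*") == 0) {
            end = src.find("*/", i + 2);
            if (end != std::string::npos) end += 2;
        } else if (src[i] == '"') {
            size_t j = i + 1;
            while (j < src.size() && src[j] != '"')
                j += (src[j] == '\\') ? 2 : 1;   // skip escaped characters
            if (j < src.size()) end = j + 1;
        }
        if (end == std::string::npos) {          // ordinary character
            out += src[i++];
            continue;
        }
        assert(table.size() < 1000000);          // token holds six digits
        char token[16];
        std::snprintf(token, sizeof token, "@%06zu@", table.size());
        table.push_back(src.substr(i, end - i));
        out += token;
        i = end;
    }
    return out;
}

// Expand every token back to the text it replaced.
std::string Restore(const std::string &text,
                    const std::vector<std::string> &table) {
    std::string out;
    size_t i = 0;
    while (i < text.size()) {
        if (text[i] == '@' && i + 7 < text.size() && text[i + 7] == '@') {
            out += table.at(std::stoul(text.substr(i + 1, 6)));
            i += 8;
        } else {
            out += text[i++];
        }
    }
    return out;
}
```

Between the two calls, pattern matching can run over the protected text without finding matches inside comments or strings, since those regions contain only digits and '@'.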

The converter did not attempt global parsing of the C code because this would only have been necessary to determine precise type information. Instead, the compiler was employed as an adjunct to the converter to find all the type errors. These were then corrected manually. In many cases the correction was a revision of the code that went well beyond what any converter could have done.

5. Problems posed by C++
The Class-C conventions provided an object oriented environment whose features are largely a subset of those of C++, so conversion was more straightforward than would be conversion of arbitrary C code. Nonetheless, we faced numerous problems. Some of these were exacerbated by our desire to continue utilizing dynamic loading as we have done in Class-C. Dynamic loading is essential during system development since it drastically reduces the time for the compile-link-test cycle. It is valuable in production use because of the large number of objects implemented within the system. If all objects were always linked with the entire system, the system would be an enormous file and would take considerably longer to load; moreover, installation of new objects would be more difficult. As it is, users can have their own libraries of objects without requiring a complete copy of the base system.

Where C++ lacked services, they have been implemented in a base class, ATK, from which all Andrew Toolkit objects must derive. (Objects derived from other bases or none at all can certainly be used, but they will not have these services.)

Class initialization. In monolithic systems, the main program can call on an initialization function in each class. In AUIS, the main program is not necessarily aware of all classes that will be utilized during an execution; it could be considerably wasteful to initialize several hundred unused classes. (And it may be impossible for the main program to initialize all dynamically loadable classes.) In Class-C, each class had a class procedure InitializeClass that was automatically called before any execution of code in the class. To emulate this mechanism in C++, the convention is that a class is initialized the first time one of its functions is called. Each constructor and static member function must include as its first operation the statement:

ATKinit ;

Methods do not need this statement because a constructor must have been called in order for a method to be applicable. The initialization routine called by ATKinit may cause initialization of other classes, so all will be initialized before they are used. There is currently no detection of circularity in initialization; the class is marked as having been initialized before possibly initializing other classes. (Perhaps something more stringent would be useful.)
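The lazy initialization convention, including the way marking a class before initializing its dependencies ends a circular chain, can be sketched as follows. The Fonts and Styles classes and their members are hypothetical; the explicit call to InitializeClass plays the role that the ATKinit statement plays in converted code, whose actual expansion is not shown in this paper.

```cpp
#include <cassert>

// Hypothetical classes whose one-time initializations depend on each other.
struct Styles {
    static bool ready;
    static int inits;
    static void InitializeClass();
    static int DefaultSize();
};

struct Fonts {
    static bool ready;
    static int inits;
    static void InitializeClass();
    static int DefaultSize() {
        InitializeClass();        // first statement of a static member
        assert(ready);
        return 12;
    }
};

bool Styles::ready = false;
int Styles::inits = 0;
bool Fonts::ready = false;
int Fonts::inits = 0;

void Fonts::InitializeClass() {
    if (ready) return;
    ready = true;                 // mark before touching other classes
    ++inits;
    Styles::InitializeClass();
}

void Styles::InitializeClass() {
    if (ready) return;
    ready = true;
    ++inits;
    Fonts::InitializeClass();     // circular, but returns immediately
}

int Styles::DefaultSize() {
    InitializeClass();
    return Fonts::DefaultSize();
}
```

Neither class is initialized until one of its static members is first called, and each is initialized exactly once even though the two initializers call each other.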

We considered introducing file scope objects whose constructors would implement the initialization for a class. However several current C++ implementations construct all file scope objects before main is called. Since ATK is a very large system we believed that lazy execution of class initialization code was required.

Creation by type name. When an AUIS data stream is read in, the type of each subordinate object is denoted with a character string giving the name of the object class. The ATK class provides a static member function ATK::NewObject whose argument is a character string and whose value is an object of the named class.

In order to implement ATK::NewObject, each class of objects must be registered. A class definition prepares for this by including in the .C file a call on the ATKdefineRegistry macro. This macro creates a table entry which is installed in a central table when the main program or loader calls ATKregister for the class.

Object initialization. The Class-C method of initializing objects was for each class to provide an initialization function with the following signature:

boolean classname_InitializeObject(struct classname *self)

Obviously, the return of a boolean indicating success or failure does not map directly onto C++ constructors, which have no return type. The expected way for a constructor to fail is to throw an exception. However, exceptions are not yet widely implemented; indeed, initially none of the C++ compilers available to us had working exceptions. For the time being, InitializeObject methods are converted to constructors, and the return statements are converted to a macro which simply prints an error message if the value passed indicates failure.
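One way such a macro could look is sketched below. The macro name CTOR_RESULT, the failureCount counter, and the Buffer class are all invented for this illustration; the actual macro used in the converted sources is not reproduced here.

```cpp
#include <cassert>
#include <cstdio>

static int failureCount = 0;   // lets the sketch observe reported failures

// Hypothetical replacement for "return ok;" in a converted constructor:
// report a failure, then leave the constructor.
#define CTOR_RESULT(ok)                                                  \
    do {                                                                 \
        if (!(ok)) {                                                     \
            ++failureCount;                                              \
            std::fprintf(stderr, "%s: constructor failed\n", __func__);  \
        }                                                                \
        return;                                                          \
    } while (0)

class Buffer {
public:
    explicit Buffer(long n) : data(nullptr), size(0) {
        if (n <= 0) CTOR_RESULT(false);
        data = new char[n]();
        size = n;
        CTOR_RESULT(true);
    }
    ~Buffer() { delete[] data; }
    bool Valid() const { return data != nullptr; }
    char &At(long i) {
        assert(Valid() && i >= 0 && i < size);
        return data[i];
    }
private:
    Buffer(const Buffer &);              // not copyable in this sketch
    Buffer &operator=(const Buffer &);
    char *data;
    long size;
};
```

The weakness of the approach is visible here: the failed object still exists, so callers must check it explicitly, which is what exceptions would make unnecessary.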

Header file incompatibilities. Header files in various compilers and operating systems are incompatible with C++. For example, the IBM AIX 3.1.5 header files used the keyword "new" as a parameter name. The Cfront 2.1 and GNU C++ implementations we had available did not remedy this situation. Moreover, some functions which aren't specified by any standard simply had no prototype. One result was that we factored out network socket code into C source files. We also adopted a coding standard requiring inclusion of andrewos.h as the first included file. This header file includes a number of standard system header files, doing whatever is necessary to address any failings of the header files with respect to C++ compatibility. The Class-C to C++ converter imposed this standard on all converted source files.

Nested types. An attempt was made to use nested types. In particular several classes used function pointers for callbacks, and it seemed sensible to provide typedefs for these pointer types within the class declaration. This approach was abandoned when we discovered the GNU C++ compiler had several bugs in this area and the Cfront 2.1 compiler didn't implement nested types at all. We settled for manual name scoping of the typedefs by prepending classname_ to the typedefs.

Inheritance. In C++ all names in the scope of a class are inherited by derived classes. In Class-C, however, only ordinary methods were inherited; access to data members of base classes was via a "header" member, as in self->header.dataobject.id. Conversion led to some silent name clashes between base and derived classes. Clashes with class procedure names were harmless, since our original code would never try to access the wrong version of the function. Data conflicts were hazardous, however, since initially the converter lost the information about which instance of the variable name was desired. Constructs like self->header.dataobject.id were converted to self->id, so a derived class version of id might be silently substituted for the base class version. To resolve the problem, the converter was fixed to retain the information; the example converts to this->dataobject::id.
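The clash and its resolution can be shown in a few lines. The dataobject and note structs below are toy stand-ins written for this example, not the AUIS classes.

```cpp
#include <cassert>

// Toy base class with a data member named id.
struct dataobject {
    long id;
    dataobject() : id(1) {}
};

// A derived class that happens to declare its own id.
struct note : dataobject {
    long id;                      // silently hides dataobject::id
    note() : id(2) {}
    long Unqualified() { return this->id; }              // derived member
    long Qualified() { return this->dataobject::id; }    // base member
};

// Both members coexist in one object; the qualified name selects the
// base class copy.
long BaseId(note &n) {
    assert(&n.dataobject::id != &n.id);
    return n.dataobject::id;
}
```

The unqualified form compiles without any diagnostic, which is why the loss of the "header" path in the first version of the converter went unnoticed until the values were wrong at run time.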

Name scope. The introduction of nested types in some compilers broke some code where structures, enums, and unions were defined inside structures or classes, or where the first reference to a type was within a struct or class. These nested types were now placed in the scope of the struct or class, instead of in the global scope as before. To avoid this problem the converter was modified to warn of type definitions within structs and classes. Where these occurred we manually either provided a forward declaration or moved the definition of the type outside the class.

In Class-C each class has separate name spaces for member functions (methods, macros and class procedures) and data. This led to situations where a function and a data member of a class had the same name. The converter was extended to warn about these name conflicts and they were resolved manually.

In C++ the names of data members are in the scope of member functions, effectively between file scope variables and arguments. This means that an unqualified reference to a global variable could be silently overridden by a class member of the same name. The converter resolved this by utilizing :: to ensure that any potential conflicts would be resolved automatically or result in a compiler error. In Class-C all class data references were to structure members, so the converter added :: before any name of a class member which was not preceded by '.' or '->' within the body of member functions. Unfortunately the case of local variables shadowing class members also triggered the addition of ::, so a compiler-detectable syntax error resulted in this case. Avoiding this problem would have required full type information to distinguish between local variable declarations and statements.
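The effect of the added :: qualifier is illustrated below with a hypothetical class tally and a file scope variable total, both invented for this example.

```cpp
#include <cassert>

int total = 100;                  // file scope variable

class tally {
public:
    tally() : total(0) {}
    // Unqualified: the data member hides the file scope variable.
    void Bump() { ++total; }
    // Qualified: "::" forces the file scope variable.
    void BumpGlobal() { ++::total; }
    int total;
};

// Difference between the file scope variable and the member of one object.
int Gap(const tally &t) {
    assert(&t.total != &::total);
    return ::total - t.total;
}
```

If a member function instead declared a local variable named total, a mechanically inserted :: would turn "int total;" into "int ::total;", which the compiler rejects; that is the detectable error mentioned above.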

Conflicts also arose from the use of structs and functions of the same name, since the compiler thought the function call was a call to the constructor. This problem was avoided by manually renaming the structure or function where possible, and by making sure that a prototype for the function was seen before the call.

Multiple inheritance. ATK is a "single root class" toolkit. In Class-C the root class didn't exist as such, but there was a struct basicobject, to which any object could be cast and still provide type information. This proved adequate since Class-C supported only single inheritance. When the ATK runtime system for C++ was designed an explicit root class seemed the best solution. Now that we have started to look at making it safe for client code to use multiple inheritance with ATK classes, we have discovered some problems. If the root class ATK is derived non-virtually, multiple instances will be included in each derived class. In order to cast a derived class pointer up to an ATK pointer in this case an explicit series of casts will be needed to pick one of the ATK instances. It would then be impossible to cast the pointer back to the original type without knowing the exact sequence of casts used to create the ATK pointer. Not only is this dangerous, but the current C++ definition makes it impossible if the derivation from ATK is virtual. (We hope that RTTI will allow down casts from a virtual base.)
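The two situations can be made concrete with a small sketch. All class and function names below are hypothetical. The last function uses dynamic_cast, the checked cast that later became part of standard C++; it was not available in the compilers discussed in this paper.

```cpp
#include <cassert>

struct Root { virtual ~Root() {} };

// Non-virtual derivation: AB contains two distinct Root subobjects.
struct A : Root {};
struct B : Root {};
struct AB : A, B {};

// "Root *r = p;" would be ambiguous for an AB *, so the path to the
// desired Root must be spelled out with an intermediate cast.
Root *RootViaA(AB *p) { return static_cast<A *>(p); }
Root *RootViaB(AB *p) { return static_cast<B *>(p); }

// Virtual derivation: one shared Root, so the upcast is implicit.
struct VA : virtual Root {};
struct VB : virtual Root {};
struct VAB : VA, VB {};

Root *RootOf(VAB *p) { return p; }

// static_cast<VAB *>(r) is ill-formed when Root is a virtual base;
// a run-time checked cast is required to get back down.
VAB *Down(Root *r) {
    assert(r != nullptr);
    return dynamic_cast<VAB *>(r);
}
```

With non-virtual derivation the two upcasts yield different addresses, so a round trip is only safe if the same path is used in both directions.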

Run-time systems. Class-C provides run time type information, virtual constructors, and dynamic object construction by class name. A separate section of this paper will address our design considerations for dynamic loading in C++.

Run time type information for ATK in C++ (of the same sort as the RTTI proposal before the C++ ANSI committee) is provided via a common base class (ATK) and a class registry.

The root class ATK provides static methods to display a message or throw an exception on failure of a constructor, create a new object given a string representing the class name, compare two classes for a base/derived class relationship by name, query by name whether a given class has been registered, load a class by name, or register a class. A single virtual function ATKregistry returns a pointer to the ATKregistryEntry for the object's class. This function is implemented by the ATKdefineRegistry macro described below. Other methods of the ATK class provide for accessing the class name of an object, creating a new object of the same class, and testing an object for a base/derived relationship with another object or class.

An ATKregistryEntry structure for each class contains the class name, a pointer to a function to create a new instance of the class, a pointer to the class initialization function, a list of the parent classes, and a pointer to the next class in the registry. Currently the run time system is limited to single inheritance. This is because the function to create a new instance returns an ATK *, thus without compiler support casting it down to the appropriate type would be impossible if the class is derived from ATK multiply or virtually.

The ATKregistryEntry for a class is defined with the macro ATKdefineRegistry(classname, baseclassname, classinitfunction) in the top level of the source file implementing the class classname. The ATKregister(classname) macro is used to enter the class in the class registry. Generally a source file with a function containing the ATKregister calls is generated automatically by a program, given a list of the desired classes and/or libraries. The generated function is then called from the main() function of the program.
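A registry of this general shape can be sketched as follows. The names Base, RegistryEntry, Register, NewObject, and IsType, and the exact field layout, are assumptions made for this illustration; they follow the description above but are not the ATK declarations, and the entries are written out by hand rather than generated by a macro.

```cpp
#include <cassert>
#include <cstring>

class Base;   // hypothetical root class, standing in for ATK

struct RegistryEntry {
    const char *name;
    Base *(*create)();             // makes a new instance
    void (*classInit)();           // one-time class initialization, or null
    const RegistryEntry *parent;   // single inheritance only
    RegistryEntry *next;           // next class in the registry
};

class Base {
public:
    virtual ~Base() {}
    virtual const RegistryEntry *Registry() const = 0;
    const char *TypeName() const { return Registry()->name; }
    // True if this object's class is, or derives from, the named class.
    bool IsType(const char *name) const {
        for (const RegistryEntry *e = Registry(); e; e = e->parent)
            if (std::strcmp(e->name, name) == 0) return true;
        return false;
    }
    static void Register(RegistryEntry *e) {
        assert(e != nullptr);
        e->next = head;
        head = e;
    }
    // Create an object from the string name of its class.
    static Base *NewObject(const char *name) {
        for (RegistryEntry *e = head; e; e = e->next)
            if (std::strcmp(e->name, name) == 0) {
                if (e->classInit) e->classInit();
                return e->create ? e->create() : nullptr;
            }
        return nullptr;
    }
private:
    static RegistryEntry *head;
};
RegistryEntry *Base::head = nullptr;

class DataObj : public Base {
public:
    static RegistryEntry entry;
    const RegistryEntry *Registry() const override { return &entry; }
};

class Text : public DataObj {
public:
    static RegistryEntry entry;
    const RegistryEntry *Registry() const override { return &entry; }
};

static Base *NewDataObj() { return new DataObj; }
static Base *NewText() { return new Text; }
RegistryEntry DataObj::entry = {"dataobject", NewDataObj, nullptr, nullptr, nullptr};
RegistryEntry Text::entry = {"text", NewText, nullptr, &DataObj::entry, nullptr};
```

Because the creation function returns a pointer to the root class, the caller receives only a Base * and must know by other means what to cast it to, which is the limitation noted above for multiple or virtual derivation.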

Future work will probably include phasing out the use of the C++ ATK run time type information system in favor of the ANSI standard support. One particular feature the C++ ATK system lacks is the safety of the proposed checked cast.

Dynamic loading. Class-C provides very flexible, on-demand dynamic loading. The header file for a class is sufficient to compile and run code utilizing it. During execution the class is dynamically loaded when the code first executes a "class procedure" (constructor or static member function) of the class. Another facility offered is to create an object of a class from a C string giving the class's name. With C++, dynamic loading is more difficult and less portable; the tricks used in Class-C involved preprocessor definition of function names, but static member functions are usually called with "::" qualifiers and there is no good way to replace them with preprocessor magic. In consequence, dynamic loading in the C++ version will be restricted to loading a class given its string name. Methods can be applied to objects of loaded classes only if they are virtual methods of a base class linked with the system.

Weak vs. strong typing. AUIS code in Class-C is primarily based on traditional, non-ANSI C without function prototypes. The C++ converter automatically added prototypes and the C++ source was modified by hand where compilation using these revealed type errors. Many such problems occurred with function pointers because the original code assumed that function pointer values are interchangeable. For instance, the "proctable" stores pairs: name and function pointer. The canonical prototype for these functions is (void foo(struct basicobject *self, long rock)). However, sometimes these functions return values and often the rock was a pointer. Casts were necessary in many places to force the code to compile. Almost always the actual function was defined to take a pointer to a derived class of view, for instance textview, figview, rasterview, and so on. However, passing a derived object in this fashion will break with multiple inheritance: it may not be possible for the recipient function to access the passed object as a view if it derives from other classes as well. We believe that the only truly type-safe solution requires templates.
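The paper does not say what form a template solution would take. One possibility, offered purely as an assumption, is a function template that generates a wrapper of the canonical type for each typed function, so that no function pointer cast appears at the point of registration. The basicobject and textview structs, the Thunk template, and the table below are all hypothetical, and the sketch relies on a checked downcast that the compilers of the time did not provide.

```cpp
#include <cassert>

// Toy stand-ins for the root type and one derived view class.
struct basicobject { virtual ~basicobject() {} };
struct textview : basicobject { long dot = 0; };

// Canonical signature stored in the table.
typedef void (*proc_fn)(basicobject *self, long rock);

// The generated wrapper has exactly the canonical type and performs
// the one checked conversion to the type the real function expects.
template <class T, void (*F)(T *, long)>
void Thunk(basicobject *self, long rock) {
    T *typed = dynamic_cast<T *>(self);
    assert(typed != nullptr);
    F(typed, rock);
}

// A function written against the derived type.
void textview_SetDot(textview *self, long pos) { self->dot = pos; }

struct proc_entry {
    const char *name;
    proc_fn fn;
};

proc_entry proctable[] = {
    { "textview-set-dot", &Thunk<textview, textview_SetDot> },
};
```

Since the conversion goes through the class hierarchy rather than reinterpreting a pointer, it remains correct when the derived class has more than one base.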

Memory management. Both of the main classes derived from class ATK use reference counting. In consequence, such objects cannot be terminated with the normal C++ delete operator. Instead they must apply the inherited method Destroy. For the same reason, objects should not be declared automatic. (Pointers to objects can be automatic.)
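The constraint can be illustrated with a reference-counted base whose destructor is protected. The Counted and Doc classes and the Reference method are hypothetical names for this sketch; only Destroy is named in the text above.

```cpp
#include <cassert>

// Hypothetical reference-counted base class.
class Counted {
public:
    Counted() : refs(1) {}
    void Reference() { ++refs; }           // another holder appears
    void Destroy() {                       // one holder lets go
        assert(refs > 0);
        if (--refs == 0) delete this;
    }
protected:
    // Protected: client code can neither write "delete p" nor declare
    // an automatic object, so Destroy is the only way out.
    virtual ~Counted() {}
private:
    long refs;
};

class Doc : public Counted {
public:
    Doc() { ++live; }
    static int live;                       // number of Docs still alive
protected:
    ~Doc() { --live; }
};
int Doc::live = 0;
```

An object shared by two holders survives the first Destroy and is deleted by the second, whereas a plain delete by either holder would leave the other with a dangling pointer.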

6. Performance
There should be little difference in performance between the C++ and C versions of AUIS because much of the processing is straight C code without method calls; and even where there are method calls, the code generated should be similar in both cases. Differences did occur, however, because two different compilers were used, a native compiler for the C version and g++ for the C++ version. We coded three tests in Ness, with test cases selected to exercise different aspects of object oriented programming.

Test 1 - Count. This test counted down an integer variable from twenty-thousand to zero. It made no method calls and thus measured the quality of code created by the C or C++ compiler.

Test 2 - New. Three thousand objects were created. This measured primarily the object creation mechanism, but some methods were called during object initialization.

Test 3 - Dup(n). A string containing styled text was concatenated with itself n times in the form

s := s ~ s

This tested many method calls and object creations as the styles were copied.

Results are reported for two platforms in Table 1, where all measurements are in seconds. Several runs were made and the lowest value is reported as being the least likely to have been affected by other processes. In most cases the other values were within three percent of the lowest.

                     C        C++
PMAX/Ultrix
    Count            1.08     1.26
    New              1.04     0.80
    Dup-6            1.26     1.41
    Dup-8            4.57    30.32
RS6000/AIX3.2
    Count            0.73     0.52
    New              0.83     0.47
    Dup-8            3.01     3.25

Table 1. Execution times for three tests (seconds).

The Count test shows that g++ produced faster code for the RS6000 and slower code on the PMAX. The New test, however, shows that creating objects is faster with C++. Possibly it is using a faster malloc package. In both cases, the Dup test showed the C code faster. On the PMAX, the parameter was reduced from eight to six since it seemed possible that page thrashing accounted for the discrepancy. Even the lower parameter, however, showed that the C code was faster.

7. Availability
As of February, 1994, there are TWO versions of the Andrew User Interface System. Version 6.2 is the old AUIS version in C as it was released to Andrew Consortium members in January, 1993 (although then numbered 5.2). It is freely available for exploitation both private and commercial. The newer version, Version 7.1 in C++, is being distributed to members of the Consortium for their use and will be released for general use at a later date. To try AUIS from any internet workstation, give the command
finger @atk.itc.cmu.edu

If you do acquire the Andrew User Interface System, you will find yourself with an excellent environment for word processing, editing program source text, and many other realms. You will also have the capability to extend this environment in new and imaginative ways. If you do the latter, we would be delighted to have you submit your work for incorporation into the AUIS distribution so it can be enjoyed by all.


References
Borenstein, Nathaniel S., Multimedia Applications Development with the Andrew Toolkit, Prentice Hall, 1990.

Hansen, Wilfred J., Enhancing documents with embedded programs: How Ness extends insets in the Andrew Toolkit, Proceedings of IEEE Computer Society 1990 International Conference on Computer Languages, March, 1990, New Orleans, IEEE Computer Society Press (Los Alamitos, CA) 23-32.

Hansen, W. J., Subsequence References: First Class Values for Substrings, ACM Trans. Prog. Lang. and Sys. 14, 4, Oct. 1992.

Linton, M. A., J. M. Vlissides, P. R. Calder, Composing user interfaces with InterViews, IEEE Computer 22(2), February, 1989, 8-22.

Linton, M. and C. Price, Building Distributed User Interfaces with Fresco, Proceedings of the 7th X Technical Conference, Boston, Massachusetts, January, 1993, pp.77-87.

Microsoft Corp., Object Linking and Embedding (OLE), Part No. 098-31727, Microsoft Corp. (Redmond, WA, 1992).

Palay, Andrew J., Wilfred J. Hansen, et al., The Andrew Toolkit - An Overview, presented at the Usenix Conference, Dallas, TX, January, 1988.

Palay, Andrew J., Towards an "Operating System" for User Interface Components, in Multimedia Interface Design ed. Meera M. Blattner and Roger B. Dannenberg, ACM Press (New York, 1992).

Sherman, Mark, D. Anderson, W. J. Hansen, T. P. Neuendorffer, A. J. Palay, Z. Stern, Allocation of User-Interface Resources in the Andrew Toolkit, Proceedings of the International Conference on Multimedia Information Systems, (Singapore) McGraw-Hill, January, 1991.