Dylan object references are in general represented by a pair of words, the descriptor and the data:

-- The "Descriptor" always points directly at the header of a tagged Dylan object. The class of the object is uniformly accessed by indirecting through the header.
-- The "Data" is a word of "immediate" data which is not interpreted by the garbage collector.

There are three general kinds of object references:

Heap object pointer:
    Represents any pointer to an object in the Dylan heap. The "Data" word is unused, and its contents are undefined.

Immediate object:
    Represents immutable objects that fit into the "Data" word, like integer and single-float. The "Descriptor" is a "Type Proxy" which usually only carries type information. (Probably is one word with error on overflow.)

External object pointer:
    Represents pointers to objects outside of the Dylan heap, used for interoperation with other languages and as an interface to persistent storage. The "Descriptor" is a "Type Proxy" which carries our understanding of the foreign object's class and type.

What are the advantages of this representation?

-- Direct, cons-free access to external objects and operations. Full-word integers and pointers can be passed with no range or alignment constraints, and no risk of unpredictable (or any) consing. Also, typed external object references (what we now call Alien values) are never consed either, allowing non-scalar external objects to be passed around freely with no risk of consing.
-- There are no tags in the descriptor, so getting the class of an object (for generic function dispatch) is always a single memory reference.
-- An unbounded number of immediate types. Since there are no scarce tag bits, we could do things like have an immediate object with two 16-bit X and Y values.
-- Non-consing single-floats for numeric code.
-- Efficient integer code without any need for a sub-word "fixnum" type.
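
To make the three kinds of reference concrete, here is a minimal sketch in Python, used purely as executable pseudocode. It assumes a 32-bit word; the names Header, Ref, the *_PROXY constants, the "<point>" immediate and all helper functions are hypothetical and are not part of any Dylan implementation. A real implementation would hold the two words in registers or adjacent memory cells rather than in a heap-allocated record.

```python
import struct
from dataclasses import dataclass

WORD_BITS = 32                       # assumed word size for this sketch
WORD_MASK = (1 << WORD_BITS) - 1


@dataclass(frozen=True)
class Header:
    """Stands in for an object header or a type proxy; it names the class."""
    class_name: str


@dataclass(frozen=True)
class Ref:
    """A two-word reference: a descriptor plus an uninterpreted data word."""
    descriptor: Header
    data: int = 0                    # unused (undefined) for heap pointers


# Hypothetical well-known type proxies for immediate types.
INTEGER_PROXY = Header("<integer>")
SINGLE_FLOAT_PROXY = Header("<single-float>")
POINT_PROXY = Header("<point>")      # imaginary immediate with two 16-bit fields


def heap_ref(header: Header) -> Ref:
    """Heap object pointer: only the descriptor is meaningful."""
    return Ref(header)


def integer_ref(n: int) -> Ref:
    """Full-word integer immediate; signals an error on overflow."""
    if not -(1 << (WORD_BITS - 1)) <= n < (1 << (WORD_BITS - 1)):
        raise OverflowError("integer does not fit in one data word")
    return Ref(INTEGER_PROXY, n & WORD_MASK)


def integer_value(ref: Ref) -> int:
    """Sign-extend the data word back to a Python int."""
    bits = ref.data
    return bits - (1 << WORD_BITS) if bits >> (WORD_BITS - 1) else bits


def single_float_ref(x: float) -> Ref:
    """Single-float immediate: the IEEE bits live in the data word."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return Ref(SINGLE_FLOAT_PROXY, bits)


def single_float_value(ref: Ref) -> float:
    (x,) = struct.unpack("<f", struct.pack("<I", ref.data))
    return x


def point_ref(x: int, y: int) -> Ref:
    """Two unsigned 16-bit fields packed into one data word."""
    return Ref(POINT_PROXY, ((x & 0xFFFF) << 16) | (y & 0xFFFF))


def point_xy(ref: Ref) -> tuple:
    return (ref.data >> 16) & 0xFFFF, ref.data & 0xFFFF


def external_ref(proxy: Header, address: int) -> Ref:
    """External object pointer: the proxy describes the foreign type,
    and the data word holds the raw address."""
    return Ref(proxy, address & WORD_MASK)


def class_of(ref: Ref) -> str:
    """Class lookup is the same single indirection for every kind of
    reference; no tag bits are tested."""
    return ref.descriptor.class_name
```

Note that class_of never inspects the data word, and that the integer and external constructors use every bit of it, which is the point of the "no range or alignment constraints" claim above.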

The potential disadvantages of this representation are pretty obvious: objects might be twice as big and code might take twice as long to run. If true, that would be unacceptable; why isn't it true?

First consider space:

-- Dylan objects won't be entirely composed of object references; strings, code and limited collections will form the greater part of the memory footprint of most applications. Also, some header and fragmentation overheads won't be increased. As I recall, an analysis of the CMU CL image found its size would increase by only 25% if the size of an object reference increased from 32 to 64 bits, which is 1/4 of the worst-case increase (a doubling).
-- The vast majority of object slots will be statically known to be Dylan heap pointers, in which case no space is needed for the data word. How so? Well, if there's almost any type declaration on the slot, then it must be a heap pointer. The type can be any union of non-immediate and non-external classes (and singleton or limited types based on those classes.) Since the immediate classes will be sealed, it is easy to tell at compile time whether a class might be immediate. If we only allow external classes to inherit from other external classes (or ), then any Dylan class is known to be non-external.
-- If the value is statically known to be an immediate type, or to be a direct instance of a particular external class, then there is no need to store the descriptor word.

The only bad space case is with and , since these types are defined to hold any . These objects would effectively double in size. However, you can easily get compact vectors of known heap pointers by using a limited vector type instead of .

What about run-time?

-- The run-time concern is really already mostly addressed by the space discussion. The main reason for an increase in run-time is moving twice as many words. But we really aren't storing those words very often.
-- Even in cases where we are forced to allocate space for the data word, we often don't need to read or write it. Often when we are passing an argument or storing into a , we know that the object we are storing is a heap pointer, so there is no need to store anything in the data word.
-- Similarly, on reading, if we are expecting a heap pointer (due to a type assertion), we can just drop the "data" word on the floor. Or in unsafe code, if an immediate is expected, we can just ignore the descriptor.
-- Even in safe code, we can discard the type proxy for immediate values once we've done type checking. To efficiently "tag" immediates, we probably want to make the addresses of the type proxies well-known constants so that we can load them into a register without doing a memory reference.

So, even if we ignore the efficiency advantages of this representation, it seems that the peculiar efficiency penalties of this representation are smaller than the relatively small space penalty.

This conclusion of "minimal cost" does depend on the assumption that people will normally specify some sort of declaration on slots and array elements. If there are no declarations, you probably have a fairly solid 2x performance degradation, but that might still be better than consing unpredictably when external objects are used (which all the one-word schemes tend to do.)

How important is predictable non-consing?

-- If we are going to realize our goal of supporting a Non-GC programming model, then a guaranteed-non-consing language is a must. This language might be a subset of Dylan (no upward funargs?), but to be useful it must be able to manipulate external objects.
-- Even if GC is assumed, supporting external objects the way that CMU CL currently does results in quirky inefficiency problems which are difficult or impossible to fix.
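
Returning to the space argument, the slot rules discussed earlier can be sketched as a small calculation. This is an illustration only, written in Python as executable pseudocode: the SlotKind names, the one-word header and the helper functions are assumptions made for the example, not a description of an actual compiler.

```python
from enum import Enum


class SlotKind(Enum):
    HEAP_ONLY = "heap"         # declared type excludes immediates and externals
    KNOWN_IMMEDIATE = "imm"    # one known immediate type or external class
    ANY = "any"                # no useful declaration


# Words stored per slot under the two-word scheme.
SLOT_WORDS = {
    SlotKind.HEAP_ONLY: 1,        # descriptor only; data word is elided
    SlotKind.KNOWN_IMMEDIATE: 1,  # data only; descriptor is implied by the type
    SlotKind.ANY: 2,              # both words must be stored
}


def object_words(slots, header_words=1):
    """Size in words of an object with the given slots (header assumed)."""
    return header_words + sum(SLOT_WORDS[kind] for kind in slots)


def growth(slots, header_words=1):
    """Size relative to a one-word-per-reference layout of the same object."""
    one_word_layout = header_words + len(slots)
    return object_words(slots, header_words) / one_word_layout
```

Under these assumptions an object whose slots are all declared costs exactly what it would in a one-word scheme, and only undeclared slots pay for the second word; for example, object_words([SlotKind.HEAP_ONLY, SlotKind.KNOWN_IMMEDIATE, SlotKind.ANY]) is 5 words rather than 4.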