Dylan object references are in general represented by a pair of words, the
descriptor and the data:
 -- The "Descriptor" always points directly at the header of a tagged Dylan
    object.  The class of the object is uniformly accessed by indirecting
    through the header.
 -- The "Data" is a word of "immediate" data which is not interpreted by the
    garbage collector.

There are three general kinds of object references:
  Heap object pointer:
    Represents any pointer to an object in the Dylan heap.  The "Data" word is
    unused, and its contents are undefined.

  Immediate object:
    Represents immutable objects that fit into the "Data" word, like integer
    and single-float.  The "Descriptor" is a "Type Proxy" which usually only
    carries type information.  (<integer> would probably be one full word,
    with an error signaled on overflow.)

  External object pointer:
    Represents pointers to objects outside of the Dylan heap, used for
    interoperation with other languages and interface to persistent storage.
    The "Descriptor" is a "Type Proxy" which carries our understanding of the
    foreign object's class and type.
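
As a rough illustration of the three kinds of reference, here is a minimal
sketch in C.  It is not the actual implementation: every name in it
(dylan_ref, dylan_header, integer_proxy, c_file_proxy, and so on) is
hypothetical, and it assumes that uintptr_t is the same size as a pointer and
that every heap object and type proxy starts with a one-word header leading
to its class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; the note does not fix a concrete layout. */
typedef struct dylan_class { const char *name; } dylan_class;

/* Assumption: every heap object and every type proxy begins with a header. */
typedef struct dylan_header { const dylan_class *klass; } dylan_header;

/* The two-word object reference. */
typedef struct dylan_ref {
    const dylan_header *descriptor; /* always points directly at a header */
    uintptr_t data;                 /* raw bits, not interpreted by the GC */
} dylan_ref;

static const dylan_class string_class  = { "<byte-string>" };
static const dylan_class integer_class = { "<integer>" };
static const dylan_class c_file_class  = { "<c-file>" }; /* made-up external class */

/* Type proxies: header-only objects that carry just type information. */
static const dylan_header integer_proxy = { &integer_class };
static const dylan_header c_file_proxy  = { &c_file_class };

/* An example heap object. */
typedef struct dylan_string {
    dylan_header header;
    size_t size;
    char chars[8];
} dylan_string;

/* Heap object pointer: the data word is unused (zero here is arbitrary). */
static dylan_ref heap_ref(const dylan_header *object) {
    dylan_ref r = { object, 0 };
    return r;
}

/* Immediate object: the value lives in the data word, nothing is allocated. */
static dylan_ref integer_ref(intptr_t value) {
    dylan_ref r = { &integer_proxy, (uintptr_t)value };
    return r;
}

/* External object pointer: the proxy records what we believe the type is. */
static dylan_ref external_ref(const dylan_header *proxy, void *foreign) {
    dylan_ref r = { proxy, (uintptr_t)foreign };
    return r;
}

/* Same access path for all three kinds: indirect through the header. */
static const dylan_class *class_of(dylan_ref r) {
    return r.descriptor->klass;
}

static intptr_t integer_value(dylan_ref r) {
    assert(class_of(r) == &integer_class);
    return (intptr_t)r.data;
}
```

Note that class_of never inspects the data word and never tests a tag bit;
only the meaning of the data word differs between the three kinds.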

What are the advantages of this representation?
 -- Direct, cons-free access to external objects and operations.  Full word
    integers and pointers can be passed with no range or alignment
    constraints, and no risk of unpredictable (or any) consing.  Also, typed
    external object references (what we now call Alien values) are never
    consed either, allowing non-scalar external objects to be passed around
    freely with no risk of consing.
 -- There are no tags in the descriptor, so getting the class of an object
    (for generic function dispatch) is always a single memory reference.
 -- An unbounded number of immediate types.  Since there are no scarce tag
    bits, we could do things like have an immediate <point> object with two
    16-bit X and Y values.
 -- Non-consing single-floats for numeric code.
 -- Efficient integer code without any need for a sub-word "fixnum" type.
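
To make the immediate <point> and single-float cases concrete, here is a
small sketch in C of how such values might be packed into the data word.
The names (point_proxy, make_point, make_single_float, ...) are invented for
illustration, and the packing order is an arbitrary choice.  It assumes a
data word of at least 32 bits, a 32-bit float, and two's-complement
integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-word reference, as in the description at the top. */
typedef struct dylan_class { const char *name; } dylan_class;
typedef struct dylan_header { const dylan_class *klass; } dylan_header;
typedef struct dylan_ref {
    const dylan_header *descriptor;
    uintptr_t data;
} dylan_ref;

static const dylan_class point_class        = { "<point>" };
static const dylan_class single_float_class = { "<single-float>" };

/* A new immediate type costs one more proxy object, not a scarce tag. */
static const dylan_header point_proxy        = { &point_class };
static const dylan_header single_float_proxy = { &single_float_class };

/* Pack X into bits 16..31 and Y into bits 0..15 of the data word. */
static dylan_ref make_point(int16_t x, int16_t y) {
    uintptr_t bits = ((uintptr_t)(uint16_t)x << 16) | (uintptr_t)(uint16_t)y;
    dylan_ref r = { &point_proxy, bits };
    return r;
}

static int16_t point_x(dylan_ref p) {
    return (int16_t)(uint16_t)(p.data >> 16);
}

static int16_t point_y(dylan_ref p) {
    return (int16_t)(uint16_t)(p.data & 0xFFFFu);
}

/* A single-float travels as its raw bit pattern; no heap box is needed. */
static dylan_ref make_single_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    dylan_ref r = { &single_float_proxy, (uintptr_t)bits };
    return r;
}

static float single_float_value(dylan_ref r) {
    uint32_t bits = (uint32_t)r.data;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Nothing above allocates: both constructors return the two words by value,
which is the "non-consing" property the list refers to.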


The potential disadvantages of this representation are pretty obvious: objects
might be twice as big and code might take twice as long to run.  If true, that
would be unacceptable; why isn't it true?

First consider space:
 -- Dylan objects won't be entirely composed of object references; strings,
    code and limited collections will form the greater part of the memory
    footprint of most applications.  Also, some header and fragmentation
    overheads won't be increased.  As I recall, an analysis of the CMU CL
    image found its size would increase by only 25% if the size of an object
    reference increased from 32 to 64 bits, which is 1/4 of the worst-case
    increase.
 -- The vast majority of object slots will be statically known to be Dylan
    heap pointers, in which case no space is needed for the data word.  How
    so?  Well, if there's almost any type declaration on the slot, then it
    must be a heap pointer.  The type can be any union of non-immediate and
    non-external classes (and singleton or limited types based on those
    classes).  Since the immediate classes will be sealed, it is easy to tell
    at compile time whether a class might be immediate.  If we only allow
    external classes to inherit other external classes (or <object>), then any
    Dylan class is known to be non-external.
 -- If the value is statically known to be an immediate type, or to be a
    direct instance of a particular external class, then there is no need to
    store the descriptor word.

The only bad space case is with <pair> and <simple-object-vector>, since these
types are defined to hold any <object>.  These objects would effectively
double in size.  However, you can easily get compact vectors of known heap
pointers by using a limited vector type instead of <simple-object-vector>.
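
The space argument can be checked with a little arithmetic.  The sketch
below, in C, compares hypothetical layouts for an object with declared slots
against the same object with undeclared slots, and a limited vector of heap
pointers against a general object vector.  All type names are made up, and
the word counts assume a mainstream ABI where pointers, size_t, intptr_t and
uintptr_t are all one word with no padding between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-word reference, as in the description at the top. */
typedef struct dylan_class { const char *name; } dylan_class;
typedef struct dylan_header { const dylan_class *klass; } dylan_header;
typedef struct dylan_ref {
    const dylan_header *descriptor;
    uintptr_t data;
} dylan_ref;

/* Size of a type in machine words. */
#define WORDS(type) (sizeof(type) / sizeof(void *))

/* Slots with declarations: each one needs only a single word. */
typedef struct typed_account {
    dylan_header header;
    const dylan_header *owner; /* declared as a heap class: descriptor only */
    intptr_t balance;          /* declared <integer>: data word only */
} typed_account;

/* The same slots with no declarations: each one holds a full reference. */
typedef struct untyped_account {
    dylan_header header;
    dylan_ref owner;
    dylan_ref balance;
} untyped_account;

/* Limited vector of known heap pointers: one word per element. */
typedef struct heap_pointer_vector4 {
    dylan_header header;
    size_t size;
    const dylan_header *elements[4];
} heap_pointer_vector4;

/* <simple-object-vector> must hold any <object>: two words per element. */
typedef struct object_vector4 {
    dylan_header header;
    size_t size;
    dylan_ref elements[4];
} object_vector4;
```

Under those assumptions the typed account is 3 words against 5 for the
untyped one, and the four-element vectors are 6 words against 10, so the
doubling only shows up in the element or slot storage that truly has to hold
any <object>.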


What about run-time?
 -- The run-time concern is really already mostly addressed by the space
    discussion.  The main reason for an increase in run time is moving twice
    as many words.  But we really aren't storing those words very often.
 -- Even in cases where we are forced to allocate space for the data word, we
    often don't need to read or write it.  Often when we are passing an
    argument or storing into a <pair>, we know that the object we are storing
    is a heap pointer, so there is no need to store anything in the data word.
 -- Similarly, on reading if we are expecting a heap pointer (due to a
    type assertion), we can just drop the "data" word on the floor.  Or in
    unsafe code, if an immediate is expected, we can just ignore the
    descriptor.
 -- Even in safe code, we can discard the type proxy for immediate values once
    we've done type checking.  To efficiently "tag" immediates, we probably
    want to make the address of the type proxies well-known constants so that
    we can load them into a register without doing a memory reference.
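
The points above can be sketched in C as follows.  This is only an
illustration under assumed names (dylan_pair, set_head_heap, head_integer,
tag_integer are all hypothetical): it shows a store that skips the data
word, a read that skips it, and a checked immediate read that discards the
proxy once the type is known.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word reference, as in the description at the top. */
typedef struct dylan_class { const char *name; } dylan_class;
typedef struct dylan_header { const dylan_class *klass; } dylan_header;
typedef struct dylan_ref {
    const dylan_header *descriptor;
    uintptr_t data;
} dylan_ref;

static const dylan_class pair_class    = { "<pair>" };
static const dylan_class integer_class = { "<integer>" };

/* The proxy address is a link-time constant, so tagging an immediate can
   use a constant load rather than a memory reference. */
static const dylan_header integer_proxy = { &integer_class };

typedef struct dylan_pair {
    dylan_header header;
    dylan_ref head;
    dylan_ref tail;
} dylan_pair;

/* General store: nothing is known about the value, so move both words. */
static void set_head(dylan_pair *p, dylan_ref value) {
    p->head = value;
}

/* Store of a value statically known to be a heap pointer: one word is
   written and the data word is left holding whatever was there before. */
static void set_head_heap(dylan_pair *p, const dylan_header *object) {
    p->head.descriptor = object;
}

/* Read where a heap pointer is expected: the data word is never loaded. */
static const dylan_header *head_heap(const dylan_pair *p) {
    return p->head.descriptor;
}

/* Safe read of an integer: check the proxy once, then carry only the raw
   word.  Returns 1 on success and 0 on a type mismatch. */
static int head_integer(const dylan_pair *p, intptr_t *out) {
    if (p->head.descriptor != &integer_proxy)
        return 0;
    *out = (intptr_t)p->head.data;
    return 1;
}

/* Re-attach the proxy when a raw integer flows back into a general slot. */
static dylan_ref tag_integer(intptr_t raw) {
    dylan_ref r = { &integer_proxy, (uintptr_t)raw };
    return r;
}
```

The stale data word left behind by set_head_heap is harmless because the
descriptor alone says the reference is a heap pointer, and for that kind the
data word's contents are undefined anyway.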

So, even if we ignore the efficiency advantages of this representation, it
seems that the peculiar efficiency penalties of this representation are
smaller than the relatively small space penalty.

This conclusion of "minimal cost" does depend on the assumption that people
will normally specify some sort of declaration on slots and array elements.
If there are no declarations, you probably have a fairly solid 2x performance
degradation, but that might still be better than consing unpredictably when
external objects are used (which all the one-word schemes tend to do.)


How important is predictable non-consing?
 -- If we are going to realize our goal of supporting a Non-GC programming
    model, then a guaranteed-non-consing language is a must.  This language
    might be a subset of Dylan (no upward funargs?), but to be useful it must
    be able to manipulate external objects.
 -- Even if GC is assumed, supporting external objects the way that CMU CL
    currently does results in quirky inefficiency problems which are
    difficult or impossible to fix.