Overview:

Moon is correct that efficiency and ease of implementation are the only possible reasons for having integers that overflow. However, there is much more to this than "I can do that in three instructions". The key problem we have found with Common Lisp integers is not that a SSC can't generate fairly good code when it has lots of encouragement from the programmer --- it is that providing that encouragement places a large burden on the programmer. This cost to the user is considerable, yet we believe it is incurred for very little gain. We do not argue that extended and ratio arithmetic is useless; we offer a cost/benefit analysis of automatically flying into non-hardware-supported arithmetic, and we believe it shows that the costs far outweigh the benefits.

Consider every other language (except Lisp): C, FORTRAN, Self, Pascal, Ada, ... What do these languages have in common? Well, one major thing is an integer type that overflows. This type just so happens to be directly supported by the hardware, and programmers rarely stop to think how great it is to know that + always takes one instruction. C programmers are not unaware of extended arithmetic libraries, which are frequently used in certain specialized applications. However, I don't hear any great groundswell of opinion that extended arithmetic should be the default. Mainstream implementors and programmers are probably unaware of how small the cost of all-the-time extended arithmetic can be, but the combination of no demand for all-the-time and the awareness (and use) of when-you-need-it is evidence that the marginal usefulness of the all-the-time approach is low.

Consider Common Lisp: many of you are Common Lisp programmers. How often have you deliberately used indefinite precision or ratio arithmetic? (Word integers don't count; they are discussed later.) I am very familiar with the code of the CMU Common Lisp implementation (40k+ LOC). There are two places where extended arithmetic is deliberately used:
 -- reading and printing floating point numbers, and
 -- doing integer range analysis in the compiler to prove that extended integers aren't being used.

Of course, I am sure that there are also:
 -- several places where bignums are accidentally (and unnecessarily) being created, causing a slight inefficiency, and
 -- many, many places where overflow might possibly occur, resulting in a significant speed penalty and an even larger code size penalty.

This concludes the "low benefit" portion of our argument.

Costs of all-the-time extended integers:

The major costs of all-the-time extended integers are:

1] Implementation complexity. Increases the size and cost of Dylan environments.

2] Run-time code size increase, due to:
    -- error checking code,
    -- undesired generic arithmetic, and
    -- the presence of the extended integer implementation.

3] Run-time slowdown, due to:
    -- error checking code, and
    -- undesired generic arithmetic.

And, the most important cost:

4] Lowered programmer productivity, due to the effort wasted in meeting efficiency requirements.

As mentioned in the overview, this is not a matter of all-the-time extended integers being "efficient" or "inefficient" in any absolute sense. What is really at stake is the efficiency/man-hour ratio. You will notice that getting efficient arithmetic with all-the-time extended integers is mostly a matter of writing more code (adding declarations); a sketch of what that looks like follows.
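To make the declaration cost concrete, here is a minimal Common Lisp sketch, since that is where our experience comes from. The function names and the array element type are invented for illustration, and exactly which declarations a given compiler needs will vary.

    ;; Sketch only; names and types are illustrative.
    ;; Naive version: the compiler must assume SUM can overflow into a
    ;; bignum, so it emits generic (type-dispatching) addition.
    (defun sum-naive (v)
      (let ((sum 0))
        (dotimes (i (length v) sum)
          (incf sum (aref v i)))))

    ;; Declared version: the programmer asserts the array element type,
    ;; the accumulator type, and the type of every intermediate sum.  An
    ;; overflow now shows up as a run-time type error (or unchecked
    ;; misbehavior at low safety) instead of a correct bignum result.
    (defun sum-declared (v)
      (declare (type (simple-array fixnum (*)) v))
      (let ((sum 0))
        (declare (type fixnum sum))
        (dotimes (i (length v) sum)
          (setq sum (the fixnum (+ sum (aref v i)))))))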
Although adding these declarations is a lower bound on the cost of the all-the-time approach, it far underestimates the true cost, which lies primarily in determining which declarations are necessary in any given implementation.

Type-specific arithmetic and portability:

One approach to avoiding the where-to-declare dilemma is to declare everywhere. To make this manageable, type-specific operators are defined by the programmer. CLX (the CL Xlib) is the main Common Lisp application I am familiar with which is:
 -- portable,
 -- moderately well tuned, and
 -- makes extensive use of small integer indices.

CLX isn't written using generic arithmetic at all. It implements type-specific operators via macros which insert declarations around the CL generic operations. For example, in (index+ a b c), A, B, C and the result are all asserted to be non-negative small integers. We don't think type-specific operators are a good idea (and I don't think Moon does either), but I'll note in passing two problems with this style:

1] Programmers are encouraged to insert incorrect declarations. In CLX, there were several places where INDEX arithmetic was used when the arguments or the result could in fact be negative.

2] The resulting code is not generic, not polymorphic, not OO. With a SSC, users can write generic number functions that will be efficient when inlined. Type-specific operators prevent this style.

The point of this section is not primarily that type-specific operators are bad, but that numerous programmers have found them to be a better solution than Common Lisp generic integer operations.

Can the bignum code ever be left out of the runtime?

With quiet overflow, schemes for conditionally loading extended arithmetic are less clean, since the semantics of integer arithmetic would be changed by loading extended arithmetic. There are only two approaches I can see:

1] If no library uses the extended-integer library, then all libraries are compiled to signal an error on overflow, instead of making an extended integer.

2] The extended-integer code can only be omitted when the compiler has *proved* that no integer arithmetic *anywhere* in the program can overflow.

Option [1] is basically a semantically inferior form of a short-integer that is guaranteed to always signal an error on overflow. In [2], the extended-integer code is not a normal explicitly configured library, but rather some sort of automagic selective linking. Maybe selective linking can work in this case, but in general this sort of "tree shaking" has never worked very well at controlling executable size, and I had thought that Dylan planned to control size by having a small core and explicitly configuring any needed extras.

Assuming the prove-needed-nowhere model of omitting arithmetic code, a large burden is placed on the programmer, who must negotiate with the compiler to convince it that every last spurious potential overflow is in fact impossible. Getting good runtime speed is relatively much easier, since only inner loops need to be tweaked up.

"Proving" overflow is impossible:

Note that adding limited integer (or the equivalent Moon <small-integer>) declarations can avoid generic arithmetic, but it doesn't actually eliminate any potential run-time errors; it just laboriously transmutes overflow errors into type errors. For example, if I had:

    method (a :: <small-integer>, b :: <small-integer>, c :: <small-integer>)
      let x :: <small-integer> = a + b + c
      ...

I would be in trouble, because "a + b" is not known to be a <small-integer> (guard bits are discussed later.) So, I can do something like:

    method (a :: <small-integer>, b :: <small-integer>, c :: <small-integer>)
      let x :: <small-integer> = check-type(a + b, <small-integer>) + c
      ...
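The Common Lisp version of this transformation looks something like the following sketch (the function name is invented, and the exact placement of the assertions depends on the compiler):

    ;; Sketch only: the programmer silences the compiler by asserting
    ;; that each intermediate result is a fixnum.  If one of the
    ;; additions actually overflows, the program now signals a type
    ;; error where it would previously have produced a correct bignum.
    (defun add3 (a b c)
      (declare (type fixnum a b c))
      (the fixnum (+ (the fixnum (+ a b)) c)))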
The point is, when the compiler tells the programmer "prove to me this can't overflow", the programmer is just going to stick in a declaration saying "it can't overflow", thus transforming an overflow into a type error. In other words, forcing the programmer to add declarations doesn't result in any stronger guarantee of the program's correctness.

In theory, one might suppose that a programmer could somehow analyze all the integer ranges in the program and construct a true proof that overflow couldn't happen. And with a SSC and a sufficiently powerful declaration language (quantifiers, induction and stuff), the programmer could convince the compiler that overflow won't happen. In practice, this doesn't really work. First, this approach has a poor cost/benefit comparison relative to just adding declarations and hoping overflow won't happen. Doing this sort of analysis is time-consuming, and it is easily invalidated as the program changes. The unique benefit of the proof approach (being sure overflow won't happen) is of low value, since other run-time failures in the program are much more likely. More relevantly for a dynamic language, doing this static analysis requires that the program be made more static. Unbounded data structures (like indefinite precision integers) make any such proof impossible. The only way to get tight bounds on indices is to set tight bounds on data structure sizes.

So, what will Dylan programs look like if they don't use type-specific operators or limited types?

Base declaration model:

All integer arguments, loop indices, object slots and function return values are declared to be small integers. If an integer variable is bound to the result of a function which isn't statically declared to return a small integer, then that variable must be declared. Local variables bound to arithmetic expressions do not need to be declared.

How do all-the-time and when-you-need-it compare in the number of declarations needed beyond the base model?

When-You-Need-It: No additional declarations.

All-The-Time: Additional declarations will be needed inside arithmetic expressions and on trivial local let variables, but exactly where is implementation dependent and context dependent. If you have to use a simple compiler, define some type-specific operators, since all intermediate results need to be declared. Otherwise, get to know your SSC, hope it has good diagnostics, and learn to read assembly code. And pray you never have to use a different compiler...

Word Integers and Guard Bits:

Full-word integers are useful for interfacing to foreign code and for manipulating raw data structures (often, but not invariably, related tasks). To get efficiency anywhere near that of conventional languages, a Dylan compiler must be able to represent raw, full-word, untagged integers in registers and on the stack. There are problems with supporting fixed signed-or-unsigned integers under an indefinite signed semantics (i.e. there are reasons for additional classes), but it isn't too bad, and we can assume that a SSC will have word integers, regardless of whether it implements when-you-need-it or all-the-time.

Given that we have word integers which are a few bits larger than small integers (but which can still be operated on efficiently), we can use the extra bits as guard bits, so that we can add perhaps four small integers into a word-integer result without requiring any intermediate type assertions (and error checking.)
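A back-of-the-envelope check of the guard-bit claim, with illustrative widths matching the code sequence below (29-bit signed small integers in a 32-bit word, leaving 3 bits for the tag and thus spare range above the small-integer bound):

    ;; Sketch only; the widths are illustrative, not a specification.
    ;; The worst-case sum of four small integers still fits in a word,
    ;; so intermediate results need no overflow checks; only the final
    ;; result must be checked if it is to be a small integer.
    (let ((small-max (1- (expt 2 28)))    ; largest 29-bit signed value
          (word-max  (1- (expt 2 31))))   ; largest 32-bit signed value
      (<= (* 4 small-max) word-max))      ; => T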
This means that for a limited (but important) class of arithmetic expressions, no internal declarations are needed as long as the result is destined for a variable declared to be a small integer. So, with guard bits, some expressions need no internal result type declarations. If all expressions are of the form "a + b + c + d", then to get efficient code, the only increase in declarations over the base model is declarations of local variables bound to arithmetic expressions.

Fixnum guard bits are a good hack; really, the best approach there is for eliminating generic arithmetic with all-the-time extended integers. However, more declarations are needed, and the rules for adding them are somewhat complex. Also, expressions involving *, / and ash aren't really helped; output type assertions are almost always needed for them.

Even so, using guard bits is slower than the CMU proposal's arithmetic. Consider "a + b + c", where A, B, C and the result are all known to be small integers. With all-the-time and guard bits, the code is:

        ashr   t1, a, 3
        ashr   t2, b, 3
        add    t3, t1, t2
        ashr   t4, c, 3
        add    t5, t3, t4
        ashr   t6, t5, 29
        beq    t6, ok
        lognot t6, t6
        bne    t6, result-not-small-error
    ok: ashl   res, t5, 3

This is much better than doing generic arithmetic, but with the CMU proposal, the code is:

        add    t1, a, b
        add    res, t1, c

This code assumes an overflow-trapping add, which is present on MIPS, SPARC and HPPA, at a minimum. It is exactly the same code you would get in a conventional language.

Why Dylan is Harder than Common Lisp:

Most of the above discussion is informed by our Common Lisp experience, which is totally applicable, since the Moon proposal is the same as Common Lisp. However, Dylan may make optimization of all-the-time extended integers harder, since fortuitous output type assertions due to calling non-generic functions are rarer. Argument specializers and result types on methods aren't nearly as good as restrictions on the function itself; for example, in vec[i], it isn't clear that "i" must be a small integer even if "vec" is known to be a vector, since someone could always add a method on vectors and big integers. The CMU proposal doesn't suffer this disadvantage, since it uses the object system to determine the result type, and thus doesn't need output type assertions.

In conclusion, why work so hard to get worse code? What makes us think we can force programmers to psych out an extremely complex compiler that knows better than they do, when they can just say what they want by using another language?

    Rob