[2-9] What is CDR-coding?

CDR-coding is a space-saving way to store lists in memory.  It is normally
only used in Lisp implementations that run on processors that are
specialized for Lisp, as it is difficult to implement efficiently
in software.  In normal list structure, each element of the
list is represented as a CONS cell, which is basically two pointers (the
CAR and CDR); the CAR points to the element of the list, while the CDR
points to the next CONS cell in the list or NIL.  CDR-coding takes
advantage of the fact that most CDR cells point to another CONS, and
further that the entire list is often allocated at once (e.g. by a call to
LIST).  Instead of using two pointers to implement each CONS cell, the CAR
cell contains a pointer and a two-bit "CDR code".  The CDR code may contain
one of three values: CDR-NORMAL, CDR-NEXT, and CDR-NIL.  If the code is
CDR-NORMAL, this cell is the first half of an ordinary CONS cell pair, and
the next cell in memory contains the CDR pointer as described above.  If
the CDR code is CDR-NEXT, the next cell in memory contains the next CAR
cell; in other words, the CDR pointer is implicitly thisaddress+1, where
thisaddress is the memory address of the CAR cell.  If the CDR code is
CDR-NIL, then this cell is the last element of the list; the CDR pointer is
implicitly a reference to the object NIL.  When a list is constructed
incrementally using CONS, a chain of ordinary pairs is created; however,
when a list is constructed in one step using LIST or MAKE-LIST, a block of
memory can be allocated for all the CAR cells, and their CDR codes all set
to CDR-NEXT (except the last, which is CDR-NIL), and the list will only
take half as much storage (because all the CDR pointers are implicit).

If this were all there were to it, it would not be difficult to implement
in software on ordinary processors; it would add a small amount of overhead
to the CDR function, but the reduction in paging might make up for it.  The
problem arises when a program uses RPLACD on a CONS cell that has a CDR
code of CDR-NEXT or CDR-NIL.  Normally RPLACD simply stores into the CDR
cell of a CONS, but in this case there is no CDR cell -- its contents are
implicitly specified by the CDR code, and the word that would normally
contain the CDR pointer contains the next CONS cell (in the CDR-NEXT case)
to which other data structures may have pointers, or the first word of some
other object (in the CDR-NIL case).  When CDR-coding is used, the
implementation must also provide automatic "forwarding pointers"; an
ordinary CONS cell is allocated, the CAR of the original cell is copied
into its CAR, the value being RPLACD'ed is stored into its CDR, and the old
CAR cell is replaced with a forwarding pointer to the new CONS cell.
Whenever CAR or CDR is performed on a CONS, it must check whether the
location contains a forwarding pointer.  This overhead on both CAR and CDR,
coupled with the overhead on CDR to check for CDR codes, is generally
enough that using CDR codes on conventional hardware is infeasible.

There is some evidence that CDR-coding doesn't really save very much
memory, because most lists aren't constructed at once, or RPLACD is done on
them enough that they don't stay contiguous.  At best this technique can
save 50% of the space occupied by CONS cells. However, the savings probably
depends to some extent upon the amount of support the implementation
provides for creating CDR-coded lists.  For instance, many system functions
on Symbolics Lisp Machines that operate on lists have a :LOCALIZE option;
when :LOCALIZE T is specified, the list is first modified and then copied
to a new, CDR-coded block, with all the old cells replaced with forwarding
pointers.  The next time the garbage collector runs, all the forwarding
pointers will be spliced out.  Thus, at a cost of a temporary increase in
memory usage, overall memory usage is generally reduced because more lists
may be CDR-coded. There may also be some benefit in improved paging
performance due to increased locality as well (putting a list into
CDR-coded form makes all the "cells" contiguous). Nevertheless, modern
Lisps tend to use lists much less frequently, with a much heavier
reliance upon code, strings, and vectors (structures).