(Message blind:183)
To: ram@cs
Reply-To: Christopher.Hoover@CS.CMU.EDU
Subject: GGC
Date: Sun, 27 May 90 04:36:45 EDT
Message-ID: <4017.643797405@LISP-PMAX1.SLISP.CS.CMU.EDU>
From: Christopher.Hoover@LISP-PMAX1.SLISP.CS.CMU.EDU

Rob,

Here's what I am currently thinking.  Comments would be greatly
appreciated.  (These are my notes, so ignore spelling and grammar.)

There will, in essence, be three heaps -- one for boxed objects, one
for unboxed objects, and one for code objects.  (We can still keep
read-only space if we want, but I'm not sure it's all that necessary,
and static space will be split among several of the older
generations.)

Each generation is divided into two semispaces (or buckets), and
additionally some of the generations will have a creation space where
objects may be allocated.  When scavenging a generation, one of the
semispaces acts as a newspace for objects being copied out of the
previous generation, and the other acts as a fromspace for objects
being copied into the next generation.  By realizing that the data in
creation space is ordered by age and immediately advancing some
percentage of that data, the advancement policy can be set (either
fixed or adaptively) anywhere between a one-scavenge threshold and a
two-scavenge threshold.  This is nearly identical to the Shaw/Wilson
setup.

Here's a new twist for us -- the first generation will contain both
boxed and unboxed data.  It's never scanned, so there is no problem
with mixing the objects.  GC can be optimized for scavenging this
first generation and will sort the objects out before transporting
them to the appropriate second-generation heap.  The overhead here
will be minimal since GC already has to figure out what type of
object it is transporting.

Using a common first generation is a win because we can do most of
our allocation without any special cases, and we can still use our
``increment a register''-style allocation.  It may also win with
respect to paging (and caching) since most of the hot objects will be
in the same area (and there won't be any fragmentation like you get
when you mix boxed and unboxed objects in later generations).

Allocation of certain types of objects will, of course, still be
special-cased.  This includes big arrays, code objects, and other
objects that generally don't die off quickly (hash tables, for
example -- what else?).

The addition of a creation space in the first couple of generations
will allow the programmer to control the advancement of objects.
This will require some special allocation code and will be somewhat
slower than zero-generation allocation, since the allocation code
will have to decide in which heap to allocate the object and will
have to load and store a memory location to do the allocation.  That
is not very costly, and even if it were moderately expensive it could
be amortized over the life of the object.  Further, you would always
win big since the objects would not incur repeated copying costs
before becoming tenured.
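To make this a bit more concrete, here is roughly the per-generation
bookkeeping and the creation-space allocation path I have in mind.
This is only a sketch -- none of these names exist anywhere yet, and
the real zero-generation path is of course open-coded by the compiler:

    ;;; Sketch only: all names here are illustrative.
    (defstruct area
      (start 0 :type fixnum)    ; first address of the area
      (free  0 :type fixnum)    ; allocation/copy pointer
      (end   0 :type fixnum))   ; one past the last usable address

    (defstruct generation
      ;; The two semispaces.  During a scavenge one acts as newspace for
      ;; objects copied out of the previous generation, the other as
      ;; fromspace for objects being copied into the next generation.
      (semispace-a (make-area) :type area)
      (semispace-b (make-area) :type area)
      ;; Optional creation space where objects may be allocated directly.
      (creation nil :type (or null area))
      ;; Fraction of creation space advanced immediately at a flip; this
      ;; is the knob that moves the policy between a one-scavenge and a
      ;; two-scavenge threshold.
      (advance-fraction 1/2 :type rational))

    ;;; Allocating into a creation space: load the free pointer, bump
    ;;; it, and store it back -- a load and a store more than the
    ;;; zero-generation increment-a-register allocation.
    (defun allocate-in-creation-space (gen nbytes)
      (let* ((creation (generation-creation gen))
             (result (area-free creation)))
        (when (> (+ result nbytes) (area-end creation))
          (error "Creation space full -- time to scavenge."))
        (setf (area-free creation) (+ result nbytes))
        result))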
Code objects are special, as we've discussed before, since they are
both boxed and unboxed.  I think a single generation with a creation
space may be sufficient here, but maybe we need at least two
generations.  I'm sure you have a better idea of how often code
objects get created (and how big they are), so you probably will have
a better idea of what should happen here.  Perhaps the right thing to
do is to create new code objects in the common first generation (as
discussed before) so that all code objects have an initial chance to
die off before being advanced.  This might be a big win, since it
seems to me that code objects probably have some sort of bimodal
lifetime distribution -- either they have a very short lifetime or a
long one.

Here's an idea for code objects that may be silly; it's based on a
variation of an entry-table scheme.  Since code objects are immutable
(w/r/t the mutator), it should be possible to copy all of the boxed
pointers within them onto contiguous pages (along with pointers back
to the original locations) and scavenge just those pages.  Any
pointers which are modified will be updated by following the back
pointers.  Further, by sorting the pointers according to where they
point, GC would only have to scavenge a subset of them most of the
time.  This should reduce paging considerably during generational
scavenges.  (What I'm particularly worried about is the cost of
first-generation scavenges.)  The overhead of maintaining such a data
structure would not be insignificant (so maybe only do it for the
oldest generation and always scan all of the smaller, younger
generations).  The size may be quite prohibitive, though -- any ideas
on how big?

Marking will initially be done with a Mach external pager, but the
unit of marking will be configurable so that we may utilize software
card marking for certain types of objects if deemed useful.  We may
need some combination of recording, software card marking, and
VM-assisted page marking.  We will need some empirical evidence to
decide conclusively.  For now, stick with page marking since it is
(relatively) easy.

Random points:

To increase the age threshold, it will be preferable to increase the
size of a generation over splitting it into two generations, assuming
we have reasonable VM support and we don't pay too badly for having a
big heap -- but only to a point -- since while you don't incur extra
copying costs like you sometimes do when you split a generation,
scavenging a bigger generation can take longer.  There's some
break-even point here -- if you split the generations and scavenge
the younger one on a schedule that kills off most of the data most of
the time, then you win since you don't pay the extra copying costs.

Hash tables are handled as per our last conversation.

Something interesting to think about is that the lifetimes of some
heap-allocated data are closely associated with function calls and
returns.  We may be able to do something clever here.

Something similar but completely different: we might want to be
clever and prefer to GC when the control stack is near the top/bottom
(depending on how you picture stacks).  This would take care of the
garbage-on-the-stack problem.  The current PMAX GC already completely
zeros the stack.

-- Chris.

(Message inbox:201)
Replied: Sun, 27 May 90 21:26:11 EDT
Replied: "Rob.MacLachlan@FRED.SLISP.CS.CMU.EDU blind"
Return-Path:
Received: from fred.slisp.cs.cmu.edu by SPICE.CS.CMU.EDU id aa17899;
          27 May 90 20:43:48 EDT
Received: from fred.slisp.cs.cmu.edu by FRED.SLISP.CS.CMU.EDU id aa01050;
          27 May 90 20:42:51 EDT
To: Christopher.Hoover@CS.CMU.EDU
Subject: Re: GGC
In-reply-to: Your message of Sun, 27 May 90 04:36:45 -0400.
             <4017.643797405@LISP-PMAX1.SLISP.CS.CMU.EDU>
Date: Sun, 27 May 90 20:42:35 EDT
From: Rob.MacLachlan@FRED.SLISP.CS.CMU.EDU

    To: ram@CS.CMU.EDU
    Subject: GGC
    Date: Sun, 27 May 90 04:36:45 EDT
    From: Christopher.Hoover@LISP-PMAX1.SLISP.CS.CMU.EDU

    Here's a new twist for us -- the first generation will contain
    both boxed and unboxed data.

Sounds reasonable.  Probably the biggest win is in simplifying
unboxed allocation (floats, etc.).
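To spell that win out: with a common generation 0, a cons and a float
come off the same free pointer, with no heap-selection logic in the
allocator.  A purely illustrative sketch -- in reality the compiler
open-codes this as an increment of the ALLOC register:

    ;;; Illustrative only; the real path is an open-coded register bump.
    (defvar *gen-0-free-pointer* 0)

    (defun gen-0-allocate (nbytes)
      ;; Same path whether the object is boxed or unboxed.
      (prog1 *gen-0-free-pointer*
        (incf *gen-0-free-pointer* nbytes)))

    ;; A cons (two descriptor words) and a double-float (header word
    ;; plus unboxed data) both just bump the same pointer:
    ;;   (gen-0-allocate 8)
    ;;   (gen-0-allocate 16)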
    Allocation of certain types of objects will, of course, still be
    special-cased.  This includes big arrays, code objects, and other
    objects that generally don't die off quickly (hash tables, for
    example -- what else?).

I would be inclined to automatically tenure objects only when there
is little alternative, i.e. really big arrays.  Code can be specially
allocated if it makes life simpler, but I don't see any compelling
performance reason to do so.  It would be nice to have some sort of
allocation-control macro, ALLOCATE-IN-GENERATION (or something), that
could be used to explicitly allocate objects in later generations.
But this would be difficult if generation 0 works differently.

    Code objects are special, as we've discussed before, since they
    are both boxed and unboxed.  I think a single generation with a
    creation space may be sufficient here, but maybe we need at least
    two generations.  I'm sure you have a better idea of how often
    code objects get created (and how big they are), so you probably
    will have a better idea of what should happen here.

I think we probably want at least two generations, with the oldest
containing initial system code; it would probably never be flipped.
For code objects, there might be some generations missing in the
middle.  Any code object that makes it out of generation 0 will
probably live forever, so you could bump it way down.

    Perhaps the right thing to do is to create new code objects in the
    common first generation (as discussed before) so that all code
    objects have an initial chance to die off before being advanced.
    This might be a big win, since it seems to me that code objects
    probably have some sort of bimodal lifetime distribution -- either
    they have a very short lifetime or a long one.

That sounds good too.  Ignoring the possibility of bizarro
self-augmenting programs, the only garbage code objects will be
top-level forms, which have very short lives.

    Here's an idea for code objects that may be silly; it's based on a
    variation of an entry-table scheme.  Since code objects are
    immutable (w/r/t the mutator), it should be possible to copy all
    of the boxed pointers within them onto contiguous pages (along
    with pointers back to the original locations) and scavenge just
    those pages.  Any pointers which are modified will be updated by
    following the back pointers.  Further, by sorting the pointers
    according to where they point, GC would only have to scavenge a
    subset of them most of the time.  This should reduce paging
    considerably during generational scavenges.  (What I'm
    particularly worried about is the cost of first-generation
    scavenges.)  The overhead of maintaining such a data structure
    would not be insignificant (so maybe only do it for the oldest
    generation and always scan all of the smaller, younger
    generations).  The size may be quite prohibitive, though -- any
    ideas on how big?

I think that through GC policy you could keep cross-generation
pointers in code objects to a minimum, e.g. when scavenging a code
object, transport all objects into the same generation.  I fear it is
impossible to guarantee that there are *never* cross-generation
pointers in code, since the object might have already been
transported (due to another reference) when we scavenge the code
object.  However, it would be easy for the scavenger to note all
cross-generation pointers in code objects in a per-code-generation
list.  This could be in the format you describe above, but should
stay fairly small.

An advantage of explicitly tracking the (potential) cross-generation
pointers in code is that it eliminates any need for the scavenger to
understand how to scavenge code objects via card marking.  It seems
that this reduction in complexity offsets the complexity of
maintaining a separate data structure to represent cross-generation
pointers in code.
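Concretely, the per-code-generation record could be as simple as the
sketch below -- one entry per boxed slot in a code object that was
seen to hold a cross-generation pointer.  (All names here are
invented; this is just the shape of the thing.)

    ;;; Sketch of the per-code-generation cross-generation pointer
    ;;; record.  Names are made up.
    (defvar *code-cross-generation-pointers* '()
      "One entry per boxed slot in a code object that pointed at a
       younger generation when the code object was scavenged.")

    (defstruct cross-generation-entry
      slot-address        ; address of the boxed slot in the code object
      target-generation)  ; generation the slot currently points into

    (defun note-cross-generation-pointer (slot-address target-generation)
      (push (make-cross-generation-entry
             :slot-address slot-address
             :target-generation target-generation)
            *code-cross-generation-pointers*))

    ;;; A generational scavenge then walks only these entries (sorted
    ;;; by target generation if we like) instead of scanning or
    ;;; card-marking whole code objects.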
    Marking will initially be done with a Mach external pager, but the
    unit of marking will be configurable so that we may utilize
    software card marking for certain types of objects if deemed
    useful.  We may need some combination of recording, software card
    marking, and VM-assisted page marking.  We will need some
    empirical evidence to decide conclusively.  For now, stick with
    page marking since it is (relatively) easy.

Yeh.  Though I don't see much win for recording write addresses.  I
guess it's faster than software card marking, but if you do it often
enough that the speed matters, then you will lose from the size of
the write log.

I suspect that more interesting than trying to use different
strategies for different object types will be using different
strategies for different systems, i.e. Hemlock might use software
card marking for quick GCs (low latency), whereas the compiler might
use page marking.  For this to work, all random system code will have
to be compiled for card marking.  Code that uses page marking would
have to run in a special dynamic environment (WITH-PAGE-MARKING ...),
whatever.  You would flush all the dirty pages at the beginning and
end of this form, and for writes within, you would mark all the cards
on the page.
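Roughly what I imagine that looking like -- a sketch only, with
made-up names for the card table and the pager interface:

    ;;; Sketch: the card table layout and pager interface are invented.
    (defconstant card-size 512)
    (defconstant page-size 4096)
    (defvar *card-table* (make-array (expt 2 16) :element-type 'bit
                                                 :initial-element 0))

    ;;; The software write barrier the rest of the system is compiled
    ;;; for: mark the card containing the stored-into address.
    (defun mark-card (address)
      (setf (sbit *card-table* (floor address card-size)) 1))

    (defmacro with-page-marking (&body body)
      ;; Inside BODY, stores rely on the VM's dirty-page information
      ;; instead of software card marks.  At entry and exit we absorb
      ;; the pager's dirty-page set into the card table, conservatively
      ;; marking every card on each dirty page.
      `(unwind-protect
            (progn (absorb-dirty-pages) ,@body)
         (absorb-dirty-pages)))

    (defun absorb-dirty-pages ()
      ;; COLLECT-AND-RESET-DIRTY-PAGES stands in for whatever interface
      ;; the Mach external pager ends up giving us.
      (dolist (page-base (collect-and-reset-dirty-pages))
        (loop for addr from page-base below (+ page-base page-size)
              by card-size
              do (mark-card addr))))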
    Random points:

    To increase the age threshold, it will be preferable to increase
    the size of a generation over splitting it into two generations,
    assuming we have reasonable VM support and we don't pay too badly
    for having a big heap -- but only to a point -- since while you
    don't incur extra copying costs like you sometimes do when you
    split a generation, scavenging a bigger generation can take
    longer.  There's some break-even point here -- if you split the
    generations and scavenge the younger one on a schedule that kills
    off most of the data most of the time, then you win since you
    don't pay the extra copying costs.

My guess is that for throughput, you only need 4 generations:
 0: flip often; fractional-second GC time.
 1: sized roughly by physical memory so that you don't thrash; tens
    of seconds of GC time.
 2: any user-allocated objects that make it past 1.  Flip only when
    in danger of running out of VM.  Coffee-break time.
 3: initial (static) objects.  Never flip.
For low GC latency, you might want more generations.

    Something interesting to think about is that the lifetimes of some
    heap-allocated data are closely associated with function calls and
    returns.  We may be able to do something clever here.  Something
    similar but completely different: we might want to be clever and
    prefer to GC when the control stack is near the top/bottom
    (depending on how you picture stacks).

Conceptually a good idea, although it might be hard to come up with a
good "close" heuristic.  In any case, it would be a win to have a
GC-PROCRASTINATE form that discouraged GC in its extent.

  Rob

(Message blind:185)
To: Rob.MacLachlan@FRED.SLISP.CS.CMU.EDU
Reply-To: Christopher.Hoover@CS.CMU.EDU
Subject: Re: GGC
In-reply-to: Your message of Sun, 27 May 90 20:42:35 -0400.
Date: Sun, 27 May 90 21:26:12 EDT
Message-ID: <4330.643857972@LISP-PMAX1.SLISP.CS.CMU.EDU>
From: Christopher.Hoover@LISP-PMAX1.SLISP.CS.CMU.EDU

Rob,

Thanks for the input -- at least I now know that I'm not totally out
in left field.

>> Allocation of certain types of objects will, of course, still be
>> special-cased.  This includes big arrays, code objects, and other
>> objects that generally don't die off quickly (hash tables, for
>> example -- what else?).
>>
>> I would be inclined to automatically tenure objects only when there is
>> little alternative, i.e. really big arrays.  Code can be specially
>> allocated if it makes life simpler, but I don't see any compelling
>> performance reason to do so.  It would be nice to have some sort of
>> allocation-control macro, ALLOCATE-IN-GENERATION (or something), that
>> could be used to explicitly allocate objects in later generations.
>> But this would be difficult if generation 0 works differently.

At first I didn't understand what you were getting at, but you're
right.  You can't simply switch the ALLOC register to the free
pointer for generation x (x > 0) for the extent of the macro if
generation 0 works differently.  This is unfortunate.

Perhaps the only way to deal with this is through a combination of
special-cased allocation code and compiler magic.  Any Lisp-based
allocation code would have to check a special to decide which set of
primitives to use -- primitives for generation 0 allocation or
primitives for generation x (x > 0) allocation.  Further, the
compiler would have to use a different set of VOPs for all inline
consing within the extent of such a form.  Unfortunately there's
still a problem -- the effect of the form w/r/t inline consing would
be lexically scoped rather than dynamically scoped.  (Is ``scope''
the right word to use here?)  Perhaps this is good enough.

>> My guess is that for throughput, you only need 4 generations:
>>  0: flip often; fractional-second GC time.
>>  1: sized roughly by physical memory so that you don't thrash; tens
>>     of seconds of GC time.
>>  2: any user-allocated objects that make it past 1.  Flip only when
>>     in danger of running out of VM.  Coffee-break time.
>>  3: initial (static) objects.  Never flip.

Yeah, I forgot to say that I thought 4 generations was the right
thing to do.  I am glad that is your impression too.

-- Chris.
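P.S.  Just so we're picturing the same thing, here is roughly the
shape I have in mind for such a form.  Everything below is made up
(and it leans on the equally made-up helpers from my earlier notes);
the real version would also need the compiler to emit a different set
of consing VOPs for code compiled inside the form:

    ;;; Sketch only.
    (defvar *target-generation* 0
      "Generation that out-of-line allocation primitives cons into.")

    (defvar *generations* nil
      "Vector of generation structures, indexed by generation number.")

    (defmacro allocate-in-generation (n &body body)
      ;; Dynamically scoped as far as Lisp-level allocation code that
      ;; checks the special is concerned; inline consing compiled
      ;; inside BODY is only affected lexically, which is the problem
      ;; noted above.
      `(let ((*target-generation* ,n))
         ,@body))

    (defun allocate (nbytes)
      ;; Out-of-line allocator: choose the primitive from the special.
      (if (zerop *target-generation*)
          (gen-0-allocate nbytes)   ; the ALLOC-register bump
          (allocate-in-creation-space
           (aref *generations* *target-generation*) nbytes)))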