(Message blind:183)
To: ram@cs
Reply-To: Christopher.Hoover@CS.CMU.EDU
Subject: GGC
Date: Sun, 27 May 90 04:36:45 EDT
Message-ID: <4017.643797405@LISP-PMAX1.SLISP.CS.CMU.EDU>
From: Christopher.Hoover@LISP-PMAX1.SLISP.CS.CMU.EDU

Rob,

Here's what I am currently thinking.  Comments would be greatly
appreciated.  (These are my notes, so ignore spelling and grammar.)

There will, in essence, be three heaps -- one for boxed objects, one
for unboxed objects, and one for code objects.  (We can still keep
read-only space if we want, but I'm not sure it's all that necessary,
and static space will be split among several of the older
generations.)

Each generation is divided into two semispaces (or buckets), and
additionally some of the generations will have a creation space where
objects may be allocated.  When scavenging a generation, one of the
semispaces acts as a newspace for objects being copied out of the
previous generation, and the other acts as a fromspace for objects
being copied into the next generation.  By realizing that the data in
creation space is ordered by age and immediately advancing some
percentage of that data, the advancement policy can be set (either
fixed or adaptively) anywhere between a one-scavenge threshold and a
two-scavenge threshold.  This is nearly identical to the Shaw/Wilson
setup.

Here's a new twist for us -- the first generation will contain both
boxed and unboxed data.  It's never scanned, so there is no problem
with mixing the objects.  GC can be optimized for scavenging this
first generation and will sort the objects out before transporting
them to the appropriate second-generation heap.  The overhead here
will be minimal since GC already has to figure out what type of
object it is transporting.

Using a common first generation is a win because we can do most of
our allocation without any special cases, and we can still use our
``increment a register''-style allocation.  It may also win with
respect to paging (and caching) since most of the hot objects will be
in the same area (and there won't be any fragmentation like you get
when you mix boxed and unboxed objects in later generations).

Allocation of certain types of objects will, of course, still be
special-cased.  This includes big arrays, code objects, and other
objects that generally don't die off quickly (hash tables, for
example -- what else?).

The addition of a creation space in the first couple of generations
will allow the programmer to control the advancement of objects.
This will require some special allocation code and will be somewhat
slower than zero-generation allocation, since the allocation code
will have to decide in which heap to allocate the object and will
have to load and store a memory location to do the allocation.  That
is not very costly, and even if it were moderately expensive it could
be amortized over the life of the object.  Further, you would always
win big since the objects would not incur repeated copying costs
before becoming tenured.
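To make this a bit more concrete, here is roughly the per-generation
bookkeeping and the creation-space allocation path I have in mind.
This is only a sketch -- none of these names exist anywhere yet, and
the real zero-generation path is of course open-coded by the compiler:

    ;;; Sketch only: all names here are illustrative.
    (defstruct area
      (start 0 :type fixnum)    ; first address of the area
      (free  0 :type fixnum)    ; allocation/copy pointer
      (end   0 :type fixnum))   ; one past the last usable address

    (defstruct generation
      ;; The two semispaces.  During a scavenge one acts as newspace for
      ;; objects copied out of the previous generation, the other as
      ;; fromspace for objects being copied into the next generation.
      (semispace-a (make-area) :type area)
      (semispace-b (make-area) :type area)
      ;; Optional creation space where objects may be allocated directly.
      (creation nil :type (or null area))
      ;; Fraction of creation space advanced immediately at a flip; this
      ;; is the knob that moves the policy between a one-scavenge and a
      ;; two-scavenge threshold.
      (advance-fraction 1/2 :type rational))

    ;;; Allocating into a creation space: load the free pointer, bump
    ;;; it, and store it back -- a load and a store more than the
    ;;; zero-generation increment-a-register allocation.
    (defun allocate-in-creation-space (gen nbytes)
      (let* ((creation (generation-creation gen))
             (result (area-free creation)))
        (when (> (+ result nbytes) (area-end creation))
          (error "Creation space full -- time to scavenge."))
        (setf (area-free creation) (+ result nbytes))
        result))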
Code objects are special, as we've discussed before, since they are
both boxed and unboxed.  I think a single generation with a creation
space may be sufficient here, but maybe we need at least two
generations.  I'm sure you have a better idea of how often code
objects get created (and how big they are), so you probably will have
a better idea of what should happen here.  Perhaps the right thing to
do is to create new code objects in the common first generation (as
discussed before) so that all code objects have an initial chance to
die off before being advanced.  This might be a big win, since it
seems to me that code objects probably have some sort of bimodal
lifetime distribution -- either they have a very short lifetime or a
long one.

Here's an idea for code objects that may be silly; it's based on a
variation of an entry-table scheme.  Since code objects are immutable
(w/r/t the mutator), it should be possible to copy all of the boxed
pointers within them onto contiguous pages (along with pointers back
to the original locations) and scavenge just those pages.  Any
pointers which are modified will be updated by following the back
pointers.  Further, by sorting the pointers according to where they
point, GC would only have to scavenge a subset of them most of the
time.  This should reduce paging considerably during generational
scavenges.  (What I'm particularly worried about is the cost of
first-generation scavenges.)  The overhead of maintaining such a data
structure would not be insignificant (so maybe only do it for the
oldest generation and always scan all of the smaller, younger
generations).  The size may be quite prohibitive, though -- any ideas
on how big?

Marking will initially be done with a Mach external pager, but the
unit of marking will be configurable so that we may utilize software
card marking for certain types of objects if deemed useful.  We may
need some combination of recording, software card marking, and
VM-assisted page marking.  We will need some empirical evidence to
decide conclusively.  For now, stick with page marking since it is
(relatively) easy.

Random points:

To increase the age threshold, it will be preferable to increase the
size of a generation over splitting it into two generations, assuming
we have reasonable VM support and we don't pay too badly for having a
big heap -- but only to a point -- since while you don't incur extra
copying costs like you sometimes do when you split a generation,
scavenging a bigger generation can take longer.  There's some
break-even point here -- if you split the generations and scavenge
the younger one on a schedule that kills off most of the data most of
the time, then you win since you don't pay the extra copying costs.

Hash tables are handled as per our last conversation.

Something interesting to think about is that the lifetimes of some
heap-allocated data are closely associated with function calls and
returns.  We may be able to do something clever here.

Something similar but completely different: we might want to be
clever and prefer to GC when the control stack is near the top/bottom
(depending on how you picture stacks).  This would take care of the
garbage-on-the-stack problem.  The current PMAX GC already completely
zeros the stack.

-- Chris.

(Message inbox:201)
Replied: Sun, 27 May 90 21:26:11 EDT
Replied: "Rob.MacLachlan@FRED.SLISP.CS.CMU.EDU blind"
Return-Path:
Received: from fred.slisp.cs.cmu.edu by SPICE.CS.CMU.EDU id aa17899;
          27 May 90 20:43:48 EDT
Received: from fred.slisp.cs.cmu.edu by FRED.SLISP.CS.CMU.EDU id aa01050;
          27 May 90 20:42:51 EDT
To: Christopher.Hoover@CS.CMU.EDU
Subject: Re: GGC
In-reply-to: Your message of Sun, 27 May 90 04:36:45 -0400.
             <4017.643797405@LISP-PMAX1.SLISP.CS.CMU.EDU>
Date: Sun, 27 May 90 20:42:35 EDT
From: Rob.MacLachlan@FRED.SLISP.CS.CMU.EDU

    To: ram@CS.CMU.EDU
    Subject: GGC
    Date: Sun, 27 May 90 04:36:45 EDT
    From: Christopher.Hoover@LISP-PMAX1.SLISP.CS.CMU.EDU

    Here's a new twist for us -- the first generation will contain
    both boxed and unboxed data.

Sounds reasonable.  Probably the biggest win is in simplifying
unboxed allocation (floats, etc.).
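To spell that win out: with a common generation 0, a cons and a float
come off the same free pointer, with no heap-selection logic in the
allocator.  A purely illustrative sketch -- in reality the compiler
open-codes this as an increment of the ALLOC register:

    ;;; Illustrative only; the real path is an open-coded register bump.
    (defvar *gen-0-free-pointer* 0)

    (defun gen-0-allocate (nbytes)
      ;; Same path whether the object is boxed or unboxed.
      (prog1 *gen-0-free-pointer*
        (incf *gen-0-free-pointer* nbytes)))

    ;; A cons (two descriptor words) and a double-float (header word
    ;; plus unboxed data) both just bump the same pointer:
    ;;   (gen-0-allocate 8)
    ;;   (gen-0-allocate 16)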
    Allocation of certain types of objects will, of course, still be
    special-cased.  This includes big arrays, code objects, and other
    objects that generally don't die off quickly (hash tables, for
    example -- what else?).

I would be inclined to automatically tenure objects only when there
is little alternative, i.e. really big arrays.  Code can be specially
allocated if it makes life simpler, but I don't see any compelling
performance reason to do so.  It would be nice to have some sort of
allocation-control macro, ALLOCATE-IN-GENERATION (or something), that
could be used to explicitly allocate objects in later generations.
But this would be difficult if generation 0 works differently.

    Code objects are special, as we've discussed before, since they
    are both boxed and unboxed.  I think a single generation with a
    creation space may be sufficient here, but maybe we need at least
    two generations.  I'm sure you have a better idea of how often
    code objects get created (and how big they are), so you probably
    will have a better idea of what should happen here.

I think we probably want at least two generations, with the oldest
containing initial system code; it would probably never be flipped.
For code objects, there might be some generations missing in the
middle.  Any code object that makes it out of generation 0 will
probably live forever, so you could bump it way down.

    Perhaps the right thing to do is to create new code objects in the
    common first generation (as discussed before) so that all code
    objects have an initial chance to die off before being advanced.
    This might be a big win, since it seems to me that code objects
    probably have some sort of bimodal lifetime distribution -- either
    they have a very short lifetime or a long one.

That sounds good too.  Ignoring the possibility of bizarro
self-augmenting programs, the only garbage code objects will be
top-level forms, which have very short lives.

    Here's an idea for code objects that may be silly; it's based on a
    variation of an entry-table scheme.  Since code objects are
    immutable (w/r/t the mutator), it should be possible to copy all
    of the boxed pointers within them onto contiguous pages (along
    with pointers back to the original locations) and scavenge just
    those pages.  Any pointers which are modified will be updated by
    following the back pointers.  Further, by sorting the pointers
    according to where they point, GC would only have to scavenge a
    subset of them most of the time.  This should reduce paging
    considerably during generational scavenges.  (What I'm
    particularly worried about is the cost of first-generation
    scavenges.)  The overhead of maintaining such a data structure
    would not be insignificant (so maybe only do it for the oldest
    generation and always scan all of the smaller, younger
    generations).  The size may be quite prohibitive, though -- any
    ideas on how big?

I think that through GC policy you could keep cross-generation
pointers in code objects to a minimum, e.g. when scavenging a code
object, transport all objects into the same generation.  I fear it is
impossible to guarantee that there are *never* cross-generation
pointers in code, since the object might have already been
transported (due to another reference) when we scavenge the code
object.  However, it would be easy for the scavenger to note all
cross-generation pointers in code objects in a per-code-generation
list.  This could be in the format you describe above, but should
stay fairly small.

An advantage of explicitly tracking the (potential) cross-generation
pointers in code is that it eliminates any need for the scavenger to
understand how to scavenge code objects via card marking.  It seems
that this reduction in complexity offsets the complexity of
maintaining a separate data structure to represent cross-generation
pointers in code.
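Concretely, the per-code-generation record could be as simple as the
sketch below -- one entry per boxed slot in a code object that was
seen to hold a cross-generation pointer.  (All names here are
invented; this is just the shape of the thing.)

    ;;; Sketch of the per-code-generation cross-generation pointer
    ;;; record.  Names are made up.
    (defvar *code-cross-generation-pointers* '()
      "One entry per boxed slot in a code object that pointed at a
       younger generation when the code object was scavenged.")

    (defstruct cross-generation-entry
      slot-address        ; address of the boxed slot in the code object
      target-generation)  ; generation the slot currently points into

    (defun note-cross-generation-pointer (slot-address target-generation)
      (push (make-cross-generation-entry
             :slot-address slot-address
             :target-generation target-generation)
            *code-cross-generation-pointers*))

    ;;; A generational scavenge then walks only these entries (sorted
    ;;; by target generation if we like) instead of scanning or
    ;;; card-marking whole code objects.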
    Marking will initially be done with a Mach external pager, but the
    unit of marking will be configurable so that we may utilize
    software card marking for certain types of objects if deemed
    useful.  We may need some combination of recording, software card
    marking, and VM-assisted page marking.  We will need some
    empirical evidence to decide conclusively.  For now, stick with
    page marking since it is (relatively) easy.

Yeh.  Though I don't see much win for recording write addresses.  I
guess it's faster than software card marking, but if you do it often
enough that the speed matters, then you will lose from the size of
the write log.

I suspect that more interesting than trying to use different
strategies for different object types will be using different
strategies for different systems, i.e. Hemlock might use software
card marking for quick GCs (low latency), whereas the compiler might
use page marking.  For this to work, all random system code will have
to be compiled for card marking.  Code that uses page marking would
have to run in a special dynamic environment (WITH-PAGE-MARKING ...),
whatever.  You would flush all the dirty pages at the beginning and
end of this form, and for writes within, you would mark all the cards
on the page.
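Roughly what I imagine that looking like -- a sketch only, with
made-up names for the card table and the pager interface:

    ;;; Sketch: the card table layout and pager interface are invented.
    (defconstant card-size 512)
    (defconstant page-size 4096)
    (defvar *card-table* (make-array (expt 2 16) :element-type 'bit
                                                 :initial-element 0))

    ;;; The software write barrier the rest of the system is compiled
    ;;; for: mark the card containing the stored-into address.
    (defun mark-card (address)
      (setf (sbit *card-table* (floor address card-size)) 1))

    (defmacro with-page-marking (&body body)
      ;; Inside BODY, stores rely on the VM's dirty-page information
      ;; instead of software card marks.  At entry and exit we absorb
      ;; the pager's dirty-page set into the card table, conservatively
      ;; marking every card on each dirty page.
      `(unwind-protect
            (progn (absorb-dirty-pages) ,@body)
         (absorb-dirty-pages)))

    (defun absorb-dirty-pages ()
      ;; COLLECT-AND-RESET-DIRTY-PAGES stands in for whatever interface
      ;; the Mach external pager ends up giving us.
      (dolist (page-base (collect-and-reset-dirty-pages))
        (loop for addr from page-base below (+ page-base page-size)
              by card-size
              do (mark-card addr))))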
    Random points:

    To increase the age threshold, it will be preferable to increase
    the size of a generation over splitting it into two generations,
    assuming we have reasonable VM support and we don't pay too badly
    for having a big heap -- but only to a point -- since while you
    don't incur extra copying costs like you sometimes do when you
    split a generation, scavenging a bigger generation can take
    longer.  There's some break-even point here -- if you split the
    generations and scavenge the younger one on a schedule that kills
    off most of the data most of the time, then you win since you
    don't pay the extra copying costs.

My guess is that for throughput, you only need 4 generations:
 0: flip often; fractional-second GC time.
 1: sized roughly by physical memory so that you don't thrash; tens
    of seconds of GC time.
 2: any user-allocated objects that make it past 1.  Flip only when
    in danger of running out of VM.  Coffee-break time.
 3: initial (static) objects.  Never flip.
For low GC latency, you might want more generations.

    Something interesting to think about is that the lifetimes of some
    heap-allocated data are closely associated with function calls and
    returns.  We may be able to do something clever here.  Something
    similar but completely different: we might want to be clever and
    prefer to GC when the control stack is near the top/bottom
    (depending on how you picture stacks).

Conceptually a good idea, although it might be hard to come up with a
good "close" heuristic.  In any case, it would be a win to have a
GC-PROCRASTINATE form that discouraged GC in its extent.

  Rob

(Message blind:185)
To: Rob.MacLachlan@FRED.SLISP.CS.CMU.EDU
Reply-To: Christopher.Hoover@CS.CMU.EDU
Subject: Re: GGC
In-reply-to: Your message of Sun, 27 May 90 20:42:35 -0400.
Date: Sun, 27 May 90 21:26:12 EDT
Message-ID: <4330.643857972@LISP-PMAX1.SLISP.CS.CMU.EDU>
From: Christopher.Hoover@LISP-PMAX1.SLISP.CS.CMU.EDU

Rob,

Thanks for the input -- at least I now know that I'm not totally out
in left field.

>> Allocation of certain types of objects will, of course, still be
>> special-cased.  This includes big arrays, code objects, and other
>> objects that generally don't die off quickly (hash tables, for
>> example -- what else?).
>>
>> I would be inclined to automatically tenure objects only when there is
>> little alternative, i.e. really big arrays.  Code can be specially
>> allocated if it makes life simpler, but I don't see any compelling
>> performance reason to do so.  It would be nice to have some sort of
>> allocation-control macro, ALLOCATE-IN-GENERATION (or something), that
>> could be used to explicitly allocate objects in later generations.
>> But this would be difficult if generation 0 works differently.

At first I didn't understand what you were getting at, but you're
right.  You can't simply switch the ALLOC register to the free
pointer for generation x (x > 0) for the extent of the macro if
generation 0 works differently.  This is unfortunate.

Perhaps the only way to deal with this is through a combination of
special-cased allocation code and compiler magic.  Any Lisp-based
allocation code would have to check a special to decide which set of
primitives to use -- primitives for generation 0 allocation or
primitives for generation x (x > 0) allocation.  Further, the
compiler would have to use a different set of VOPs for all inline
consing within the extent of such a form.  Unfortunately there's
still a problem -- the effect of the form w/r/t inline consing would
be lexically scoped rather than dynamically scoped.  (Is ``scope''
the right word to use here?)  Perhaps this is good enough.

>> My guess is that for throughput, you only need 4 generations:
>>  0: flip often; fractional-second GC time.
>>  1: sized roughly by physical memory so that you don't thrash; tens
>>     of seconds of GC time.
>>  2: any user-allocated objects that make it past 1.  Flip only when
>>     in danger of running out of VM.  Coffee-break time.
>>  3: initial (static) objects.  Never flip.

Yeah, I forgot to say that I thought 4 generations was the right
thing to do.  I am glad that is your impression too.

-- Chris.
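P.S.  Just so we're picturing the same thing, here is roughly the
shape I have in mind for such a form.  Everything below is made up
(and it leans on the equally made-up helpers from my earlier notes);
the real version would also need the compiler to emit a different set
of consing VOPs for code compiled inside the form:

    ;;; Sketch only.
    (defvar *target-generation* 0
      "Generation that out-of-line allocation primitives cons into.")

    (defvar *generations* nil
      "Vector of generation structures, indexed by generation number.")

    (defmacro allocate-in-generation (n &body body)
      ;; Dynamically scoped as far as Lisp-level allocation code that
      ;; checks the special is concerned; inline consing compiled
      ;; inside BODY is only affected lexically, which is the problem
      ;; noted above.
      `(let ((*target-generation* ,n))
         ,@body))

    (defun allocate (nbytes)
      ;; Out-of-line allocator: choose the primitive from the special.
      (if (zerop *target-generation*)
          (gen-0-allocate nbytes)   ; the ALLOC-register bump
          (allocate-in-creation-space
           (aref *generations* *target-generation*) nbytes)))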