[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

(fwd) Re: Ada GC and a bunch of other stuff



Hi folks,

This caught my eye, and I thought it would be of interest. I'll see if I
can turn up anything else on the subject. As far as I can tell an ETH-style
mark-and-sweep collector such as the one I use is what Chase is calling a
'precise' collector. I've put a few comments in square brackets in the body
of the article.

Frank

From: chase@centerline.com (David Chase)
Newsgroups: comp.compilers
Subject: Re: Ada GC and a bunch of other stuff
Date: 9 Feb 1996 14:26:13 -0500
Message-ID: <96-02-073@comp.compilers>

hbaker@netcom.com (Henry Baker) writes:
> Someone mentioned the issue of the interaction of GC with compiler
> optimizations.  David Chase has probably studied this more (at least
> in the last several years) than most everyone else, and has published
> papers on this (ACM PLDI, 1988??).

Thanks for the plug, but you should also take note of the work done at
U Mass by Eliot Moss, Rick Hudson, Amer Diwan, and probably others
whose names I don't know offhand.  Their goal is a Modula-3 system
that does both the "usual optimizations" and precise pointer-tracking
as part of a persistent object system (I'm paraphrasing -- I'm most
familiar with the subgoal of pointer tracking in a compiler-optimized
system, and not so familiar with their main goal).  It's not a pretty
problem.

Hans Boehm has also collected a few good examples of how things can go
wrong -- we've been discussing the problem for years now, and trying
to come up with "cheap" ways of curing the symptoms and/or the
disease, or convincing ourselves that it isn't worth worrying about in
practice.

The PLDI '88 paper says little about interaction between optimization
and garbage collection.  All that discussion remains, unfortunately,
still a chapter in my dissertation (perhaps the only chapter still
worth reading).

And, for the sake of completeness, here are a few of the examples.
This is basically a collection of disasters that would happen if you
took a (really good) C/Fortran optimizer and combined it with an
off-the-shelf garbage collector, and had a very unlucky day.  In
practice, current compilers and the conservative collectors have never
been observed to trigger these bugs "in the field", though I managed
to make it happen in 1988 "in the lab".  There's a sort of
mix-and-match flavor to them -- for instance, dealing with GC and
optimization in a single-threaded non-compacting system is much saner
than it is in a multi-threaded, compacting system, and it is worse yet
in a concurrently compacting system.

1. Basic pointer arithmetic optimizations.  Any decent optimizer is
likely to take expressions like A[i+j] and form some sort of
addressing subexpression involving the address of the array A and i,
j, or both.  The resulting subexpression could point *anywhere*, since
there are no constraints (generally) on the values in i and j,
considered separately (the sum is constrained, of course).

More creative/aggressive optimizers might form the *difference*
between two pointers.  This is "really awful" from the POV of a
garbage collector.

  [ I think this is applicable to a collector that searches for pointers in
    CPU registers. I don't think this is any more of a problem than
    searching the stack. The ETH collectors do this, and apply a number of
    sanity checks to make sure that they have indeed found a valid
    pointer. ]

2. Initialization.  If the collector is "precise", meaning that it is
designed with the assumption that it will find all pointers, then you
have to be careful with newly allocated memory/variables.  If it
contains junk, this could cause problems for the collector.  Note that
an optimizer might notice that the "initialization" of a variable is
later overwritten, and remove the initialization -- after all, it
isn't used, is it?

  [ The collector I use assumes all traceable pointer variables are
    initialised to NIL, and Oberon-A generates code to automatically zero
    global and local variables. This can be switched off, in which case it
    is up to the programmer to ensure that pointer variables are
    initialised before the GC is called. If the OOC optimiser does remove
    initialisations, then this could have serious consequences. ]

3. Creative code generation.  (A) On modern machines, it is often
faster to use the floating point register file if you have to move
lots of data from point A to B.  Don't forget to look for pointers in
the floating pointer registers.  (B) On some machines (PowerPC, e.g.)
the instruction set was designed so that offsets in a particular range
are formed by first adding a big number, then subtracting a small
number, as in ((P+65536)-1000).  That first intermediate expression
might lie outside the bounds of the object referenced by P.  (Hans
Boehm found this one) This also complicates life for table-based
schemes (briefly alluded to in my dissertation, actually figured out
and put to proper use by the gang at U Mass), since it means that the
code generator must be clued in to the table generation games (or
else, it should use different, perhaps less efficient, idioms).

  [ I'm not sure what (B) is referring to, but if it means that CPU
    registers might contain invalid pointers then I think that's covered
    (see my note for example 1). ]

4. Undoing the programmer's good intentions.  Suppose that you,
Mr./Ms.  Programmer, notices that you have a reference in a
non-longer-useful variable, as you are about to make a truly recursive
call, as in:

  f(p) {
    q = <something allocating memory>
    ... q ...
    f( r );
    <no further occurrences of q>
    }

This could lead to unnecessarily retained storage.  But, we know what
to do here:

  f(p) {
    q = <something allocating memory>
    ... q ...
    q := NIL; /* Stomp on last reference to that memory, so
                 it will be collected. */
    f( r );
    <no further occurrences of q>
    }

Too bad for you that the dead code eliminator got rid of that inserted
assignment to q, isn't it?  (I hate it when that happens.)

  [ This is something that will have to be watched. Assigning NIL to a
    pointer can have implications beyond the immediate block of code being
    optimised. This is especially true if code following the assignment
    directly or indirectly triggers the garbage collector. ]

5. Something really vile involving different values assigned to the
same variable on converging flow paths (this was related to me by
Eliot Moss) -- this has a bad interaction with precise pointer tracking.

  [ No idea what this is referring to. ]

Preston Briggs and I had a wonderful idea over beer a few years ago
that is too small to fit into the margin here, and that neither of us
has had the time to properly follow up.

On the other hand, I think it is really wonderful to see that people
are finally talking seriously about GC in programming languages.  My
gut feeling is that most of the perceived "problems" with GC in
various contexts can be solved through the addition of More Memory
(for instance, treadmill-style collection requires a few header words
that mark-and-sweep does not).

speaking for myself,

David Chase
--
Send compilers articles to compilers@iecc.com,
meta-mail to compilers-request@iecc.com.