January 2019, rev. March 2019, rev. April 2019


"I'm not sure if garbage collection is the right idea."
— Guy Steele, May 2001


I've yet to see explained in one sentence why garbage collection is necessary but I've seen how it can be avoided.

There are two ways to avoid GC: to copy, or to use one reference only. Copying avoids setting a reference from one object to another.

If (= a (hash 1 2)) and (= b (hash 3 a)), b can copy a to avoid GC.

This was the simple case. Now is it better to always copy, or are there times when it's better to use one reference only?

One trouble with copies shows when setting the value of an object that is deeply nested in another. If there's no reference the new value is set on a copy, and so the new value must be copied back into the outermost object. [1]

Set a value in four nested hash tables to see how copies can cause trouble: (= ((((h a) b) c) d) 7). If there are no references and everything is a copy, setting the value of the innermost object (((h a) b) c) for the key d to 7 isn't enough because (h a) doesn't return a reference to an object but a copy. The copy must be set back in h for the key a.

There's an optimization here to make fewer copies. After (h a) is copied, the rest of the changes can happen in place on this copy. There's no need to make an extra copy of ((h a) b) or an extra copy of (((h a) b) c) and to save those back too. The rest of the changes are applied by a single thread on an exclusive copy. It's ok for nested, successively inner objects to use a reference to each other and to the first copy.

So the generalization to avoid GC for (h a) is:
  1. when h is the global environment, the outermost object (h a) must be copied and this copy must be saved back. Nested, successively inner objects of (h a) can use a reference to each other but not to (h a).

  2. when h is declared on the stack (in a function, and with return values deep copied), it's ok to use a reference to h and make changes in place without a copy.

  3. when h is the global environment and a is big do what databases do: use MVCC with reference counting to avoid full copying. Since it applies to globals it doesn't need cycle detection, and so this is lighter than ordinary ref-counting GC.
If you're looking to implement setting values in nested objects here's sample source code that shows the recursion.

Before claiming GC is the future, or the opposing view that GC can be avoided, consider the prospect of balance. Good design is balanced.









Notes:

[1]  In ((((h a) b) c) d):
  • the innermost object is ((((h a) b) c) d)

  • the outermost object is (h a).