CSc 520 - Principles of Programming Languages
9 : Memory Management -- GC -- Copying Collection

Christian Collberg

Department of Computer Science

University of Arizona

1 Copying Collection

Even if most of the heap space is garbage, a mark-and-sweep algorithm will touch the entire heap. In such cases it would be better if the algorithm touched only the live objects.
Copying collection is such an algorithm. The basic idea is:
1. The heap is divided into two spaces, the from-space and the to-space.
2. We start out by allocating objects in the from-space.
3. When from-space is full, all live objects are copied from from-space to to-space.
4. We then continue allocating in to-space until it fills up, and a new GC starts.
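The four steps above can be sketched as a toy two-space allocator. This is only an illustration: the class name `SemiSpaceHeap`, the word-addressed heap, and the `flip` method are assumptions of the sketch, not part of the original notes, and the copying itself is left out here.

```python
class SemiSpaceHeap:
    """Toy two-space heap: bump allocation in one half, flip on collection."""

    def __init__(self, size):
        assert size % 2 == 0
        self.half = size // 2
        self.from_start = 0          # half currently used for allocation
        self.to_start = self.half    # half reserved for the next collection
        self.free = self.from_start  # next free word in the active half

    def space_left(self):
        return self.from_start + self.half - self.free

    def alloc(self, nwords):
        """Return the start address of nwords words, or None if the half is full."""
        if nwords > self.space_left():
            return None              # the caller must run a collection first
        addr = self.free
        self.free += nwords
        return addr

    def flip(self, live_words):
        """Swap the roles of the halves after live_words words were copied."""
        self.from_start, self.to_start = self.to_start, self.from_start
        self.free = self.from_start + live_words


heap = SemiSpaceHeap(16)
heap.alloc(5)
heap.alloc(3)
full = heap.alloc(1) is None   # the active half is exhausted: time to collect
heap.flip(live_words=3)        # pretend a collection copied 3 live words
```

Allocation is just a pointer increment, which is one reason copying collectors pair well with programs that allocate many short-lived objects.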

An important side effect of copying collection is that we get automatic compaction: after a collection, to-space consists of the live objects in one contiguous piece of memory, followed by the free space.
This sounds really easy, but ...:
- We have to traverse the object graph (just like in mark and sweep), and so we need to decide the order in which this should be done, depth-first or breadth-first.
- DFS requires a stack (but we can, of course, use pointer reversal just as with mark and sweep), and BFS a queue. We will see later that encoding a queue is very simple, and hence most implementations of copying collection make use of BFS.
- An object in from-space will generally have several objects pointing to it. So, when an object is moved from from-space to to-space we have to make sure that we change the pointers to point to the new copy.
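The pointer-update problem in the last bullet is usually solved by leaving a forwarding address in the original object. The sketch below shows the idea on ordinary Python objects; the `Obj` class, its `forward` attribute, and the `forward` function are hypothetical names chosen for illustration.

```python
class Obj:
    def __init__(self, name, fields=None):
        self.name = name
        self.fields = fields or []   # references to other Obj instances
        self.forward = None          # set once the object has been copied


def forward(obj, to_space):
    """Return the to-space copy of obj, copying it on the first visit only."""
    if obj.forward is None:
        copy = Obj(obj.name, list(obj.fields))  # fields still point into from-space
        obj.forward = copy                      # leave the new address in the original
        to_space.append(copy)
    return obj.forward


shared = Obj("shared")
a = Obj("a", [shared])
b = Obj("b", [shared])

to_space = []
a2 = forward(a, to_space)
b2 = forward(b, to_space)
# Redirect each copied field; the second visit to `shared` reuses the first copy.
a2.fields = [forward(f, to_space) for f in a2.fields]
b2.fields = [forward(f, to_space) for f in b2.fields]
```

Because the second call finds the forwarding address already set, `a2` and `b2` end up sharing one copy of `shared` rather than each getting its own.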

Mark-and-sweep touches the entire heap, even if most of it is garbage. Copying collection only touches live cells.
Copying collection divides the heap in two parts: from-space and to-space.
to-space is automatically compacted.
How to traverse object graph: BFS or DFS?
How to update pointers to moved objects?

Algorithm:

Traversing the Object Graph:

Updating (Forwarding) Pointers:

Example:

CopyingCollection0

scan := next := ADDR(to-space)
- The objects between scan and next hold the BFS queue.
- Objects before scan (already processed) point only into to-space. Objects between scan and next (copied but not yet processed) still point into from-space.
Copy objects pointed to by the root pointers to to-space.
Update the root pointers to point to to-space.
Store each copied object's new address (a forwarding pointer) in the first word of the original object in from-space.
Repeat the same steps for all the pointers stored in the objects now in to-space.
1. Update scan to point past the last processed node.
2. Update next to point past the last copied node.
Continue while scan < next.
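A minimal sketch of the scan/next loop described above, on a flat list of words. The object layout (a header word holding the field count, followed by the fields), the `FWD` tag, and the names `collect` and `nxt` are assumptions of this sketch, not taken from the notes.

```python
FWD = "fwd"  # tag marking a from-space header overwritten by a forwarding address


def collect(mem, roots, to_start):
    """Copy everything reachable from roots into mem[to_start:].

    Assumed layout: a header word holding the number of fields, followed by
    that many fields, each None or the address of another object's header.
    Returns the updated roots and the first free word in to-space.
    """
    scan = nxt = to_start

    def forward(addr):
        nonlocal nxt
        if addr is None:
            return None
        header = mem[addr]
        if isinstance(header, tuple) and header[0] == FWD:
            return header[1]                  # already copied: reuse the new address
        size = 1 + header
        new_addr = nxt
        mem[new_addr:new_addr + size] = mem[addr:addr + size]
        nxt += size                           # next now points past the last copied object
        mem[addr] = (FWD, new_addr)           # new address goes first in the original
        return new_addr

    roots = [forward(r) for r in roots]       # copy the roots' targets, update the roots
    while scan < nxt:                         # words in [scan, nxt) are the BFS queue
        nfields = mem[scan]
        for i in range(scan + 1, scan + 1 + nfields):
            mem[i] = forward(mem[i])
        scan += 1 + nfields                   # scan now points past the last processed object
    return roots, nxt


# From-space is words 0..7, to-space is words 8..15.
# A at 0 -> (B, C); B at 3 -> (C); C at 5; D at 6 -> (A) is unreachable garbage.
mem = [2, 3, 5, 1, 5, 0, 1, 0] + [None] * 8
new_roots, free = collect(mem, [0], 8)
```

No separate queue is allocated: the copied-but-unscanned objects in to-space serve as the queue, which is why BFS is so cheap to implement here. The garbage object D is never touched.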


Cost of Garbage Collection

The size of the heap is $H$, the amount of reachable memory is $R$, the amount of memory reclaimed is $H - R$.

GCCost1

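$$\text{amortized GC cost} = \frac{\text{time spent in GC}}{\text{amount of memory reclaimed}} = \frac{\text{time spent in GC}}{H - R}$$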

GCCost-Copy

The breadth-first search phase touches all live nodes. Hence, it takes time $c_3 R$, for some constant $c_3$ (perhaps $c_3 \approx 10$).
The heap is divided into a from-space and a to-space, so each collection reclaims $\frac{H}{2} - R$ words.

$$\text{amortized GC cost} = \frac{c_3 R}{\frac{H}{2} - R}$$

GCCost-Copy-FewLive

If there are few live objects ($H \gg R$) the GC cost is low.
If $H = 4R$, we get

$$\text{amortized GC cost} = \frac{c_3 R}{\frac{4R}{2} - R} = \frac{c_3 R}{R} = c_3 \approx 10$$

This is expensive: the heap must be 4 times as large as the reachable data, and the GC cost is about 10 instructions per word allocated.
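To see how quickly the cost falls as the heap grows relative to the live data, the amortized cost can be tabulated for a few heap sizes. This is a small sketch that assumes heap size `heap_words`, reachable data `reachable_words`, and a per-word copying constant `c3` of about 10 instructions; the function name and the sample sizes are illustrative only.

```python
def amortized_copy_cost(heap_words, reachable_words, c3=10):
    """Instructions of GC work per word reclaimed by one copying collection."""
    reclaimed = heap_words / 2 - reachable_words   # only half the heap is usable
    if reclaimed <= 0:
        raise ValueError("live data does not fit in one semispace")
    return c3 * reachable_words / reclaimed


R = 1000
for factor in (3, 4, 8, 32):
    print(f"H = {factor}R: {amortized_copy_cost(factor * R, R):.2f}")
# H = 3R: 20.00
# H = 4R: 10.00
# H = 8R: 3.33
# H = 32R: 0.67
```

The cost depends only on the ratio of heap size to live data, so giving the collector more memory trades space for time.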

Christian Collberg 2008-02-11