Lecture 10
Review
questions on Pthreads? on Homework 2?
bag of tasks -- the primes problem
get task -- want to avoid contention for the bag
results -- need sorted list; be careful about contention
there is also a subtle synchronization problem
(need to be sure there are no "holes" in the list of known primes)
advice: use arrays rather than lists
think about load balancing; don't necessarily need just one bag
make it work, then make it faster: correctness then speed
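A generic bag-of-tasks sketch in Python, deliberately not the primes homework. It shows two points from the review. Workers grab a chunk of tasks per lock acquisition, which reduces contention on the bag. Results go into a preallocated array rather than a list. The chunk size `CHUNK` and the helper names are hypothetical choices for illustration, and the code does not address the "no holes" problem specific to the primes program.

```python
import threading

CHUNK = 8  # hypothetical chunk size: larger means less contention, coarser load balance

def run_bag_of_tasks(tasks, work, num_workers=4):
    results = [None] * len(tasks)   # preallocated array, one slot per task
    next_task = 0                   # the shared "bag": index of the next unclaimed task
    bag_lock = threading.Lock()

    def get_chunk():
        nonlocal next_task
        with bag_lock:              # critical section around the shared bag
            start = next_task
            next_task = min(start + CHUNK, len(tasks))
            return start, next_task

    def worker():
        while True:
            start, end = get_chunk()
            if start == end:        # bag is empty
                return
            for i in range(start, end):
                # each slot is written by exactly one worker, so no lock is needed here
                results[i] = work(tasks[i])

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_bag_of_tasks(list(range(20)), lambda x: x * x))
```

Because each result lands at its task's index, the output is already in task order regardless of which worker finished first.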
Barrier Synchronization (Section 3.4)
a BARRIER is a point that all processes must reach before any proceed
very common in iterative parallelism
examples:
(1) "co inside while" style of parallelism
while () {
co ... oc # "oc" acts as a barrier, but an expensive one: processes are created and destroyed on every iteration
}
(2) initialize
... # barrier often needed to ensure initialization done
compute
(3) data parallel algorithms -- Section 3.6; we'll look at this next time
(4) scientific computing -- Chapter 11; we'll look at this in Lecture 12
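Examples (1) and (2) can be sketched with Python's standard-library `threading.Barrier`, which is reusable. Here it stands in for the hand-built barriers discussed below. The workers are created once and loop internally, instead of using "co inside while". One barrier separates initialization from computation. Two more per iteration separate the read phase from the write phase. The sizes `N` and `ITERATIONS` are illustrative.

```python
import threading

N = 4            # number of worker threads (illustrative)
ITERATIONS = 3
barrier = threading.Barrier(N)
data = [0] * N   # data[i] is written only by worker i

def worker(i):
    data[i] = i + 1              # initialize own slot
    barrier.wait()               # example (2): all initialization done before compute
    for _ in range(ITERATIONS):  # example (1): iterative parallelism
        total = sum(data)        # read phase: read every slot
        barrier.wait()           # all reads done before anyone writes
        data[i] += total         # write phase: update own slot only
        barrier.wait()           # all writes done before the next read phase

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(data)  # [311, 312, 313, 314]
```

Without the barriers the sums would depend on thread timing. With them, every run produces the same result.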
Counter Barrier -- for n processes (Section 3.4.1)
int count = 0;
Barrier: << count++ >> # record arrival
<< await (count==n); >> # wait for everyone to arrive
implementation: increment -- use Fetch-and-Add (FA) or a critical section
delay loop -- use a spin loop (busy waiting)
problems:
(1) contention -- single shared counter
(2) cannot be reused, but barriers are usually used inside loops
why is this a problem? how do we reset count?
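A minimal sketch of the single-use counter barrier in Python. A `threading.Lock` stands in for Fetch-and-Add or a critical section. The `await` is a spin loop that calls `time.sleep(0)` to yield the processor. The class name `CounterBarrier` is hypothetical.

```python
import threading
import time

class CounterBarrier:
    """Single-use counter barrier for n threads (sketch of the scheme above)."""
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.lock = threading.Lock()

    def wait(self):
        with self.lock:               # << count++ >>  (lock stands in for FA)
            self.count += 1
        while self.count != self.n:   # << await (count==n); >> as a spin loop
            time.sleep(0)             # yield so other threads can run
        # Reuse problem: a second wait() pushes count past n, so the spin
        # condition never becomes true again and the caller hangs.

n = 5
barrier = CounterBarrier(n)
arrived = []       # each thread records itself before the barrier
seen_after = []    # number of arrivals each thread observes after the barrier
log_lock = threading.Lock()

def worker(i):
    with log_lock:
        arrived.append(i)
    barrier.wait()
    with log_lock:
        seen_after.append(len(arrived))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen_after)  # [5, 5, 5, 5, 5]
```

Every thread sees all n arrivals after the barrier. Both problems are visible: one shared counter for everyone, and no way to reset it safely.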
solving the reuse problem
try counting up then counting down (called reverse sense)
odd barriers: << count++ >>
<< await(count==n) >>
even barriers: << count-- >>
<< await(count==0) >>
this still doesn't work. why?
use TWO counters AND reverse their senses
successive barriers cycle: up on counter 1, up on counter 2, down on counter 1, down on counter 2, repeat
why does this work? key is to have at least 3 stages
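A sketch of the two-counter, reverse-sense barrier in Python. Each thread tracks its own stage number with `threading.local`, so no counter is ever reset. The class name `FourStageBarrier` is hypothetical. Why it works: a counter that some thread is still waiting on cannot change until every thread has passed a later stage, and passing that later stage requires everyone to have passed the stage being waited on.

```python
import threading
import time

class FourStageBarrier:
    """Reusable barrier from two counters with reversed senses (sketch).

    Stage 0 (up1):   c1 += 1, await c1 == n
    Stage 1 (up2):   c2 += 1, await c2 == n
    Stage 2 (down1): c1 -= 1, await c1 == 0
    Stage 3 (down2): c2 -= 1, await c2 == 0, then repeat from stage 0
    """
    def __init__(self, n):
        self.n = n
        self.counts = [0, 0]            # c1, c2
        self.lock = threading.Lock()
        self.local = threading.local()  # per-thread stage number

    def wait(self):
        stage = getattr(self.local, "stage", 0)
        which = stage % 2                           # stages 0, 2 use c1; 1, 3 use c2
        delta, target = (1, self.n) if stage < 2 else (-1, 0)
        with self.lock:                             # atomic increment or decrement
            self.counts[which] += delta
        while self.counts[which] != target:         # spin until everyone arrives
            time.sleep(0)
        self.local.stage = (stage + 1) % 4

n, rounds = 4, 10
barrier = FourStageBarrier(n)
slots = [-1] * n
errors = []

def worker(i):
    for r in range(rounds):
        slots[i] = r                     # write phase
        barrier.wait()
        if any(s != r for s in slots):   # read phase: every slot must show round r
            errors.append((i, r))
        barrier.wait()                   # keep the next write out of this read phase

threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(errors)  # []
```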
Coordinator Barrier -- using flags (Section 3.4.2)
idea: distribute the single counter above (a time/space tradeoff)
diagram of interaction
Coordinator
Worker1 ... WorkerN
arrows from Workers to Coordinator and from Coordinator to Workers
each arrow represents a signal from one process to another
represent each signal by a flag variable
shared variables: int arrive[1:n] = ([n] 0);
go[1:n] = ([n] 0);
the basic signaling scheme is then implemented as follows:
Worker[i]: arrive[i] = 1; # announce arrival
<< await(go[i]==1); >> # wait for permission to go
... # leave space for later (see below)
Coordinator:
# wait for all workers to arrive
for [i = 1 to n] {
<< await(arrive[i]==1); >>
... # leave space for later (see below)
}
# tell all workers they can go on
for [i = 1 to n]
go[i] = 1;
what about the reset problem?
solve by clearing flags at the ... points above
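be sure to follow the Flag Synchronization Principles (3.14): (a) the process that waits for a flag to be set is the one that clears it; (b) a flag is not set until it is known to be clear. why?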
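A sketch of the coordinator barrier with the flags cleared at the "..." points, following the two principles. Indices are 0-based, unlike `arrive[1:n]` in the notes. The coordinator runs a fixed number of barrier episodes, an assumption added only so the demo terminates. The names `CoordinatorBarrier` and `spin_until` are hypothetical.

```python
import threading
import time

def spin_until(cond):
    while not cond():
        time.sleep(0)   # yield; stands in for busy-waiting on a cached flag

class CoordinatorBarrier:
    """Coordinator barrier with 2n flags (sketch)."""
    def __init__(self, n):
        self.n = n
        self.arrive = [0] * n
        self.go = [0] * n

    def worker_wait(self, i):
        self.arrive[i] = 1                     # announce arrival (known clear)
        spin_until(lambda: self.go[i] == 1)    # wait for permission to go
        self.go[i] = 0                         # the waiter clears the flag it waited on

    def coordinator(self, episodes):           # episodes: assumption so the demo ends
        for _ in range(episodes):
            for i in range(self.n):            # wait for all workers to arrive
                spin_until(lambda: self.arrive[i] == 1)
                self.arrive[i] = 0             # the waiter clears the flag it waited on
            for i in range(self.n):            # tell all workers they can go on
                self.go[i] = 1                 # known clear: worker i cleared it earlier

n, rounds = 4, 10
barrier = CoordinatorBarrier(n)
slots = [-1] * n
errors = []

def worker(i):
    for r in range(rounds):
        slots[i] = r
        barrier.worker_wait(i)
        if any(s != r for s in slots):
            errors.append((i, r))
        barrier.worker_wait(i)

coord = threading.Thread(target=barrier.coordinator, args=(2 * rounds,))
workers = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
for t in [coord] + workers:
    t.start()
for t in [coord] + workers:
    t.join()
print(errors)  # []
```

Principle (b) holds because a flag is set again only after the other side has been observed past the point where it clears that flag.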
why 2n flags? would n+1 be enough? [no]
can we make do with n flags [yes; reverse sense; avoids reset]
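The notes do not spell out the n-flag scheme. The sketch below is one possible reading, not necessarily the one from lecture. Each worker has a single flag: setting it to 1 means "arrived", and the coordinator setting it back to 0 means "go". Signaling "go" therefore also resets the flag, so no separate reset step is needed. The class name is hypothetical, and the fixed episode count is again an assumption for the demo.

```python
import threading
import time

def spin_until(cond):
    while not cond():
        time.sleep(0)

class OneFlagCoordinatorBarrier:
    """Coordinator barrier with one flag per worker (one possible scheme; sketch)."""
    def __init__(self, n):
        self.n = n
        self.flag = [0] * n

    def worker_wait(self, i):
        self.flag[i] = 1                            # arrive
        spin_until(lambda: self.flag[i] == 0)       # wait for go (reversed value)

    def coordinator(self, episodes):
        for _ in range(episodes):
            for i in range(self.n):                 # wait for all arrivals
                spin_until(lambda: self.flag[i] == 1)
            for i in range(self.n):                 # release; this also resets the flag
                self.flag[i] = 0

n, rounds = 4, 10
barrier = OneFlagCoordinatorBarrier(n)
slots = [-1] * n
errors = []

def worker(i):
    for r in range(rounds):
        slots[i] = r
        barrier.worker_wait(i)
        if any(s != r for s in slots):
            errors.append((i, r))
        barrier.worker_wait(i)

coord = threading.Thread(target=barrier.coordinator, args=(2 * rounds,))
workers = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
for t in [coord] + workers:
    t.start()
for t in [coord] + workers:
    t.join()
print(errors)  # []
```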
what about contention? [not a problem; separate flags; spin on cached values]
what about total time in the best case (all workers arrive at once)?
time is O(n) because of the loops in the coordinator
we also need a separate coordinator process (although one of
the workers could serve as the coordinator)
Combining Tree Barriers (end of Section 3.4.2)
briefly sketched the idea and the structure (see Figure 3.13)
signaling is more complex than for a coordinator, but time is O(log n)
[note: a tree is often used in distributed programs, especially if
the communications network lets messages be sent in parallel along
different paths.]
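A combining tree barrier sketch in Python, in the spirit of Figure 3.13 but not copied from it. Worker i is node i of a binary tree stored heap-style: children 2i+1 and 2i+2, root 0. Arrival signals flow up the tree and release signals flow back down, so a barrier episode takes time proportional to the tree depth, O(log n). Each flag is cleared by the process that waited on it. The flag array is named `cont` because `continue` is a Python keyword. The class name is hypothetical.

```python
import threading
import time

def spin_until(cond):
    while not cond():
        time.sleep(0)

class TreeBarrier:
    """Combining tree barrier over n workers (sketch)."""
    def __init__(self, n):
        self.n = n
        self.arrive = [0] * n   # set by a child, waited on and cleared by its parent
        self.cont = [0] * n     # set by a parent, waited on and cleared by the child

    def children(self, i):
        return [c for c in (2 * i + 1, 2 * i + 2) if c < self.n]

    def wait(self, i):
        kids = self.children(i)
        for c in kids:                          # wait for my subtree to arrive
            spin_until(lambda: self.arrive[c] == 1)
            self.arrive[c] = 0
        if i != 0:                              # non-root: report upward, await release
            self.arrive[i] = 1
            spin_until(lambda: self.cont[i] == 1)
            self.cont[i] = 0
        for c in kids:                          # release my children
            self.cont[c] = 1

n, rounds = 7, 10
barrier = TreeBarrier(n)
slots = [-1] * n
errors = []

def worker(i):
    for r in range(rounds):
        slots[i] = r
        barrier.wait(i)
        if any(s != r for s in slots):
            errors.append((i, r))
        barrier.wait(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(errors)  # []
```

Unlike the coordinator version, no process loops over all n flags. Each node handles at most two children plus its parent.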