Lecture 11
Review of Barriers
all processes must arrive before any leave
applications to come
flags: one per edge in the "signaling graph"
kinds of barriers so far:
counter -- symmetric, but reset problem and O(n)
coordinator -- simple, but asymmetric and O(n)
tree -- O(log n), but asymmetric and harder to program
today: efficient, symmetric barriers
Symmetric Barriers (Sec 3.4.3)
basic building block for two processes:
Worker1 <--> Worker2 # signal each other
shared vars: int ar[n] = ([n] 0);
go[n] = ([n] 0);
Worker[i]: ...
ar[i] = 1;
<< await(ar[j]==1); >>
Worker[j]: ...
ar[j] = 1;
<< await(ar[i]==1); >>
what about reset?
remember the flag synchronization principles: waiter clears; don't set until clear
add code for the ... above: await at front to make sure "my" flag is clear;
at the end, clear the partner's flag (the one I waited on). [See (3.15) on p. 121 of the text.]
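A minimal sketch of the reusable two-process barrier described above, written with Python threads rather than the text's notation. The helper `spin_until`, the `ROUNDS` constant, and the `log` list are assumptions added for illustration; `time.sleep(0)` is only a polite stand-in for the busy-wait in `<< await(...); >>`.

```python
import threading
import time

ROUNDS = 200
ar = [0, 0]      # ar[i] == 1 means worker i has arrived
log = []         # (round, worker) records, appended before each barrier

def spin_until(cond):
    # Busy-wait stand-in for << await(cond); >>; sleep(0) yields to other threads.
    while not cond():
        time.sleep(0)

def worker(i):
    j = 1 - i                            # my partner
    for r in range(ROUNDS):
        log.append((r, i))               # the "work" for round r
        spin_until(lambda: ar[i] == 0)   # don't set until clear
        ar[i] = 1                        # announce my arrival
        spin_until(lambda: ar[j] == 1)   # wait for my partner to arrive
        ar[j] = 0                        # waiter clears

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

rounds_in_order = [r for r, _ in log]
```

If the barrier is correct, no record for round r+1 can appear in `log` before both records for round r, so `rounds_in_order` should come out sorted.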
performance in the best case (all arrive at the same time):
both workers move through TOGETHER, setting, checking, and clearing flags
Butterfly Barrier -- log2(n) stages of 2-process barriers
idea is to replicate work: each worker "barriers" with log n others
interaction diagram -- see Figure 3.15
reuse: use multiple flags (arrays) or better yet, use stage counters
as shown on p. 123
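A sketch of a butterfly barrier that uses stage counters instead of resettable flags, assuming n is a power of two. Each worker increments only its own counter and waits until its partner's counter has caught up. The names `arrive`, `barrier`, and `log` and the constants are choices made for this example, not necessarily those of the text.

```python
import threading
import time

N = 8                          # number of workers; assumed to be a power of two
STAGES = N.bit_length() - 1    # log2(N)
ROUNDS = 20
arrive = [0] * N               # arrive[i] = stages worker i has entered; only i writes it
log = []                       # round numbers, appended before each barrier

def barrier(i):
    for s in range(STAGES):
        arrive[i] += 1                 # enter the next stage
        j = i ^ (1 << s)               # partner at distance 2^s in the butterfly
        while arrive[j] < arrive[i]:   # wait until the partner has entered it too
            time.sleep(0)

def worker(i):
    for r in range(ROUNDS):
        log.append(r)                  # the "work" for round r
        barrier(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the counters only grow, nothing needs to be reset between uses of the barrier.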
performance: processes move through together
but watch out for FALSE SHARING
caches use blocks (lines) that often contain more than one word
arrays of flags will get packed together
a write into one flag will invalidate an entire cache line,
and this leads to invalidates in OTHER caches
the solution is to use padding (blank space between flags)
another time/space tradeoff
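A small arithmetic sketch of why padding helps. It assumes 64-byte cache lines, 4-byte flags, and an array that starts on a line boundary; real sizes vary by machine. The function `cache_line` is a made-up helper that reports which line a flag lands in.

```python
LINE_BYTES = 64    # assumed cache-line size
FLAG_BYTES = 4     # assumed size of one integer flag

def cache_line(index, stride_bytes):
    # Line number holding element `index` when elements are `stride_bytes` apart.
    return (index * stride_bytes) // LINE_BYTES

n = 8
packed = [cache_line(i, FLAG_BYTES) for i in range(n)]   # flags stored back to back
padded = [cache_line(i, LINE_BYTES) for i in range(n)]   # one flag per cache line

print(packed)   # [0, 0, 0, 0, 0, 0, 0, 0]
print(padded)   # [0, 1, 2, 3, 4, 5, 6, 7]

space_factor = LINE_BYTES // FLAG_BYTES   # padded array is 16 times larger
```

With packed flags all eight workers write into the same line; with padding each write touches a line that only its two communicating processes care about, at 16 times the space under these assumptions.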
Dissemination Barrier
a different way to connect the processes
simpler to program and works for any value of n
show connection diagram -- see Figure 3.16
easiest to program if you set another process's flag and wait for
your own flag
again use incrementing stage counters to avoid reset problem
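A sketch of a dissemination barrier along these lines, assuming one counter per (worker, stage) pair. In stage s, worker i bumps the counter of the worker 2^s positions ahead, then waits on its own counter. The layout `flag[i][s]` and the `episode` argument are assumptions made for the example; n = 5 is chosen to show that n need not be a power of two.

```python
import threading
import time

N = 5                            # any number of workers
STAGES = (N - 1).bit_length()    # ceil(log2(N))
ROUNDS = 20
flag = [[0] * STAGES for _ in range(N)]   # flag[i][s]: signals worker i has received at stage s
log = []                                  # round numbers, appended before each barrier

def barrier(i, episode):
    # episode counts uses of the barrier, starting at 1
    for s in range(STAGES):
        to = (i + (1 << s)) % N
        flag[to][s] += 1                  # set another process's flag (single writer per flag)
        while flag[i][s] < episode:       # wait for my own flag for this episode
            time.sleep(0)

def worker(i):
    for r in range(ROUNDS):
        log.append(r)                     # the "work" for round r
        barrier(i, r + 1)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each `flag[i][s]` has exactly one writer, worker (i - 2^s) mod n, so the increment needs no lock. A sender that runs one episode ahead does no harm because the waiter tests with `<` rather than for equality.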
Parallel Computing (again!)
task parallelism -- processes run on own; execute asynchronously
data parallelism -- processes do same thing (on different parts of data)
execute synchronously, in lock step
languages (synchronous semantics): HPF, ZPL, NESL (see Chapter 12)
machines (synchronous execution): Illiac, CM, MasPar
[no commercial offerings today; special purpose]
Data Parallel Algorithms (Section 3.5)
I gave a pretty brief (and rushed) introduction to this
covered the parallel prefix program in Section 3.5.1
first developed the synchronous algorithm on page 131,
then presented the asynchronous equivalent (using barriers) in Figure 3.17
briefly mentioned list algorithms in Section 3.5.2;
we'll see larger examples from Chapter 11 next time
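For reference, a sketch of the barrier-based parallel prefix computation mentioned above, in the spirit of Figure 3.17 but not copied from it. It uses Python's `threading.Barrier` in place of a hand-built barrier; the input data and the name `total` (standing in for the sum array) are assumptions.

```python
import threading

a = [3, 1, 4, 1, 5, 9, 2, 6]     # sample input, chosen for illustration
n = len(a)
total = [0] * n                  # total[i] ends up as a[0] + ... + a[i]
old = [0] * n                    # copy of total[] from the previous step
barrier = threading.Barrier(n)   # library barrier for n threads

def worker(i):
    d = 1
    total[i] = a[i]
    barrier.wait()
    while d < n:
        old[i] = total[i]        # save my current value
        barrier.wait()           # every old[] value is now saved
        if i - d >= 0:
            total[i] = old[i - d] + total[i]
        barrier.wait()           # every total[] value is now updated
        d += d                   # double the distance

threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)   # [3, 4, 8, 9, 14, 23, 25, 31]
```

The two barriers per iteration replace the implicit lock step of the synchronous version: the first keeps anyone from reading `old[i - d]` before it is saved, and the second keeps anyone from overwriting `old[i]` while a neighbor may still be reading it.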