Lecture 25
Review -- Distributed Parallel Programming
exchange values problem
coordinator -- manager/workers
exchange with all (symmetric) -- heartbeat
ring -- pipeline
distributed parallel computing paradigms:
saw manager/workers last time
will look at heartbeat and pipeline today
Heartbeat Algorithms
what: divide work (evenly)
Worker[i]:: while (not done) {
    compute
    exchange values with neighbors (send ... receive)
}
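a minimal sketch of this loop in C with MPI, assuming a 1-D chain of
workers that exchange one boundary value in each direction; N, NITERS,
and the dummy update are placeholder assumptions, not from the lecture:

    #include <mpi.h>

    #define N      1024   /* points per worker (assumed) */
    #define NITERS 100    /* a fixed count stands in for "not done" */

    int main(int argc, char *argv[]) {
        int rank, size;
        double local[N], edge_lo = 0.0, edge_hi = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        /* MPI_PROC_NULL turns a boundary send/receive into a no-op */
        int lo = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int hi = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int i = 0; i < N; i++) local[i] = rank;   /* dummy data */

        for (int iter = 0; iter < NITERS; iter++) {
            /* compute: placeholder local update */
            for (int i = 1; i < N - 1; i++)
                local[i] = (local[i-1] + local[i+1]) / 2.0;
            /* exchange values with neighbors (send ... receive);
               a real heartbeat would use edge_lo/edge_hi in the
               next compute step */
            MPI_Sendrecv(&local[0],   1, MPI_DOUBLE, lo, 0,
                         &edge_hi,    1, MPI_DOUBLE, hi, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&local[N-1], 1, MPI_DOUBLE, hi, 1,
                         &edge_lo,    1, MPI_DOUBLE, lo, 1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        MPI_Finalize();
        return 0;
    }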
applications: Section 9.2 -- image processing, cellular automata
Chapter 11 -- grid, particle, and matrix computations
Jacobi iteration -- recall the problem
draw rectangular grid and some of the points
the new value of each interior point is the average of the previous values of its four neighbors
compute from old -> new and then swap, or
unroll loop once and compute from old -> new then new -> old
divide into strips (or blocks)
Worker[i]:: while (...) {
    exchange edges
    compute new values
    exchange edges
    compute old values
}
the exchanges provide a "fuzzy" barrier
better code: overlap communication and computation
send my edges to neighbors; compute interior
receive others' edges; compute my new edges
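a sketch of the overlapped version in C with MPI, one horizontal strip
per worker; rows 0 and ROWS+1 are ghost rows, and the strip size,
iteration count, and zero boundary values are assumptions:

    #include <mpi.h>
    #include <string.h>

    #define ROWS 64     /* rows per strip (assumed) */
    #define COLS 256

    static double oldg[ROWS+2][COLS], newg[ROWS+2][COLS];

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Request req[4];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int iter = 0; iter < 100; iter++) {
            /* send my edges to neighbors; start receiving theirs */
            MPI_Isend(oldg[1],      COLS, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[0]);
            MPI_Isend(oldg[ROWS],   COLS, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[1]);
            MPI_Irecv(oldg[0],      COLS, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[2]);
            MPI_Irecv(oldg[ROWS+1], COLS, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[3]);

            /* compute interior points while the exchange is in flight */
            for (int i = 2; i < ROWS; i++)
                for (int j = 1; j < COLS-1; j++)
                    newg[i][j] = (oldg[i-1][j] + oldg[i+1][j] +
                                  oldg[i][j-1] + oldg[i][j+1]) / 4.0;

            /* receive others' edges; compute my new edge rows */
            MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
            for (int j = 1; j < COLS-1; j++) {
                newg[1][j]    = (oldg[0][j] + oldg[2][j] +
                                 oldg[1][j-1] + oldg[1][j+1]) / 4.0;
                newg[ROWS][j] = (oldg[ROWS-1][j] + oldg[ROWS+1][j] +
                                 oldg[ROWS][j-1] + oldg[ROWS][j+1]) / 4.0;
            }
            memcpy(oldg, newg, sizeof oldg);   /* copy stands in for the swap */
        }
        MPI_Finalize();
        return 0;
    }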
Pipeline Algorithms
what: divide work evenly
compute and circulate data among workers
pipeline structures (Figure 9.5) -- circular (closed) or open
when: used when workers need all the data, not just edges from neighbors
applications: matrix multiplication -- Section 9.3
nbody problem -- Section 11.2 (more below)
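a minimal sketch of the circulation step in C with MPI: each worker
starts with its own block, and every block visits every worker once;
BLK and the summing "compute" step are placeholder assumptions:

    #include <mpi.h>
    #include <stdio.h>

    #define BLK 8   /* values per worker (assumed) */

    int main(int argc, char *argv[]) {
        int rank, size;
        double block[BLK], sum = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int next = (rank + 1) % size, prev = (rank + size - 1) % size;

        for (int i = 0; i < BLK; i++) block[i] = rank;   /* my own data */

        for (int step = 0; step < size; step++) {
            /* "compute" with the block currently in hand */
            for (int i = 0; i < BLK; i++) sum += block[i];
            /* pass it right around the ring, receive the next from the
               left; the final exchange returns my own block */
            MPI_Sendrecv_replace(block, BLK, MPI_DOUBLE, next, 0, prev, 0,
                                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        printf("worker %d: sum over all blocks = %g\n", rank, sum);
        MPI_Finalize();
        return 0;
    }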
a different kind of pipeline -- wavefronts
consider Gauss-Seidel iteration (Sections 11.1 and 12.2)
"raster scan" of grid -- hence can update in place
picture of update order (Figure 12.6)
data dependencies
idea of loop skewing (Figure 12.6)
this results in wavefront parallelism, which can be implemented by a pipeline
show how to do this by using column strips and having one
worker per strip
after a worker updates a row of its strip, it sends the row to the
next worker to its right
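a sketch of the wavefront pipeline in C with MPI; the strip width,
grid height, and zero boundary values are assumptions, and only the
rightmost new value of each row is forwarded, since that is all the
next strip needs:

    #include <mpi.h>

    #define ROWS  128
    #define WIDTH 16    /* columns per strip (assumed) */

    int main(int argc, char *argv[]) {
        static double grid[ROWS+2][WIDTH+2];   /* +2: ghost rows/columns */
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int iter = 0; iter < 10; iter++) {
            for (int i = 1; i <= ROWS; i++) {
                /* wait for the already-updated value from the strip on my left */
                MPI_Recv(&grid[i][0], 1, MPI_DOUBLE, left, i, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                /* raster-scan update of row i, in place (Gauss-Seidel) */
                for (int j = 1; j <= WIDTH; j++)
                    grid[i][j] = (grid[i-1][j] + grid[i+1][j] +
                                  grid[i][j-1] + grid[i][j+1]) / 4.0;
                /* forward my rightmost new value so the next worker can
                   start on row i while I move on to row i+1 */
                MPI_Send(&grid[i][WIDTH], 1, MPI_DOUBLE, right, i,
                         MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }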
Distributed Algorithms for the N-body Problem -- Section 11.2
recall the problem: calculate forces
barrier
move bodies
barrier
then repeat
distributed programs (for the n**2 algorithm):
the challenges are to divide up the work and to have all the data you need
(I outlined the following three approaches; details are in the text.)
manager/workers paradigm
tasks in bag are pairs of blocks of bodies; e.g.,
(1,1), (1,2), (1,3), (2,2), (2,3), (3,3)
each worker needs data on all bodies
workers get tasks and compute the forces between all bodies in that pair of blocks
workers exchange info after calculating forces
workers also exchange bodies after moving them
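generating the bag is straightforward: for NB blocks there are
NB*(NB+1)/2 unordered pairs; a trivial sketch (NB = 3 reproduces the
list above):

    #include <stdio.h>

    #define NB 3   /* number of blocks (assumed) */

    int main(void) {
        for (int i = 1; i <= NB; i++)
            for (int j = i; j <= NB; j++)
                printf("task (%d,%d)\n", i, j);  /* forces between blocks i and j */
        return 0;
    }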
heartbeat paradigm
use uneven-size blocks to balance the load
algorithm (for each worker)
send bodies to lower-numbered workers
calculate local forces
receive bodies, compute, send back (from/to higher-numbered workers)
receive bodies and add forces in (from lower-numbered workers)
move local bodies
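a sketch of just the message pattern in C with MPI; the force
computations are stubs, the blocks here are equal-sized (unlike the
uneven blocks above), and a single double stands in for a body:

    #include <mpi.h>
    #include <stdlib.h>

    #define B 4   /* bodies per worker (assumed) */

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double mine[B], force[B], theirs[B], back[B];
        for (int i = 0; i < B; i++) { mine[i] = rank + 0.1*i; force[i] = 0.0; }

        /* per-destination buffers so pending nonblocking sends don't
           share storage; nonblocking sends avoid send-send deadlock */
        double (*partial)[B] = malloc(size * sizeof *partial);
        MPI_Request *req = malloc(2 * size * sizeof *req);
        int nreq = 0;

        /* send my bodies to every lower-numbered worker */
        for (int v = 0; v < rank; v++)
            MPI_Isend(mine, B, MPI_DOUBLE, v, 0, MPI_COMM_WORLD, &req[nreq++]);

        /* ... calculate forces among my local bodies (stub) ... */

        /* from each higher-numbered worker: receive its bodies, compute
           the cross forces (stub), send the forces on its bodies back */
        for (int u = rank + 1; u < size; u++) {
            MPI_Recv(theirs, B, MPI_DOUBLE, u, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            for (int i = 0; i < B; i++) partial[u][i] = 0.0;   /* stub result */
            MPI_Isend(partial[u], B, MPI_DOUBLE, u, 1, MPI_COMM_WORLD,
                      &req[nreq++]);
        }

        /* from each lower-numbered worker: receive forces on my bodies */
        for (int v = 0; v < rank; v++) {
            MPI_Recv(back, B, MPI_DOUBLE, v, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            for (int i = 0; i < B; i++) force[i] += back[i];
        }

        MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);
        /* move local bodies here, then repeat */
        free(partial); free(req);
        MPI_Finalize();
        return 0;
    }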
pipeline paradigm
assign bodies by stripes (or reverse stripes), not by strips or blocks
algorithm for each worker:
send my bodies along
compute local forces
receive new bodies
compute forces I'm responsible for
send on those bodies and forces on them
receive my bodies back and add in forces on them from
lower-numbered bodies
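a simplified sketch of the force phase in C with MPI: position blocks
circulate around the ring and each worker accumulates forces on its
own bodies against every visiting block; unlike the scheme above it
recomputes each pair twice rather than circulating partial forces,
which keeps the sketch short; B, unit masses, and 2-D positions are
assumptions:

    #include <mpi.h>
    #include <math.h>

    #define B 4          /* bodies per worker (assumed) */
    #define G 6.674e-11

    int main(int argc, char *argv[]) {
        int rank, size;
        double pos[B][2], visit[B][2], force[B][2] = {{0}};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int next = (rank + 1) % size, prev = (rank + size - 1) % size;

        for (int i = 0; i < B; i++) {           /* dummy positions */
            pos[i][0] = visit[i][0] = rank + 0.1 * i;
            pos[i][1] = visit[i][1] = rank - 0.1 * i;
        }

        for (int step = 0; step < size; step++) {
            for (int i = 0; i < B; i++)         /* my bodies vs. visiting block */
                for (int j = 0; j < B; j++) {
                    double dx = visit[j][0] - pos[i][0];
                    double dy = visit[j][1] - pos[i][1];
                    double r2 = dx*dx + dy*dy;
                    if (r2 == 0.0) continue;    /* skip a body vs. itself */
                    double f = G / (r2 * sqrt(r2));   /* unit masses assumed */
                    force[i][0] += f * dx;  force[i][1] += f * dy;
                }
            /* pass the visiting block along the ring; the final exchange
               returns my own block */
            MPI_Sendrecv_replace(visit, 2*B, MPI_DOUBLE, next, 0, prev, 0,
                                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        /* force[] now holds the total force on each local body;
           the move-bodies phase would follow */
        MPI_Finalize();
        return 0;
    }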
tradeoffs -- see Table 11.1
these algorithms pass different numbers of messages of different sizes