Lecture 9
Big Picture (review/preview)
processes (threads) -- independent tasks
communication (for now) -- shared variables
synchronization -- critical sections; conditions
multithreaded program -- threads take turns
parallel program -- usually one thread/processor
goal is speedup (see intro to Part 3 of text)
T1 -- time for a SEQUENTIAL program on 1 processor
Tp -- time for a parallel program on p processors
speedup = T1 / Tp
linear; sublinear; superlinear
mention that superlinear happens, usually due to cache effects
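A worked example of the speedup definition may help here. The timings below are hypothetical numbers chosen only for illustration; they are not measurements from the text.

```latex
S_p = \frac{T_1}{T_p}, \qquad
\text{linear: } S_p = p, \quad
\text{sublinear: } S_p < p, \quad
\text{superlinear: } S_p > p

% Assumed timings: T_1 = 120 s, T_8 = 20 s on p = 8 processors
S_8 = \frac{T_1}{T_8} = \frac{120}{20} = 6 < 8 \quad \text{(sublinear)}
```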
impediments to speedup
inherently sequential parts
load imbalance
synchronization overhead: critical sections, delays, fork/join
Review of Two Major Parallel Programming Styles (from Chapter 1)
iterative -- matrix multiplication C = A x B, for n x n matrices
suppose p << n and p is a factor of n (p = number of processors)
assign each process a strip of n/p rows of C (see p. 16)
describe overheads
this is an example of an EMBARRASSINGLY PARALLEL application
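The strip assignment above can be sketched with Pthreads roughly as follows. This is a minimal sketch, not the program from the text: the names strip_arg, strip_worker, and matmul_strips are hypothetical, matrices are assumed to be stored row-major in flat arrays, and p is assumed to divide n. The only overheads here are thread creation and the final joins, since no worker writes a row owned by another.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical argument record: one per worker. */
typedef struct {
    int n, first, last;          /* this worker computes rows [first, last) of C */
    const double *a, *b;         /* inputs, row-major n x n */
    double *c;                   /* output, row-major n x n */
} strip_arg;

static void *strip_worker(void *p) {
    strip_arg *s = p;
    for (int i = s->first; i < s->last; i++)
        for (int j = 0; j < s->n; j++) {
            double sum = 0.0;
            for (int k = 0; k < s->n; k++)
                sum += s->a[i * s->n + k] * s->b[k * s->n + j];
            s->c[i * s->n + j] = sum;
        }
    return NULL;
}

/* Computes C = A x B with p workers; returns 0 on success, -1 on failure.
   Assumes p is a factor of n, as in the notes. */
int matmul_strips(int n, int p, const double *a, const double *b, double *c) {
    if (p <= 0 || n % p != 0) return -1;
    pthread_t *tid = malloc(p * sizeof *tid);
    strip_arg *arg = malloc(p * sizeof *arg);
    if (!tid || !arg) { free(tid); free(arg); return -1; }
    int rows = n / p, started = 0;
    for (int w = 0; w < p; w++) {                       /* fork */
        arg[w] = (strip_arg){ n, w * rows, (w + 1) * rows, a, b, c };
        if (pthread_create(&tid[w], NULL, strip_worker, &arg[w]) != 0) break;
        started++;
    }
    for (int w = 0; w < started; w++)                   /* join */
        pthread_join(tid[w], NULL);
    free(tid);
    free(arg);
    return started == p ? 0 : -1;
}
```

A call such as matmul_strips(n, p, a, b, c) fills c once all workers have been joined.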
recursive -- adaptive quadrature
parallel recursive calls
describe overheads -- these are very large in general
can limit recursion depth, but still a challenge
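One way to limit the recursion depth is sketched below: a new thread is forked for the left half only while a depth budget remains, and the recursion is ordinary sequential recursion after that. This is a sketch under assumptions, not the text's program: the names quad_arg, quad, integrate, and the sample integrand square are hypothetical, the area estimate is the trapezoid rule, and a single fixed tolerance eps is used at every level.

```c
#include <pthread.h>

typedef double (*func)(double);

/* Hypothetical task record for one interval [a, b]. */
typedef struct {
    func f;
    double a, b, fa, fb, area;   /* area: trapezoid estimate for [a, b] */
    double eps;                  /* fixed tolerance (assumption) */
    int depth;                   /* levels at which forking is still allowed */
    double result;
} quad_arg;

static void *quad(void *p) {
    quad_arg *q = p;
    double m = (q->a + q->b) / 2.0, fm = q->f(m);
    double larea = (q->fa + fm) * (m - q->a) / 2.0;
    double rarea = (fm + q->fb) * (q->b - m) / 2.0;
    double diff = larea + rarea - q->area;
    if (diff < 0) diff = -diff;
    if (diff < q->eps) { q->result = larea + rarea; return NULL; }

    quad_arg l = { q->f, q->a, m, q->fa, fm, larea, q->eps, q->depth - 1, 0.0 };
    quad_arg r = { q->f, m, q->b, fm, q->fb, rarea, q->eps, q->depth - 1, 0.0 };
    pthread_t t;
    /* Fork only while the depth budget lasts; fall back to a plain call
       if the budget is used up or thread creation fails. */
    int forked = q->depth > 0 && pthread_create(&t, NULL, quad, &l) == 0;
    if (!forked) quad(&l);
    quad(&r);                    /* right half runs in the current thread */
    if (forked) pthread_join(t, NULL);
    q->result = l.result + r.result;
    return NULL;
}

double integrate(func f, double a, double b, double eps, int depth) {
    double fa = f(a), fb = f(b);
    quad_arg q = { f, a, b, fa, fb, (fa + fb) * (b - a) / 2.0, eps, depth, 0.0 };
    quad(&q);
    return q.result;
}

/* Sample integrand, for illustration only. */
double square(double x) { return x * x; }
```

With depth = 3 at most 7 extra threads are created, however deep the recursion goes; the fork and join on every parallel call are the overhead the notes refer to.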
Bag of Tasks Paradigm (Section 3.6)
P workers that share a bag of tasks
useful for independent tasks and to implement recursive parallelism
provides load balancing pretty much automatically
shared: variables and locks for bag
process Worker[w = 1 to P] {
while (true) {
get a task from the bag;
do it, possibly generating new tasks and putting them in the bag;
}
}
challenge: detecting termination
termination when bag is empty AND all tasks are done
all tasks are done when all workers are waiting to get a new task
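The termination rule above (bag empty AND every worker waiting for a task) can be sketched with a Pthreads mutex and condition variable. This is one possible realization, not the text's code: the names get_task, put_task, and run_bag are hypothetical, and a "task" is just an integer n that, when done, generates two tasks n-1 if n > 0, standing in for real work.

```c
#include <assert.h>
#include <pthread.h>

#define P 4                      /* number of workers */
#define BAG_MAX 1024             /* assumed upper bound on bag size */

static int bag[BAG_MAX];
static int count, waiting, done, processed;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

static void put_task(int task) {
    pthread_mutex_lock(&lock);
    assert(count < BAG_MAX);
    bag[count++] = task;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

/* Returns 1 with a task in *task, or 0 once the computation has terminated. */
static int get_task(int *task) {
    int got = 0;
    pthread_mutex_lock(&lock);
    waiting++;
    while (count == 0 && !done) {
        if (waiting == P) {      /* bag empty and all workers idle */
            done = 1;
            pthread_cond_broadcast(&nonempty);
        } else {
            pthread_cond_wait(&nonempty, &lock);
        }
    }
    waiting--;
    if (count > 0) { *task = bag[--count]; got = 1; }
    pthread_mutex_unlock(&lock);
    return got;
}

static void *worker(void *arg) {
    (void)arg;
    int n;
    while (get_task(&n)) {
        if (n > 0) { put_task(n - 1); put_task(n - 1); }  /* generate new tasks */
        pthread_mutex_lock(&lock);
        processed++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Seeds the bag with one task and returns how many tasks were processed. */
int run_bag(int depth) {
    pthread_t tid[P];
    count = waiting = done = processed = 0;
    put_task(depth);
    for (int w = 0; w < P; w++) {
        int rc = pthread_create(&tid[w], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int w = 0; w < P; w++) pthread_join(tid[w], NULL);
    return processed;
}
```

The last worker to find the bag empty while the other P-1 are already waiting sets done and wakes everyone; a worker still busy with a task is not counted as waiting, so the bag cannot be declared finished while new tasks might still appear.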
Examples
(1) matrix multiplication by rows -- Figure 3.20
shared bag: int nextRow = 0;
get a task: << row = nextRow; nextRow++; >>
(implement the atomic action using a CS solution, such as sems)
worker code is simple -- don't need to program strips or such
also easy to make tasks larger or smaller
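A sketch of this bag in C with Pthreads follows. It is not Figure 3.20 itself: the names row_bag, bag_worker, and matmul_bag are hypothetical, a mutex stands in for the critical-section solution, and the chunk parameter is an added assumption showing how tasks can be made larger or smaller.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared bag: the next unclaimed row plus the matrices. */
typedef struct {
    int n, chunk, next_row;
    pthread_mutex_t lock;
    const double *a, *b;         /* inputs, row-major n x n */
    double *c;                   /* output, row-major n x n */
} row_bag;

static void *bag_worker(void *p) {
    row_bag *g = p;
    for (;;) {
        /* << row = nextRow; nextRow += chunk; >> */
        pthread_mutex_lock(&g->lock);
        int first = g->next_row;
        g->next_row += g->chunk;
        pthread_mutex_unlock(&g->lock);
        if (first >= g->n) break;                /* bag is empty */
        int last = first + g->chunk < g->n ? first + g->chunk : g->n;
        for (int i = first; i < last; i++)
            for (int j = 0; j < g->n; j++) {
                double sum = 0.0;
                for (int k = 0; k < g->n; k++)
                    sum += g->a[i * g->n + k] * g->b[k * g->n + j];
                g->c[i * g->n + j] = sum;
            }
    }
    return NULL;
}

/* Returns 0 on success, -1 if the arguments are bad or no worker could start. */
int matmul_bag(int n, int workers, int chunk,
               const double *a, const double *b, double *c) {
    if (n <= 0 || workers <= 0 || chunk <= 0) return -1;
    row_bag g = { n, chunk, 0, PTHREAD_MUTEX_INITIALIZER, a, b, c };
    pthread_t *tid = malloc(workers * sizeof *tid);
    if (!tid) return -1;
    int started = 0;
    for (int w = 0; w < workers; w++) {
        if (pthread_create(&tid[w], NULL, bag_worker, &g) != 0) break;
        started++;
    }
    for (int w = 0; w < started; w++) pthread_join(tid[w], NULL);
    free(tid);
    /* Any single worker empties the bag, so one started worker is enough. */
    return started > 0 ? 0 : -1;
}
```

No termination detection is needed here because tasks never generate new tasks: a worker simply quits when the row counter passes n. The number of workers need not divide n, and chunk = 1 gives the one-row tasks of the notes.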
(2) adaptive quadrature -- Figure 3.21
shared bag: records of the form (a, b, f(a), f(b), area)
get a task: remove a record from the bag
also produce new tasks (by adding them to the bag) whenever you
would have done recursion in the parallel recursive program
for efficiency, would want to do "pruning" when there are enough
tasks; for example, keep track of how many are in the bag, and stop
generating new tasks once there are enough. From that point on, just
use the basic (sequential) recursive algorithm within the current task.
how many tasks is enough: a heuristic is 2-3 times the number of workers
this spreads out the load and should be pretty balanced
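The pruning idea can be sketched as follows. This is a sketch under assumptions rather than Figure 3.21: the integrand f, the fixed tolerance EPS, the trapezoid estimate, the threshold ENOUGH = 3 * WORKERS, and the names seq_quad, worker, and bag_quad are all hypothetical. A worker that must refine an interval puts the two halves in the bag only while the bag holds fewer than ENOUGH tasks; otherwise it finishes both halves itself with the sequential recursive algorithm.

```c
#include <assert.h>
#include <pthread.h>

#define WORKERS 4
#define ENOUGH (3 * WORKERS)     /* heuristic: 2-3 times the number of workers */
#define BAG_MAX 64
#define EPS 1e-6

typedef struct { double a, b, fa, fb, area; } task;

static double f(double x) { return x * x; }      /* sample integrand (assumption) */

static task bag[BAG_MAX];
static int size, idle, done;
static double total;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

/* Basic sequential recursive algorithm, used once there are enough tasks. */
static double seq_quad(double a, double b, double fa, double fb, double area) {
    double mid = (a + b) / 2.0, fm = f(mid);
    double la = (fa + fm) * (mid - a) / 2.0, ra = (fm + fb) * (b - mid) / 2.0;
    double d = la + ra - area;
    if (d < 0) d = -d;
    if (d < EPS) return la + ra;
    return seq_quad(a, mid, fa, fm, la) + seq_quad(mid, b, fm, fb, ra);
}

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);
    for (;;) {                                   /* lock is held at loop top */
        idle++;
        while (size == 0 && !done) {
            if (idle == WORKERS) { done = 1; pthread_cond_broadcast(&cv); }
            else pthread_cond_wait(&cv, &m);
        }
        idle--;
        if (size == 0) break;                    /* terminated */
        task t = bag[--size];
        pthread_mutex_unlock(&m);
        double mid = (t.a + t.b) / 2.0, fm = f(mid);
        double la = (t.fa + fm) * (mid - t.a) / 2.0;
        double ra = (fm + t.fb) * (t.b - mid) / 2.0;
        double d = la + ra - t.area;
        if (d < 0) d = -d;
        pthread_mutex_lock(&m);
        if (d < EPS) {
            total += la + ra;                    /* interval is good enough */
        } else if (size < ENOUGH) {              /* generate two new tasks */
            bag[size++] = (task){ t.a, mid, t.fa, fm, la };
            bag[size++] = (task){ mid, t.b, fm, t.fb, ra };
            pthread_cond_broadcast(&cv);
        } else {                                 /* prune: recurse locally */
            pthread_mutex_unlock(&m);
            double s = seq_quad(t.a, mid, t.fa, fm, la)
                     + seq_quad(mid, t.b, fm, t.fb, ra);
            pthread_mutex_lock(&m);
            total += s;
        }
    }
    pthread_mutex_unlock(&m);
    return NULL;
}

double bag_quad(double a, double b) {
    pthread_t tid[WORKERS];
    double fa = f(a), fb = f(b);
    size = idle = done = 0;
    total = 0.0;
    bag[size++] = (task){ a, b, fa, fb, (fa + fb) * (b - a) / 2.0 };
    for (int w = 0; w < WORKERS; w++) {
        int rc = pthread_create(&tid[w], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int w = 0; w < WORKERS; w++) pthread_join(tid[w], NULL);
    return total;
}
```

Because the bag never grows beyond ENOUGH + 1 records, the bookkeeping cost stays small while there are still several tasks per worker to even out the load.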
Assign Homework 2
get started; come with questions next time
problem 3 is on barriers, which we will cover starting next time
Pthreads -- POSIX threads (Section 4.6)
overview of what it is
available on lots of machines
handouts for sample programs -- go over transparencies
simple.c, shows basic structure
pc.busy.c, producer/consumer using busy waiting
pc.sems.c, producer/consumer using semaphores
clock.c, shows how to do timing with pc.sems.c
(Note: These programs work on our Sun/Solaris machines.)
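For readers without the handouts, the general shape of a producer/consumer program using POSIX semaphores might look like the sketch below. This is not the handout's pc.sems.c: the names producer, consumer, and run_pc, the single-slot buffer, and the item count are assumptions made for illustration. It uses unnamed semaphores (sem_init), which are available on Linux and Solaris but not on every platform.

```c
#include <pthread.h>
#include <semaphore.h>

#define ITEMS 100                /* assumed number of items to transfer */

static int buf;                  /* single-slot shared buffer */
static sem_t empty, full;        /* empty = 1 slot free, full = 0 items ready */
static long sum;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= ITEMS; i++) {
        sem_wait(&empty);        /* wait until the slot is free */
        buf = i;
        sem_post(&full);         /* announce that an item is ready */
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 1; i <= ITEMS; i++) {
        sem_wait(&full);         /* wait for an item */
        sum += buf;
        sem_post(&empty);        /* hand the slot back */
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sum the consumer saw,
   or -1 if a semaphore or thread could not be created. */
long run_pc(void) {
    pthread_t p, c;
    sum = 0;
    if (sem_init(&empty, 0, 1) != 0) return -1;
    if (sem_init(&full, 0, 0) != 0) { sem_destroy(&empty); return -1; }
    int have_p = pthread_create(&p, NULL, producer, NULL) == 0;
    int have_c = have_p && pthread_create(&c, NULL, consumer, NULL) == 0;
    if (have_p && !have_c) pthread_cancel(p);    /* do not leave it blocked */
    if (have_p) pthread_join(p, NULL);
    if (have_c) pthread_join(c, NULL);
    sem_destroy(&empty);
    sem_destroy(&full);
    return have_c ? sum : -1;
}
```

The two semaphores alternate strictly, so the producer never overwrites an item the consumer has not yet read, and neither thread busy waits.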