Lecture 8
Review
TS instruction
spin lock solution using TS
properties
today: making TS efficient; making spin locks fair
Multiprocessors and Caches (see Sec. 1.2.2)
block diagram of a small SMP (Figure 1.2)
cache: a small fast memory
contains a subset of primary memory
exploits the principle of locality
read a word: look in cache
if not there, read from memory into cache and CPU
write a word: put in cache and/or put in memory (write-back vs. write-through)
read/write hits take 1 clock; misses take many more, e.g., 50 clocks
cache coherence problem
on a write, need to invalidate old copies in other caches
on a read, need to get the current copy
this takes time (e.g., 50 clocks)
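The hit, miss, and invalidation costs above can be made concrete with a toy model. This is only a sketch under stated assumptions, not a description of real hardware: it assumes one-word cache lines, an invalidate-on-write protocol with write-through, and the example costs from these notes (1 clock per hit, 50 per miss or coherence action). The class name `ToySMP` and its methods are hypothetical.

```python
HIT_CLOCKS = 1    # example cost of a cache hit, from the notes
MISS_CLOCKS = 50  # example cost of a miss or coherence action, from the notes


class ToySMP:
    """Toy model of per-CPU caches with invalidate-on-write (hypothetical)."""

    def __init__(self, num_cpus):
        self.memory = {}                                 # address -> value
        self.caches = [dict() for _ in range(num_cpus)]  # per-CPU address -> value
        self.clocks = [0] * num_cpus                     # clocks charged to each CPU

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr in cache:                  # hit: the value is already local
            self.clocks[cpu] += HIT_CLOCKS
        else:                              # miss: fetch the current copy
            cache[addr] = self.memory.get(addr, 0)
            self.clocks[cpu] += MISS_CLOCKS
        return cache[addr]

    def write(self, cpu, addr, value):
        others_have_copy = any(
            addr in c for i, c in enumerate(self.caches) if i != cpu
        )
        # Cheap only if this CPU already holds the sole cached copy.
        exclusive = addr in self.caches[cpu] and not others_have_copy
        for i, c in enumerate(self.caches):
            if i != cpu:
                c.pop(addr, None)          # invalidate old copies elsewhere
        self.caches[cpu][addr] = value
        self.memory[addr] = value          # write-through keeps the model simple
        self.clocks[cpu] += HIT_CLOCKS if exclusive else MISS_CLOCKS
```

In this model, if CPU 0 reads `x` twice, CPU 1 then writes `x`, and CPU 0 reads it again, CPU 0 pays for a miss, a hit, and a second miss caused by the invalidation.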
other issues (just mention now; we'll learn more later)
multiword cache lines
problem of false sharing
summary: spin locks are useful on SMPs, but there are
"hidden" performance costs with reading and writing shared variables
Performance of Test and Set (Section 3.2.2)
TS reads AND writes a lock
best case (no contention, i.e., lock is free and 1 process wants in)
read lock (50 clocks)
write lock (50 clocks)
execute CS
write lock (1 or 50 clocks)
repeated use by the same process gets cheap reads (lock stays in its cache)
worst case -- n processes all trying to get into their CS
1 process does read and write and succeeds (100 clocks)
other n-1 processes do read, write, fail, repeat
hence, the bus is jammed AND the first process might get delayed
when it wants to release the lock
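As a back-of-the-envelope check on the best and worst cases above, the arithmetic can be written out. This is a rough sketch that assumes the notes' example costs (50 clocks for a bus access, 1 for a cache hit) and that every failed TS costs one read plus one write on the bus; the function names and the `rounds` parameter are hypothetical.

```python
MISS = 50  # clocks for a read or write that goes over the bus (notes' example)
HIT = 1    # clocks for an access satisfied by the local cache


def ts_best_case_clocks(lock_cached_on_exit=True):
    """Lock overhead for one uncontended entry and exit, excluding the CS."""
    enter = MISS + MISS                            # TS reads the lock, then writes it
    leave = HIT if lock_cached_on_exit else MISS   # the releasing write
    return enter + leave


def ts_bus_ops_while_spinning(n, rounds):
    """Bus operations issued by the n-1 losers if each retries TS `rounds` times.

    Every failed TS still performs a read and a write, and each write
    invalidates the other caches, so no retry is served locally.
    """
    losers = n - 1
    return losers * rounds * 2
```

For example, `ts_bus_ops_while_spinning(8, 10)` counts the bus operations generated by seven spinning processes that each fail ten times, all competing with the lock holder's single releasing write.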
Test and Test and Set (Section 3.2.2)
CSenter: while (lock) skip;       # test
         while (TS(lock))         # test and set
           while (lock) skip;     # test again
CSexit:  lock = false;
one extra clock in best case; no write (or bus use) while spinning
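The protocol above can be tried out with threads. The sketch below is illustrative only: Python has no TS instruction, so TS is emulated with a `threading.Lock` that stands in for the hardware's atomicity, and `time.sleep(0)` is used so a spinning thread yields to the lock holder. The names `TTSLock`, `ts_calls`, and `run_tts_workers` are hypothetical.

```python
import threading
import time


class TTSLock:
    """Test-and-test-and-set spin lock; TS is emulated, not a real instruction."""

    def __init__(self):
        self.lock = False                 # the shared lock variable from the notes
        self._atomic = threading.Lock()   # stands in for the hardware's atomicity
        self.ts_calls = 0                 # how many times TS (a write) was attempted

    def _ts(self):
        """Emulated TS: atomically return the old value and set lock to True."""
        with self._atomic:
            old = self.lock
            self.lock = True
            self.ts_calls += 1
            return old

    def _spin_while_held(self):
        while self.lock:                  # read-only test; no TS while spinning
            time.sleep(0)                 # yield so the holder can run

    def cs_enter(self):
        self._spin_while_held()           # test
        while self._ts():                 # test and set
            self._spin_while_held()       # test again

    def cs_exit(self):
        with self._atomic:                # must not interleave with an emulated TS
            self.lock = False


def run_tts_workers(num_threads=4, increments=200):
    """Each worker adds to a shared counter inside the critical section."""
    lock = TTSLock()
    total = [0]

    def worker():
        for _ in range(increments):
            lock.cs_enter()
            total[0] += 1                 # critical section
            lock.cs_exit()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0], lock.ts_calls
```

Calling `run_tts_workers()` returns the final counter and the number of TS attempts; the counter equals the total number of increments only if mutual exclusion held.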
Implementing Await Statements (Section 3.2.3)
we can use a spin lock solution to implement any kind of await
statement and hence any kind of atomic action
<< S; >>            becomes   CSenter; S; CSexit;
<< await(B) S; >>   becomes   CSenter;
                              while (!B) { CSexit; Delay; CSenter; }
                              S;
                              CSexit;
for Delay, use recheck idea from earlier --- i.e., spin until B is true
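The await translation above can be sketched in runnable form. This is an illustration under assumptions, not the text's implementation: a `threading.Lock` stands in for the spin-lock CSenter/CSexit, `time.sleep(0)` stands in for Delay, and the class `SharedCounter` with its helpers is hypothetical.

```python
import threading
import time


class SharedCounter:
    """Guards `value` with CSenter/CSexit and implements << await(B) S; >>."""

    def __init__(self):
        self._cs = threading.Lock()    # stands in for the spin-lock CSenter/CSexit
        self.value = 0

    def atomic(self, action):
        """<< S; >>  as  CSenter; S; CSexit;"""
        self._cs.acquire()             # CSenter
        result = action(self)          # S
        self._cs.release()             # CSexit
        return result

    def await_then(self, condition, action):
        """<< await(B) S; >> using the exit, delay, re-enter loop."""
        self._cs.acquire()             # CSenter
        while not condition(self):     # while (!B)
            self._cs.release()         # CSexit: let others make B true
            time.sleep(0)              # Delay (a spin lock would recheck B here)
            self._cs.acquire()         # CSenter
        result = action(self)          # S runs with B true and the lock held
        self._cs.release()             # CSexit
        return result


def increment(c):
    c.value += 1


def decrement(c):
    c.value -= 1


def run_await_demo(pairs=3):
    """Consumers do << await(value > 0) value--; >>; producers do << value++; >>."""
    counter = SharedCounter()
    consumers = [
        threading.Thread(target=counter.await_then,
                         args=(lambda c: c.value > 0, decrement))
        for _ in range(pairs)
    ]
    producers = [
        threading.Thread(target=counter.atomic, args=(increment,))
        for _ in range(pairs)
    ]
    for t in consumers + producers:
        t.start()
    for t in consumers + producers:
        t.join()
    return counter.value
```

Because each consumer decrements only after observing `value > 0` while holding the lock, the counter never goes negative and ends where it started.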
Fair Solutions to the CS Problem (Section 3.3)
[I give short treatment to this; it is covered in our undergrad OS class]
need a fair way to break ties
overview of common approaches (all are in the text):
tiebreaker algorithm (aka Peterson's algorithm) -- undergrad OS class
simple for 2 processes; complex for n
[NB: I made a major goof in the code for these in the first printing;
see the errata sheet.]
ticket algorithm -- covered here later
easy, but needs special instruction
bakery algorithm -- also covered in undergrad OS class
ticket-like w/o special instruction
Ticket Algorithm
shared:  int number = 1, next = 1;
CSenter: int myturn;                        # private variable; one copy per process
         << myturn = number; number++; >>   # draw a ticket
         << await(myturn == next); >>       # wait for my turn
CSexit:  << next++; >>                      # different variable, not a spin lock
properties: mutual exclusion
no livelock or unnecessary delay
eventual entry
implementation: spin is no problem. why?
exit is no problem. why?
drawing a ticket is a problem. why?
Fetch and Add Instruction
read and increment a variable as a single atomic action:
int FA(var, incr) {
  << int tmp = var; var += incr; return (tmp); >>
}
ticket drawing is then simply myturn = FA(number, 1);
performance: spin on cached value (next) and fair,
but hardware has to provide an FA or similar instruction
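The ticket algorithm with FA can be sketched with threads as follows. This is an illustration under assumptions: Python has no FA instruction, so FA is emulated with a `threading.Lock` standing in for hardware atomicity, and `time.sleep(0)` lets a spinning thread yield. The names `FetchAndAdd`, `TicketLock`, and `run_ticket_workers` are hypothetical.

```python
import threading
import time


class FetchAndAdd:
    """Emulated FA-capable integer; real hardware does this in one instruction."""

    def __init__(self, value):
        self.value = value
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def fa(self, incr):
        with self._atomic:
            tmp = self.value
            self.value += incr
            return tmp


class TicketLock:
    def __init__(self):
        self.number = FetchAndAdd(1)      # next ticket to hand out
        self.next = 1                     # ticket currently being served

    def cs_enter(self):
        myturn = self.number.fa(1)        # draw a ticket
        while myturn != self.next:        # spin, reading `next` only
            time.sleep(0)                 # yield so the holder can run
        return myturn

    def cs_exit(self):
        self.next += 1                    # only the current holder writes `next`


def run_ticket_workers(num_threads=4, entries=50):
    """Return the tickets in the order their holders entered the CS."""
    lock = TicketLock()
    served = []

    def worker():
        for _ in range(entries):
            ticket = lock.cs_enter()
            served.append(ticket)         # critical section
            lock.cs_exit()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return served
```

The returned list shows entries in strict ticket order, which is the fairness property: processes enter in the order in which they drew tickets.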