Lecture 8

Review

   TS instruction
   spin lock solution using TS
   properties

   today:  making TS efficient; making spin locks fair


Multiprocessors and Caches (see Sec. 1.2.2)

   block diagram of a small SMP (Figure 1.2)

   cache:  a small fast memory
           contains a subset of primary memory
           exploits the principle of locality

   read a word:  look in cache
                 if not there, read from memory into cache and CPU

   write a word:  put in cache and/or put in memory

   read/write hits take 1 clock; misses take lots more, e.g., 50 clocks

   cache coherence problem
      on a write, need to invalidate old copies in other caches
      on a read, need to get the current copy
      this takes time (e.g., 50 clocks)

   other issues (just mention now; we'll learn more later)
      multiword cache lines
      problem of false sharing
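
   A minimal sketch of false sharing, assuming a 64-byte cache line (the
   real size is machine dependent). The struct names and CACHE_LINE are
   hypothetical, chosen only for illustration. In the first struct, two
   counters updated by different processors most likely share one line, so
   each write invalidates the other processor's copy even though the
   variables are logically unrelated. The second struct pads each counter
   onto its own line.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed line size; check the target machine */

/* a and b are adjacent, so they most likely fall in the same cache line */
struct counters_shared {
    long a;   /* written by processor 0 */
    long b;   /* written by processor 1 */
};

/* each member starts on its own (assumed) cache line */
struct counters_padded {
    alignas(CACHE_LINE) long a;
    alignas(CACHE_LINE) long b;
};
```

   The padded version trades memory for fewer coherence misses.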

   summary:  spin locks are useful on SMPs, but there are
      "hidden" performance costs with reading and writing shared variables


Performance of Test and Set (Section 3.2.2)

   TS reads AND writes a lock

   best case (no contention, i.e., lock is free and 1 process wants in)
      read lock (50 clocks)
      write lock (50 clocks)
      execute CS
      write lock (1 or 50 clocks)

      repeated usage by the same process gets cheap reads

   worst case -- n processes all trying to get into their CS

      1 process does read and write and succeeds (100 clocks)
      other n-1 processes do read, write, fail, repeat

         hence, the bus is jammed AND the first process might get delayed
         when it wants to release the lock
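
   The plain TS spin lock whose cost is analyzed above can be sketched in
   C11 as follows. This is an illustration, not the text's code: ts_lock,
   ts_enter, and ts_exit are hypothetical names, and C11's atomic_flag
   stands in for the TS instruction. Every pass through the loop performs
   a write to the lock, which is what jams the bus under contention.

```c
#include <stdatomic.h>

/* hypothetical lock type; a clear flag means the lock is free */
typedef struct { atomic_flag held; } ts_lock;

/* CSenter: each iteration is a read AND a write of the lock */
static void ts_enter(ts_lock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* spin */
}

/* CSexit: a single write that frees the lock */
static void ts_exit(ts_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

   A lock would be declared as  ts_lock l = { ATOMIC_FLAG_INIT };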


Test and Test and Set (Section 3.2.2)

   CSenter:  while (lock) skip;     # test
             while (TS(lock))       # test and set
                while(lock) skip;   # test again

   CSexit:   lock = false;

   one extra clock in best case; no write (or bus use) while spinning
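
   A minimal C11 sketch of Test and Test and Set, under the assumption
   that atomic_exchange plays the role of TS. The names tts_lock, ts,
   tts_enter, and tts_exit are hypothetical. The inner loop only reads the
   lock, so a spinning process hits in its own cache until the holder's
   write invalidates the line.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* hypothetical lock type; false means free, true means held */
typedef struct { atomic_bool held; } tts_lock;

/* TS: atomically set the lock and return its previous value */
static bool ts(tts_lock *l) {
    return atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void tts_enter(tts_lock *l) {
    for (;;) {
        /* test: read-only spin, no bus traffic while the line is cached */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
        /* test and set: attempt the write only when the lock looks free */
        if (!ts(l))
            return;
        /* lost the race; go back to the read-only spin */
    }
}

static void tts_exit(tts_lock *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

   The loop is arranged differently from the pseudocode but does the same
   thing: test, then test and set, then test again on failure.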


Implementing Await Statements (Section 3.2.3)

   we can use a spin lock solution to implement any kind of await
   statement and hence any kind of atomic action

   << S; >>    CSenter; S; CSexit;

   << await(B) S; >>  CSenter;
                      while (!B) { CSexit; Delay; CSenter; }
                      S;
                      CSexit;

   for Delay, use recheck idea from earlier --- i.e., spin until B is true
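
   As a sketch of this translation, the code below implements one specific
   await statement,  << await (count > 0) count--; >>,  on top of a spin
   lock. The variable count and the names cs_lock, cs_enter, cs_exit, and
   await_decrement are hypothetical, introduced only for this example.
   Delay is the recheck idea: spin reading B without holding the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool cs_lock;       /* spin lock; false means free */
static atomic_int  count = 1;     /* hypothetical shared state tested by B */

static void cs_enter(void) {
    for (;;) {
        while (atomic_load(&cs_lock))
            ;                                  /* test */
        if (!atomic_exchange(&cs_lock, true))  /* test and set */
            return;
    }
}

static void cs_exit(void) {
    atomic_store(&cs_lock, false);
}

/* << await (count > 0) count--; >> */
static void await_decrement(void) {
    cs_enter();
    while (!(atomic_load(&count) > 0)) {       /* while (!B) */
        cs_exit();
        while (!(atomic_load(&count) > 0))
            ;                                  /* Delay: spin until B is true */
        cs_enter();                            /* B is rechecked under the lock */
    }
    atomic_fetch_sub(&count, 1);               /* S */
    cs_exit();
}
```

   B must be rechecked after CSenter because another process may have made
   it false again between the Delay and reacquiring the lock.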


Fair Solutions to the CS Problem (Section 3.3)

   [I give short treatment to this; it is covered in our undergrad OS class]

   need a fair way to break ties
   overview of common approaches (all are in the text):

      tiebreaker algorithm (aka Peterson's algorithm) -- undergrad OS class
        simple for 2 processes; complex for n

        [NB:  I made a major goof in the code for these in the first printing;
              see the errata sheet.]

      ticket algorithm -- covered here later
        easy, but needs special instruction

      bakery algorithm -- also covered in undergrad OS class
        ticket-like w/o special instruction


Ticket Algorithm

   shared:  int number = 1, next = 1;

   CSenter:  int myturn;   # private variable; one copy per process

             << myturn = number; number++; >>
             << await(myturn == next); >>

   CSexit:   << next++; >>   # different variable, not a spin lock

   properties:  mutual exclusion
                no livelock or unnecessary delay
                eventual entry

   implementation:  spin is no problem.  why?
                    exit is no problem.  why?
                    drawing a ticket is a problem.  why?


Fetch and Add Instruction

   read and increment a variable as a single atomic action:

      int FA(var, incr) {
         << int tmp = var; var += incr; return (tmp); >>
      }

   ticket drawing is then simply  myturn = FA(number, 1);

   performance:  spin on cached value (next) and fair,
                 but hardware has to provide an FA or similar instruction
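
   A minimal C11 sketch of the ticket algorithm, assuming that
   atomic_fetch_add stands in for the FA instruction. The names
   ticket_lock, TICKET_LOCK_INIT, ticket_enter, and ticket_exit are
   hypothetical. Tickets are unsigned, so wraparound is well defined.

```c
#include <stdatomic.h>

typedef struct {
    atomic_uint number;   /* next ticket to hand out */
    atomic_uint next;     /* ticket currently allowed into the CS */
} ticket_lock;

#define TICKET_LOCK_INIT { 1, 1 }   /* number = 1, next = 1, as in the notes */

/* CSenter: draw a ticket with FA, then spin until it is our turn.
   Returns the ticket drawn, for illustration only. */
static unsigned ticket_enter(ticket_lock *l) {
    unsigned myturn =
        atomic_fetch_add_explicit(&l->number, 1, memory_order_relaxed);
    while (atomic_load_explicit(&l->next, memory_order_acquire) != myturn)
        ;   /* read-only spin on next */
    return myturn;
}

/* CSexit: only the lock holder writes next, so an increment suffices */
static void ticket_exit(ticket_lock *l) {
    atomic_fetch_add_explicit(&l->next, 1, memory_order_release);
}
```

   Processes enter in the order in which they drew tickets, which is what
   gives eventual entry.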