The FTN Token Service

Gregg Townsend
The University of Arizona
July 3, 2001


The FTN Token Service is a distributed application providing reliable
locking combined with unreliable data storage.  Each token functions as
a reliable lock carrying a small amount of data.  Replication costs are
eliminated by accepting the possibility of occasional data loss.


Tokens
   -- are specified by an arbitrary name
   -- function reliably as locks
   -- can be requested in exclusive or non-exclusive mode
   -- can carry a small data payload

The data payload
   -- should be treated as a noncritical "hint"
   -- can be lost when a server crashes
   -- could also be deliberately discarded (e.g. by aging)
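
As a rough illustration of the two lists above, here is a minimal
sketch of what one token record might hold. The class, field names,
and example values are hypothetical and are not taken from the FTN
code; the point is only that the payload is optional and a client
must be ready for it to be missing.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Token:
    """Hypothetical in-memory record for one token (illustrative names)."""
    name: str                                        # arbitrary name
    holders: Set[str] = field(default_factory=set)   # clients holding it
    exclusive: bool = False                          # held exclusively?
    data: Optional[bytes] = None                     # small payload, a hint

    def lose_data(self) -> None:
        """Model a crash or aging: the hint vanishes, the lock remains."""
        self.data = None


tok = Token("example/token", holders={"client-a"}, exclusive=True,
            data=b"cached-value")
tok.lose_data()

# A client treats the payload as a hint and falls back when it is gone.
hint = tok.data if tok.data is not None else b"recomputed"
```

The lock fields survive `lose_data()`; only the payload is dropped,
which mirrors the reliable-lock, unreliable-data split.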


Components
   -- the token service runs on a set of cooperating servers
   -- the servers elect one of their own as the leader
   -- clients interact with servers to request and release tokens
   -- clients are trusted to behave


The leader
   -- is elected dynamically by the servers
   -- maintains an authoritative list of functioning clients and servers
   -- broadcasts this list periodically to the other servers
   -- is known to be alive because these broadcasts act as a heartbeat
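
The heartbeat idea can be sketched from a follower's point of view as
below. The class name, the timeout value, and the shape of the
broadcast are assumptions made for illustration; the outline does not
give the real message format or timing. Times are passed in explicitly
so the example is deterministic.

```python
class LeaderMonitor:
    """Hypothetical follower-side view of the leader's broadcasts."""

    def __init__(self, timeout: float):
        self.timeout = timeout       # assumed: tolerated seconds of silence
        self.last_seen = None        # time of the most recent broadcast
        self.members = frozenset()   # last authoritative list received

    def on_broadcast(self, now: float, members) -> None:
        # Each broadcast both refreshes the list and proves liveness.
        self.last_seen = now
        self.members = frozenset(members)

    def leader_alive(self, now: float) -> bool:
        if self.last_seen is None:
            return False
        return now - self.last_seen <= self.timeout


mon = LeaderMonitor(timeout=3.0)
mon.on_broadcast(now=10.0, members={"s1", "s2", "c1"})
```

With these numbers the leader counts as alive at time 12.0 and as
silent at time 14.0, at which point a follower would start an election.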

The N servers
   -- each handle about 1/N of the tokens
   -- maintain assignments and data values
   -- interact with all clients
   -- cooperate with other servers in the event of reconfiguration
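
The outline does not say how tokens are divided among servers. One
scheme that gives each server about 1/N of the tokens, and lets any
node compute the owner from the server list alone, is rendezvous
(highest-random-weight) hashing. The sketch below uses it as an
assumption, not as the FTN method; the server and token names are
made up.

```python
import hashlib


def owner(token_name: str, servers) -> str:
    """Return the server with the highest hash score for this token."""
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}|{token_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(servers, key=score)


servers = ["s1", "s2", "s3", "s4"]
tokens = [f"token-{i}" for i in range(2000)]
before = {t: owner(t, servers) for t in tokens}

# Fraction of tokens per server; each should be close to 1/N.
share = {s: sum(1 for o in before.values() if o == s) / len(tokens)
         for s in servers}

# Remove one server: only the tokens it owned change hands.
survivors = [s for s in servers if s != "s3"]
after = {t: owner(t, survivors) for t in tokens}
moved = [t for t in tokens if before[t] != after[t]]
```

A useful property of this choice is that removing a server moves only
that server's tokens, and spreads them over the survivors.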


The client interface
   -- is provided by both C and Java libraries
   -- is thread-safe
   -- supports requesting, updating, and releasing tokens
   -- performs a callback when another client needs a held token
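
The request / release / callback cycle can be illustrated with a toy,
single-process stand-in for the service. This is not the FTN library
API (which is provided in C and Java); every name below is
hypothetical, and a denied request simply returns False here, whereas
a real library would presumably wait for the token.

```python
class TokenTable:
    """Toy stand-in for the token service (not the FTN API)."""

    def __init__(self):
        self.holders = {}     # token name -> set of client ids
        self.exclusive = {}   # token name -> held exclusively?
        self.callbacks = {}   # client id -> function(token name)

    def register(self, client, on_needed):
        self.callbacks[client] = on_needed

    def request(self, client, name, exclusive=False):
        others = self.holders.get(name, set()) - {client}
        conflict = bool(others) and (exclusive
                                     or self.exclusive.get(name, False))
        if conflict:
            # Tell current holders that someone else needs the token.
            for other in sorted(others):
                self.callbacks[other](name)
            return False
        self.holders.setdefault(name, set()).add(client)
        self.exclusive[name] = exclusive
        return True

    def release(self, client, name):
        self.holders.get(name, set()).discard(client)
        if not self.holders.get(name):
            self.holders.pop(name, None)
            self.exclusive.pop(name, None)


table = TokenTable()
needed = []
table.register("a", lambda name: needed.append(("a", name)))
table.register("b", lambda name: needed.append(("b", name)))

got_a = table.request("a", "t1", exclusive=True)    # granted
got_b = table.request("b", "t1", exclusive=True)    # denied, "a" notified
table.release("a", "t1")                            # "a" gives it up
got_b_retry = table.request("b", "t1", exclusive=True)
```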

The library hides the details:
   -- initialization handshaking
   -- network hiccups and server reconfigurations
   -- configuration awareness (knowing which server to contact)


The failure model:
   -- the network is imperfect (may drop / delay / duplicate packets)
   -- servers may crash occasionally; a few seconds of recovery is acceptable
   -- clients may join, leave, or crash unpredictably
   -- nodes hang or crash "hard" (they do not fail by sending bogus data)
   -- any node that is still sending heartbeats is still functioning
   -- both servers and clients are assumed to be trustworthy

When a server crashes:
   -- the leader notifies all servers
   -- the servers query the clients about their holdings
   -- remaining servers share the work of the crashed server
   -- no other tokens are reassigned
   -- token data held on the server is lost
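
The recovery steps above can be sketched as a function that rebuilds
lock state from what the clients report. The report format and the
function name are assumptions for illustration. Note that the rebuilt
entries carry no data, since the payload lived only on the crashed
server.

```python
def rebuild_locks(client_reports, my_tokens):
    """Rebuild lock state for newly inherited tokens from client reports.

    client_reports: {client id: {token name: exclusive flag}} (assumed)
    my_tokens: names of the tokens this server is now responsible for
    """
    table = {}
    for client, holdings in client_reports.items():
        for name, exclusive in holdings.items():
            if name not in my_tokens:
                continue   # another surviving server owns this token
            entry = table.setdefault(
                name, {"holders": set(), "exclusive": False, "data": None})
            entry["holders"].add(client)
            entry["exclusive"] = entry["exclusive"] or exclusive
    return table


reports = {
    "c1": {"t1": True, "t2": False},
    "c2": {"t2": False, "t9": True},
}
locks = rebuild_locks(reports, my_tokens={"t1", "t2"})
```

Because the clients know what they hold, the locks stay reliable even
though no server kept a replica.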

When the leader crashes:
   -- the remaining servers elect a new leader
   -- the old leader's crash is then handled like any other server crash
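
The outline does not describe the election algorithm. Purely to show
the ordering of the two steps, the sketch below assumes the simplest
possible rule (the smallest identifier among the live servers wins);
the function names and the rule itself are hypothetical.

```python
def elect_leader(alive_servers):
    """Assumed rule: the smallest identifier among live servers wins."""
    if not alive_servers:
        raise ValueError("no servers left to elect")
    return min(alive_servers)


def handle_leader_crash(servers, crashed_leader):
    # Step 1: the remaining servers choose a new leader.
    remaining = sorted(s for s in servers if s != crashed_leader)
    new_leader = elect_leader(remaining)
    # Step 2: the old leader was also a token server, so its tokens
    # still need the ordinary server-crash handling by the survivors.
    return new_leader, remaining


new_leader, remaining = handle_leader_crash(["s1", "s2", "s3"], "s1")
```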

When a client crashes:
   -- the leader notifies all servers
   -- they release tokens held by that client
   -- they ignore future requests from that client
   -- if the client recovers, it must re-register as a new client
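
The client-crash rules can be sketched with a small registry in which
each registration gets a fresh identity. The class and method names
are hypothetical; the real registration handshake is not described
here.

```python
class ClientRegistry:
    """Toy server-side view of registered clients (illustrative only)."""

    def __init__(self):
        self.next_id = 1
        self.active = set()   # identities whose requests are accepted
        self.held = {}        # identity -> set of token names

    def register(self):
        cid = self.next_id
        self.next_id += 1
        self.active.add(cid)
        self.held[cid] = set()
        return cid

    def acquire(self, cid, name):
        if cid not in self.active:
            return False      # request from a client declared dead
        self.held[cid].add(name)
        return True

    def declare_crashed(self, cid):
        # Called when the leader reports the client as gone.
        self.active.discard(cid)
        return self.held.pop(cid, set())   # tokens to release


reg = ClientRegistry()
old = reg.register()
reg.acquire(old, "t1")
released = reg.declare_crashed(old)
ignored = reg.acquire(old, "t1")    # stale identity, ignored
new = reg.register()                # recovered client returns as new
accepted = reg.acquire(new, "t1")
```

Never reusing an identity means a late packet from the crashed
incarnation cannot be mistaken for a request from the recovered one.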