The FTN Token Service

Gregg Townsend
The University of Arizona
July 3, 2001

The FTN Token Service is a distributed application providing reliable locking combined with unreliable data storage. Each token functions as a reliable lock carrying a small amount of data. Replication costs are eliminated by accepting the possibility of occasional data loss.

Tokens
  -- are specified by an arbitrary name
  -- function reliably as locks
  -- can be requested exclusively or not
  -- can carry a small data payload

The data payload
  -- should be treated as a noncritical "hint"
  -- can be lost when a server crashes
  -- could also be deliberately discarded (e.g. by aging)

Components
  -- the token service runs on a set of cooperating servers
  -- the servers elect one of their own as the leader
  -- clients interact with servers to request and release tokens
  -- clients are trusted to behave

The leader
  -- is elected dynamically by the servers
  -- maintains an authoritative list of functioning clients and servers
  -- broadcasts it periodically to other servers
  -- is known to be alive because these broadcasts act as a heartbeat

The N servers
  -- each handle about 1/N of the tokens
  -- maintain assignments and data values
  -- interact with all clients
  -- cooperate with other servers in the event of reconfiguration

The client interface
  -- is provided by both C and Java libraries
  -- is thread-safe
  -- supports requesting, updating, and releasing tokens
  -- performs a callback when another client needs a held token

The library hides the details:
  -- initialization handshaking
  -- network hiccups and server reconfigurations
  -- configuration awareness (knowing which server to contact)

The failure model:
  -- the network is imperfect (may drop / delay / duplicate packets)
  -- servers may crash occasionally; can afford a few seconds to recover
  -- clients may join, leave, or crash unpredictably
  -- nodes hang or crash "hard" (they do not fail by sending bogus data)
  -- any node that is still sending heartbeats is still functioning
  -- both servers and clients are assumed to be trustworthy

When a server crashes:
  -- the leader notifies all servers
  -- the servers query the clients about their holdings
  -- remaining servers share the work of the crashed server
  -- no other tokens are reassigned
  -- token data held on the server is lost

When the leader crashes:
  -- the remaining servers elect a new leader
  -- then the server crash is handled

When a client crashes:
  -- the leader notifies all servers
  -- they release tokens held by that client
  -- they ignore future requests from that client
  -- if the client recovers it must reregister as a new client
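
The client-crash steps above can be made concrete with a small in-memory model of a single server. This is only a sketch under stated assumptions: the class `TokenServer` and its methods `register`, `request`, `release`, and `client_crashed` are hypothetical names chosen for illustration and are not the API of the FTN C or Java libraries. Networking, data payloads, leader election, and the callback to a current holder are all left out.

```python
import itertools


class TokenServer:
    """Toy model of one token server (hypothetical, not the FTN API)."""

    def __init__(self):
        self._ids = itertools.count(1)  # source of fresh client ids
        self.live = set()               # clients currently registered
        self.holders = {}               # token name -> set of holding client ids
        self.exclusive = set()          # token names currently held exclusively

    def register(self):
        """Admit a client under a brand-new id."""
        cid = next(self._ids)
        self.live.add(cid)
        return cid

    def request(self, cid, name, exclusive=False):
        """Try to grant a token; return True if it was granted."""
        if cid not in self.live:
            return False  # unknown or crashed client: request is ignored
        held = self.holders.setdefault(name, set())
        others = held - {cid}
        if others and (exclusive or name in self.exclusive):
            return False  # conflict; a real server would call back the holder
        held.add(cid)
        if exclusive:
            self.exclusive.add(name)
        return True

    def release(self, cid, name):
        """Give up a token; forget the token once nobody holds it."""
        held = self.holders.get(name, set())
        held.discard(cid)
        if not held:
            self.holders.pop(name, None)
            self.exclusive.discard(name)

    def client_crashed(self, cid):
        """Apply the leader's notice that a client has crashed."""
        self.live.discard(cid)           # future requests will be ignored
        for name in list(self.holders):  # copy keys: release() mutates the dict
            self.release(cid, name)


server = TokenServer()
a = server.register()
b = server.register()
server.request(a, "job-queue", exclusive=True)  # a holds the lock
server.client_crashed(a)                        # leader reports the crash
granted = server.request(b, "job-queue")        # b can now take the token
```

In this model a crashed client's old id is never reinstated: a recovered client has to call `register` again and receives a different id, which mirrors the rule that it must reregister as a new client.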