General Design Notes
FTN Token Server -- Mark II
16-dec-1999 / gmt
6-jul-2000 / gmt

General Idea:
    a robust, distributed token service for use by Swarm filesystem clients
    clients communicate with servers to request and release tokens
    service survives multiple server crashes and arbitrary client crashes
    on a server crash, noncritical data (unassigned tokens) may be lost

Tokens:
    identified (named) by an arbitrary C string
    reserved exclusively (r/w) or with sharing (read-only)
    are "sticky": remain held until explicitly released

Token data:
    an arbitrary string of bytes associated with a token
    can be set by a client holding a token
    updating by shared holders is FCFS
    can be lost if a server runs out of memory or crashes

Token distribution:
    hashing the token name seeds a pseudorandom number generator
    the pseudorandom sequence identifies the servers of the token
    the token is served by the first server that is not marked DOWN
    this algorithm equalizes distribution and minimizes reassignment

Communication:
    clients and servers communicate using UDP
    unacknowledged requests are repeated
    heartbeat messages are used to detect crashed hosts

Failure model:
    a crashed client or server fails to send heartbeat messages
    after a suitable timeout, it is declared "down"
    a host that stops sending heartbeats is not sending other messages
    if a client crashes, its tokens are implicitly released
    we assume no Byzantine (malicious) failures
    must guard against late messages from a host declared "down"

Configuration:
    all servers share the same global initialization file
    this file lists all servers (e.g. host names and port numbers)
    clients are not prespecified

Global state

The token servers elect one of their members as a leader to maintain
and distribute global state values. The leader broadcasts these values
every so often in a message that also serves as its heartbeat. When no
such heartbeats are seen, the servers elect a new leader.

The global state values are:
    the identity of the leader
    the state of each server: DOWN, BOOTING, or READY
    the list of currently active clients

Given the set of active (BOOTING or READY) servers, any client or
server can unambiguously determine the server responsible for a
particular token. The state of each token is maintained locally by its
assigned server.

Server states

DOWN means that a server is not responsible for any tokens.
BOOTING means that a server has responsibility but is still initializing.
READY means that a server is serving a set of tokens.

Changing the state of a server from DOWN to BOOTING to READY is done by
that server as part of its initialization. Changing the state from
READY to DOWN is initiated by the currently elected leader if a
server's heartbeat messages cease.

Server tasks

After initialization, a server is event-driven. Events include:
    messages received from the leader
    periodic interrupts to check that a leader exists
        (to send it a heartbeat, or to start a new leader)
    messages received from clients
    timeouts awaiting client responses

Server Initialization

First, the server starts a background task that runs periodically.
Each time, this task checks for a leader and sends it a heartbeat. If
no leader is running, it initiates a leader election.

The server then waits for a message from the leader. (A leader will
soon be elected by the background process if one is not running.) The
server responds to the leader's message with its own heartbeat,
declaring itself to be BOOTING. This status is reflected in the
leader's next broadcast message to all servers. Other servers see the
BOOTING status and transfer the appropriate portions of their held
tokens to the new server.

The new server waits for messages from all the READY servers. These
messages transfer the state of one portion of the token space, and
also confirm the transfer of responsibility: the senders will honor no
further requests for these tokens.

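The server-selection rule from the Token distribution notes, and the
transfer step it implies when a server is marked BOOTING, might be
sketched as follows. This is only an illustration under stated
assumptions: the notes do not name a hash function or generator, so
SHA-256, Python's random.Random, and a shuffled permutation of server
indices stand in for them here, and the function names server_order,
server_for, and tokens_to_transfer are hypothetical.

```python
import hashlib
import random

DOWN, BOOTING, READY = "DOWN", "BOOTING", "READY"


def server_order(token_name, num_servers):
    """Return the token's pseudorandom preference order over server indices.

    Assumption: the sequence is a permutation of all configured servers,
    produced by a generator seeded from a hash of the token name.
    """
    digest = hashlib.sha256(token_name.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    order = list(range(num_servers))
    rng.shuffle(order)
    return order


def server_for(token_name, states):
    """Return the index of the first server in the order not marked DOWN."""
    for index in server_order(token_name, len(states)):
        if states[index] != DOWN:
            return index
    return None  # every server is DOWN


def tokens_to_transfer(held_tokens, my_index, states):
    """Group held tokens that now belong to another server, by new owner."""
    moves = {}
    for name in held_tokens:
        owner = server_for(name, states)
        if owner is not None and owner != my_index:
            moves.setdefault(owner, []).append(name)
    return moves
```

With states [READY, READY, BOOTING], for instance, the READY server at
index 0 would call tokens_to_transfer(held, 0, states) and send each
resulting group to its new owner. Because a token's order depends only
on its name, marking one server DOWN or bringing it back moves only the
tokens that server is responsible for.
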
The new server also sends the updated server list to every client and
awaits responses listing tokens held by the clients. This is
necessary: any client might hold a token obtained from another server
that has since crashed. Queries are reissued periodically to each
client until the client responds or is declared DOWN.

Once the new server has received responses from all clients and all
other servers, it is ready to go. It declares itself READY and begins
processing requests.

Receiving global state

Every time a server receives a new global state message from the
leader, it must check for state changes.

If the present server is marked DOWN, presumably due to a
communication failure, its authority for token serving has been
redistributed among the surviving servers. It can reinitialize from
the beginning with a clean slate, but must forget all current data. It
must NOT declare itself BOOTING or READY without reinitializing.

If another server is marked BOOTING, the present server stops serving
any tokens that will become the other server's responsibility, and
sends it a message transferring those tokens. This may repeat for
multiple rounds in the case of message loss.

If another server is newly marked DOWN, the present server must
reclaim a portion of the dead server's tokens and begin serving them.
The survivor queries all clients (similar to the initialization step)
before serving any tokens previously belonging to the crashed server.
The server's query messages also inform the clients to stop listening
to the crashed server, should it later recover and send leftover
messages.

If a client has disappeared from the client list, and that client
holds tokens maintained by this server, then those tokens should be
marked as free.

The Client List

A list of current clients is maintained as part of the global state.
Before a client can request tokens, it must "register" by contacting
the leader.
(Actually, it contacts *any* server, which tells it in response which
server is the leader.) The leader generates a unique ID and adds the
client to the client list.

If a client tells the leader that it is closing down, the leader
removes the client from the list. A client is also removed from the
list if the leader fails to receive any heartbeat messages for some
specified interval. Once a client is removed from the client list, its
ID is no longer valid and it can perform no more token actions.

When a Server Crashes

A server crash is detected by the leader when it notices that the
server has stopped sending heartbeat messages. It then declares the
server DOWN, initiating the recovery process by the surviving servers.
Each server queries each client regarding any tokens held by the
client that transfer to the particular server. After this process is
complete, the surviving servers have assumed the responsibility for
all the tokens formerly served by the one that crashed.

If the leader crashes, the other servers notice this and elect a new
leader. This new leader then notes the lack of a heartbeat from the
old leader and the server crash is handled as described above.

A client that attempts to communicate with a crashed server receives
no response and just keeps resending its request to no avail.
Eventually, the client receives a query from another server that
informs it of the new server configuration. It recalculates the
destination for all its outstanding requests and redirects them to
their new addresses. After the servers complete the recovery process,
they begin responding again to client requests.

If a server is falsely declared DOWN, perhaps due to a communications
failure, it may still send some client messages before noticing this
fact.
Any late messages that are received by the client before it is
informed of the server "crash" will be honored; but their effects will
be reflected in the client response to the subsequent recovery query
and so will be accounted for as part of the recovery process. Late
messages that come after the query are discarded by the client and so
pose no danger.

When a Client Crashes

A client crash is detected by the leader when an interval passes
without receiving any heartbeat messages. The leader removes the
client from the list and distributes this information as part of the
global state. Each server receiving the revised list notes that the
client is gone and reclaims any tokens that were assigned to the
client.

If a client declared down has not truly crashed, it may continue to
take actions outside of the token realm based on its misapprehension
that it is still a participant and that it still holds some tokens.
This is unavoidable, but it highlights the importance of conservative
criteria for declaring a client crash. Once the client makes any token
calls, it receives an error reply, because its ID is no longer valid.
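
The client-crash handling in this section could be sketched roughly as
below. It is a minimal illustration, not the server's actual code: the
function names expire_clients and reclaim_tokens, the dictionary
layouts, and the use of plain numeric timestamps are all assumptions
made for the example.

```python
def expire_clients(last_heartbeat, now, timeout):
    """Leader side: return the IDs of clients whose heartbeats have stopped.

    last_heartbeat maps client ID -> time the last heartbeat was received.
    """
    return {cid for cid, seen in last_heartbeat.items() if now - seen > timeout}


def reclaim_tokens(holders, active_clients):
    """Server side: drop holders that are no longer in the client list.

    holders maps token name -> set of client IDs holding that token.
    Returns the names of tokens left with no holder, which are now free.
    """
    freed = []
    for name, clients in holders.items():
        gone = clients - active_clients
        if gone:
            clients.difference_update(gone)  # implicit release for crashed clients
            if not clients:
                freed.append(name)
    return freed
```

The leader would run something like expire_clients on each periodic
check and publish the reduced client list in its next global state
message; each server would then apply reclaim_tokens to its own token
table. A generous timeout reflects the conservative criteria called
for above.
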