General Design Notes
FTN Token Server -- Mark II

16-dec-1999 / gmt
 6-jul-2000 / gmt
 

General Idea:
    a robust, distributed token service for use by Swarm filesystem clients
    clients communicate with servers to request and release tokens
    service survives multiple server crashes and arbitrary client crashes
    on a server crash, noncritical data (unassigned tokens) may be lost


Tokens:
    identified (named) by an arbitrary C string
    reserved exclusively (r/w) or with sharing (read-only)
    are "sticky": remain held until explicitly released

Token data:
    an arbitrary string of bytes associated with a token
    can be set by a client holding a token
        updating by shared holders is FCFS
    can be lost if a server runs out of memory or crashes

Token distribution:
    hashing the token name seeds a pseudorandom number generator
    the pseudorandom sequence identifies the servers of the token
    the token is served by the first server that is not marked DOWN
    this algorithm equalizes distribution and minimizes reassignment


Communication:
    clients and servers communicate using UDP
    unacknowledged requests are repeated
    heartbeat messages are used to detect crashed hosts

Failure model:
    a crashed client or server fails to send heartbeat messages
    after a suitable timeout, it is declared "down"
    a host that stops sending heartbeats is not sending other messages

    if a client crashes, its tokens are implicitly released
    we assume no Byzantine (malicious) failures
    must guard against late messages from a host declared "down"

Configuration:
    all servers share a same global initialization file
    this file lists all servers (e.g. host names and port numbers)
    clients are not prespecified


Global state

The token servers elect one of their members as a leader to maintain
and distribute global state values.  The leader broadcasts these values
every so often in a message that also serves as its heartbeat.  When 
no such heartbeats are seen, the servers elect a new leader.

The global state values are:
    the identity of the leader
    the state of each server: DOWN, BOOTING, or READY
    the list of currently active clients

Given the set of active (BOOTING or READY) servers, any client or server
can unambiguously determine the server responsible for a particular token.
The state of each token is maintained locally by its assigned server.


Server states

DOWN means that a server is not responsible for any tokens.  BOOTING
means that a server has responsibility but is still initializing.
READY means that a server is serving a set of tokens.

Changing the state of a server from DOWN to BOOTING to READY is
done by that server as part of its initialization.

Changing the state from READY to DOWN is initiated by the currently
elected leader if a server's heartbeat messages cease.


Server tasks

After initialization, a server is event-driven.  Events include:
    messages received from the leader
    periodic interrupts to check that a leader exists
    	(to send it a heartbeat, or to start a new leader)
    messages received from clients
    timeouts awaiting client responses


Server Initialization

First, the server starts a background task that runs periodically.
Each time, this task checks for a leader and sends it a heartbeat.
If no leader is running, it initiates a leader election.

The server then waits for a message from the leader.  (A leader will
soon be elected by the background process if one is not running.)
The server responds to the leader's message with its own heartbeat,
declaring itself to be BOOTING.  This status is reflected in the
leader's next broadcast message to all servers.  Other servers see
the BOOTING status and transfer the appropriate portions of their
held tokens to the new server.

The new server waits for messages from all the READY servers.  These
messages transfer the state of one portion of the token space, and
also confirm the transfer of responsibility: the senders will honor
no further requests for these tokens.

The new server also sends the updated server list to every client and
awaits responses listing tokens held by the clients.  This is necessary:
any client might hold a token obtained from another server that has
since crashed.  Queries are reissued periodically to each client until
the client responds or is declared DOWN.

Once the new server has received responses from all clients and all
other servers, it is ready to go.  It declares itself READY and
begins processing requests.


Receiving global state

Every time a server receives a new global state message from the leader,
it must check for state changes.

If the present server is marked DOWN, presumably due to a communication
failure, its authority for token serving has been redistributed among
the surviving servers.  It can reinitialize from the beginning with a
clean slate, but must forget all current data.  It must NOT declare
itself BOOTING or READY without reinitializing.

If another server is marked BOOTING, the present server stops serving
any tokens that will become the other server's responsibility, and
sends it a message transferring those tokens.  This may repeat for
multiple rounds in the case of message loss.

If another server is newly marked DOWN, the present server must reclaim
a portion of the dead server's tokens and begin serving them.  The
survivor queries all clients (similar to the initialization step) before
serving any tokens previously belonging to the crashed server.  The
server's query messages also inform the clients to stop listening to
the crashed server, should it later recover and send leftover messages.

If a client has disappeared from the client list, and that client holds
tokens maintained by this server, then those tokens should be marked as
free.


The Client List

A list of current clients is maintained as part of the global state.
Before a client can request tokens, it must "register" by contacting
the leader.  (Actually, it contacts *any* server, which tells it in
response which server is the leader.)  The leader generates a unique
ID and adds the client to the client list.

If a client tells the leader that it is closing down, the leader removes
the client from the list.

A client is also removed from the list if the leader fails to receive
any heartbeat messages for some specified interval.  Once a client is
removed from the client list, its ID is no longer valid and it can
perform no more token actions.


When a Server Crashes

A server crash is detected by the leader when it notices that the server
has stopped sending heartbeat messages.  It then declares the server
DOWN, initiating the recovery process by the surviving servers.
Each server queries each client regarding any tokens held by the
client that transfer to the particular server.  After this process
is complete, the surviving servers have assumed the responsibility
for all the tokens formerly served by the one that crashed.

If the leader crashes, the other servers notice this and elect a new
leader.  This new leader then notes the lack of a heartbeat from the
old leader and the server crash is handled as described above.

A client that attempts to communicate with a crashed server receives no
response and just keeps resending its request to no avail.  Eventually,
the client receives a query from another server that informs it of the
new server configuration.  It recalculates the destination for all its
outstanding requests and redirects them to their new addresses.  After
the servers complete the recovery process, they begin responding again
to client requests.

If a server is falsely declared DOWN, perhaps due to a communications
failure, it may still send some client messages before noticing this fact.
Any late messages that are received by the client before it is informed
of the server "crash" will be honored; but their effects will be reflected
in the client response to the subsequent recovery query and so will be
accounted for as part of the recovery process.  Late messages that come
after the query are discarded by the client and so pose no danger.


When a Client Crashes

A client crash is detected by the leader when an interval passes without
receiving any heartbeat messages.  The leader removes the client from
the list and distributes this information as part of the global state.
Each server receiving the revised list notes that the client is gone
and reclaims any tokens that were assigned to the client.

If a client declared down has not truly crashed, it may continue to
take actions outside of the token realm based on its misapprehension
that it is still a participant and that it still holds some tokens.
This is unavoidable, but it highlights the importance of conservative
criteria for declaring a client crash.  Once the client makes any
token calls, it receives an error reply, because its ID is no longer
valid.