Token Server Protocol Requirements

10-jan-2000 / gmt


General Idea:
    a robust, distributed token service for use by Swarm filesystem clients
    clients communicate with servers to request and release tokens
    service survives multiple server crashes and arbitrary client crashes
    on a server crash, noncritical data (unassigned tokens) may be lost
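
As a rough illustration of the request/release exchange, here is a minimal Python sketch of the table a single server might keep. The class and method names (TokenTable, request, release, holder) are hypothetical. The sketch also assumes tokens are exclusive (one holder at a time), which these notes do not specify. Re-granting a token to its current holder means a retransmitted request is harmless when a reply is dropped.

```python
from typing import Dict, Optional


class TokenTable:
    """Hypothetical per-server table mapping token name -> holding client.

    Assumption: tokens are exclusive; the notes do not say whether
    shared tokens exist.
    """

    def __init__(self) -> None:
        self.holders: Dict[str, str] = {}

    def request(self, client: str, token: str) -> bool:
        """Grant token to client if it is free or already held by that client."""
        current = self.holders.get(token)
        if current is None or current == client:
            # Granting again to the current holder makes a duplicate
            # (retransmitted) request idempotent.
            self.holders[token] = client
            return True
        return False

    def release(self, client: str, token: str) -> bool:
        """Release token only if client actually holds it; otherwise ignore."""
        if self.holders.get(token) == client:
            del self.holders[token]
            return True
        return False

    def holder(self, token: str) -> Optional[str]:
        """Return the client holding token, or None if it is unassigned."""
        return self.holders.get(token)
```

Ignoring a release from a non-holder keeps a stale or misdirected release from freeing a token someone else now holds.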

Platform:
    Unix and Java, plus a C client interface
    UDP, IP multicast

Failure model:
    random messages may be dropped or reordered
    communication links may fail totally, and may later recover
    clients and servers crash "hard"
	they behave correctly, or else they crash  (no Byzantine failures)
	they do not send any more messages after a malfunction
	they do not crash and recover without *knowing* they're restarting
    clients and servers may hang
    	they may send otherwise valid messages after appearing to have crashed
    servers are basically reliable
    	crashes and hangs are infrequent
	it's okay if recovery is relatively expensive
    clients may be more flaky
    	crashes are more common than with servers, but still unusual
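
The notes do not fix a mechanism for coping with dropped, reordered, and late messages. One common approach, assumed here rather than taken from this design, is to tag every message with the sender's incarnation number (bumped on each restart, which the failure model says a node always knows about) plus a per-incarnation sequence number. The hypothetical filter below accepts only messages newer than the last one seen from that sender, so senders must be prepared to retransmit anything that gets dropped. Once a node is declared dead, its current incarnation is refused, so a hung node that wakes up and sends otherwise valid messages is ignored until it rejoins under a new incarnation.

```python
from typing import Dict, Tuple


class StaleMessageFilter:
    """Hypothetical receiver-side filter for duplicate, reordered, or late messages.

    Assumes each message carries (incarnation, seq):
      incarnation: incremented every time the sender restarts
      seq: increases with each message within one incarnation
    """

    def __init__(self) -> None:
        # sender -> (incarnation, highest seq accepted in that incarnation)
        self.latest: Dict[str, Tuple[int, int]] = {}
        # sender -> lowest incarnation still allowed to talk to us
        self.min_incarnation: Dict[str, int] = {}

    def accept(self, sender: str, incarnation: int, seq: int) -> bool:
        if incarnation < self.min_incarnation.get(sender, 0):
            return False  # sender was declared dead: drop its late messages
        cur = self.latest.get(sender)
        if cur is not None:
            cur_inc, cur_seq = cur
            if incarnation < cur_inc:
                return False  # left over from an earlier life of the sender
            if incarnation == cur_inc and seq <= cur_seq:
                return False  # duplicate, or an older message arriving late
        self.latest[sender] = (incarnation, seq)
        return True

    def declare_dead(self, sender: str) -> None:
        """Presume sender crashed: refuse its current incarnation from now on."""
        cur = self.latest.get(sender)
        inc = cur[0] if cur is not None else 0
        self.min_incarnation[sender] = inc + 1
```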

Clients do not intercommunicate:
    they can get all necessary state from servers

Only servers share state:
    list of servers, which also determines how tokens are delegated among them
    list of clients
    identity of leader, if one is used
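
The notes say only that the server list implies how tokens are delegated. One way to make that concrete, assumed here and not taken from the design, is rendezvous (highest-random-weight) hashing: every server holding the same list computes the same owner for each token without exchanging any messages. A useful property is that when a server leaves the list, only the tokens it owned change hands. The function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def _weight(server: str, token: str) -> int:
    # Use a stable hash (Python's built-in hash() of str is randomized per
    # process), so every server computes identical weights.
    digest = hashlib.sha256(f"{server}\0{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(token: str, servers: Iterable[str]) -> str:
    """Server responsible for token: the one with the highest weight.

    Ties on weight are broken by server name, so the result does not
    depend on the order of the server list.
    """
    return max(servers, key=lambda s: (_weight(s, token), s))


def delegation(tokens: Iterable[str], servers: List[str]) -> Dict[str, str]:
    """Map every token to its owning server under the given server list."""
    return {t: owner(t, servers) for t in tokens}
```

For example, if s2 drops out of ["s1", "s2", "s3"], recomputing with ["s1", "s3"] leaves every token owned by s1 or s3 where it was; only s2's tokens are reassigned.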

Most actions do not involve shared state:
    client-server exchanges are far more frequent than changes to shared state
    even on server failure, redistributing the failed server's tokens may cost more than the global state change

Why Paxos is insufficient:
    too heavyweight for use with every single action
    	... so must add another protocol for common operations
    no provision for electing a single leader
    	... so must add more code to elect one and keep Paxos from thrashing

Why Paxos is overkill:
    its basic premise is that the participant list can be constantly changing
    it tolerates more failure modes than we need, though not Byzantine failures or malicious participants (classic Paxos assumes neither)
    it is *very* robust and needs only a simple majority to do anything
    	need not even have the same majority for a single full round


What We Really Need Is A Good 5c Leader Election:
    many things get simpler if we put a leader in charge of global state
    any server can act as leader; this is a relatively minor additional responsibility
    must be sure to allow for & recover from all possible failure scenarios
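
The notes do not name an election algorithm. The simplest candidate is a "lowest live ID" rule driven by heartbeat timeouts, sketched below with hypothetical names; the caller supplies timestamps (for example from time.monotonic()). On its own this does not meet the requirement of recovering from every failure scenario. A failed link or a hung server that resumes can leave two servers each believing it is leader, so a real protocol would need further checks, for instance requiring a would-be leader to be acknowledged by a majority of servers.

```python
from typing import Dict, Iterable


class LeaderView:
    """Hypothetical per-server view: leader = lowest ID heard from recently.

    A simplified rule, NOT a complete election protocol: under a
    partition, two servers can each conclude they are the leader.
    """

    def __init__(self, my_id: int, peers: Iterable[int], timeout: float, now: float) -> None:
        self.my_id = my_id
        self.timeout = timeout
        # Start by assuming every peer was heard from at startup time.
        self.last_heard: Dict[int, float] = {p: now for p in peers if p != my_id}

    def heartbeat(self, peer: int, now: float) -> None:
        """Record that a heartbeat from peer arrived at time now."""
        if peer != self.my_id:
            self.last_heard[peer] = now

    def leader(self, now: float) -> int:
        """Lowest ID among peers heard within timeout, counting ourselves as alive."""
        alive = [p for p, t in self.last_heard.items() if now - t <= self.timeout]
        return min(alive + [self.my_id])
```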