Implementing Paths in Scout

3 Implementing Paths in Scout

Scout is an experimental operating system designed for network appliances---e.g., set-top boxes, file- and web-servers, and cluster computers. Scout is designed around the path abstraction, supports both non-realtime and soft-realtime applications, and runs in a single address space. This section describes how the path abstraction is implemented in Scout.

Note that compatibility with standard application interfaces (e.g., POSIX) is not a major goal of Scout, except to understand how such interfaces either exploit or interfere with paths. On the other hand, interoperability with existing protocol specifications is an important requirement of Scout.

3.1 Routers and Services

Routers and Services

Just as in the architecture, routers are the unit of program development in Scout. A router implements some functionality such as the IP protocol, the MPEG decompression algorithm, or a driver for a particular SCSI adapter. A router implements one or more services that can be used by other higher-level routers. As is typical in a layered system, most routers themselves use other lower-level routers to implement their services. Scout does not, however, enforce strict layering. Cyclic dependencies are admissible as long as there is a partial (non-cyclic) order in which the routers can be initialized.

Each service in a router has a name and a type. The names are unique, but otherwise arbitrary and chosen by the programmer. The relevance of service types is explained in more detail below. For the purpose of configuring a router graph, it is sufficient to know that two services can be connected by an edge only if they are mutually compatible. Figure 6 illustrates routers, services, and how they interact in a router graph. In this partial router graph, IP has three services: up, down, and res. The first two are of type net and the latter is of type nsClient (for ``name-service client''). The down service is connected to ETH's up service. This connection is used by IP to send and receive IP datagrams. The res service is connected to ARP's resolver service. IP uses this to translate IP host numbers into Ethernet addresses. ARP itself is connected to ETH as well so that it can broadcast and listen to relevant ARP packets.

Figure 6:Routers and Services.

A router is implemented simply as a collection of C source files. These files, along with the external interface, are described in a spec file. The syntax for spec files is shown below:

router name {
    files = {filename, ...};
    service = {name:type, ...};
}

A service name may be preceded by a less-than marker (<) to indicate that the routers connected to that service must be initialized before this router can be initialized. The Scout infrastructure ensures that router initialization occurs in an order that is consistent with the partial order defined by these markers. For the purpose of router initialization, cyclic dependencies are forbidden. The Scout development environment includes a configuration tool that translates a router graph into C source code that creates and initializes the runtime view of a router graph when the system boots. This configuration tool checks for and rejects any router graph with cyclic dependencies.

At runtime, a Scout router is represented by a variable of the following structure:

struct Router {
    String          name;
    long            (*init)(Router r);
    CreateStageFunc createStage;
    DemuxFunc       demux;
    RouterLinkList  links[NSERVICES];
};

That is, a router consists of its name (member name), three function pointers (members init, createStage, and demux) and a list of router graph edges that connect to this router to other routers (member links). Each router r provides just one globally visible operation:

Router rCreate (String n, int c[]);

This operation is used to create a specific router with name n. The integer array c specifies how many times each router service has been connected to other routers. Once all routers are created, Scout initializes them in the partial order described above. A router is initialized by a call to its init function.

3.2 Path Object

Path Object

Section 2 argued that it is preferable to create paths incrementally, with the resulting paths initially consisting of a sequence of sub-functions g_i. Likewise, Scout paths consist of a sequence of stages. Each router that is crossed by a path creates one such stage. Since a path enters a router at one service and leaves it through another, a stage effectively connects a pair of services. That is, it represents a fixed routing decision.

A stage is a rich object that contains at least the following members:

struct Stage {
    Iface  end[2];
    Path   path;
    Router router;
    long   (*establish)(Stage s, Attrs a);
    void   (*destroy)(Stage s);
};

Member end is an array containing pointers to the interfaces of the stage. These interfaces are derived from the services that a stage connects in a manner that will be explained below. The path and router members point to the path that the stage is part of and the router that created the stage, respectively. The establish and destroy function pointers are used during path creation and destruction and are explained in more detail in Section 3.3.

The relationship between interfaces and router services is as follows. Each router service type consists of a pair of interface types: the first element in this pair specifies what interface the service provides whereas the second element specifies the interface that the service requires to function properly. For example, the net service type is symmetric in the sense that it both provides and requires a net interface. This can be expressed as the pair:

  servicetype net = <NetIface, NetIface>;

Scout supports simple single inheritance for interface types. This means that instead of the exact interface type required by a service it is possible to provide a more specific interface. Hence, the precise rule used to decide whether a pair of services can be connected in a router graph is that the interfaces provided must be identical to or more specific than the interfaces required.

All interfaces encountered when traversing a path in a particular direction are chained together. Since it is sometimes necessary to ``turn around'' the data flow inside a path, each interface also contains a back pointer to the next interface in the opposite direction. A third pointer provides access to the stage to which the interface belongs. Therefore, the most primitive interface is given by:

struct Iface {
    Iface next;
    Iface back;
    Stage stage;
};

This obviously is not a very interesting interface since it provides no way to deliver data. All real interfaces declare additional members that hold function pointers or other data. For example, the net interface is declared as follows:

struct NetIface {
    struct Iface i;
    long (*deliver)(Iface i, Msg m);
};

That is, the net interface provides a single function to deliver a message m to interface i. While Scout can technically support an arbitrary number of interface types, the intent is to keep this number as small as possible. For example, at present there is an interface type to asynchronously exchange messages (this is used both by filters and networking protocols), a window manager interface, a file system interface, and a few other, lesser interface types.

Given the definition of stages and interfaces, it is now easy to describe the actual path object:

struct Path {
    Stage     end[2];
    long      pid;
    void      (*wakeup)(Path p, Thread t);
    PathQueue q[4];
    struct Attrs attrs;
};

The array end contains two pointers to the stages at the extreme ends of the path. A path can set the wakeup function pointer to request that a specific function gets executed when a thread t is awakened to execute in a path p. This is discussed more in Section 3.4. The four path queues are stored in q. These queues are generic in the sense that the queuing discipline is unspecified. The two properties that are defined for any such queue is the current length and the maximum length. Finally, attrs is a set of name/value pairs (attributes). Attributes allow to attach arbitrary state to a particular path. For example, this enables stages to exchange and share information anonymously (without knowing exactly what stage is the source of the information and what stages are the consumers).

A path can therefore be visualized as shown in Figure 7. The path shown there consists of four stages. The stages were created by the TEST, UDP, IP, and ETH routers. Each interior stage contains two interfaces (semi-circles), whereas the stages at the extreme ends of the path contain only one interface each. These extreme stages are, strictly speaking, not part of the path but they are used to connect to the routers that manage the path queues.

Figure 7:Path structure.

3.3 Path Creation

Path Creation

Paths are created and destroyed using the following functions:

Path pathCreate(Router r, Attrs a);
void pathDelete(Path p);

A path is created by invoking pathCreate on a router r. The kind of path to be created is described by the set of attributes a. These attributes are arbitrary name/value pairs that specify the invariants that hold true for the path being created. The pathCreate results in an invocation of the createStage function in router r (see Section 3.1). The createStage function has the following type:

Stage (*CreateStageFunc)(Router r, int s,
                         Attrs a,
                         RouterLink* n);

Here, r is the router on which pathCreate was invoked and s is the number of the service through which the path being created enters the router. Since r is the first router in the path, there is no such service, so the value is set to -1 (not a valid service number). Argument a is the set of attributes passed to the pathCreate operation. Once r creates a new stage and makes a routing decision, it sets n to the router/service pair that the path must traverse next, if there is such a pair, otherwise it sets it to NULL.

Given the next router/service pair, the createStage operation is invoked on that next router. Now, argument s is set to the index of service through which the path enters the router and a is the (possibly modified) set of attributes. This process continues until a path reaches its full length, which happens either when it reaches a leaf router or when the attributes are so weak that no unique routing decision is possible. When either event occurs, a sequence of stages has been created. The stages and the interfaces contained therein are then linked together into a path structure. Once the path object is fully created, the establish functions in the stage objects are executed in the order in which the stages were created. This gives each stage a chance to perform initialization that depends on the existence of the entire path.

As described so far, path creation consists of three phases: (1) create sequence of stages, (2) combine stages into path object, and (3) establish (initialize) stages. During a fourth and final phase, path transformation rules are applied to the path. This provides the means through which Scout uses global knowledge to transform and optimize paths. Semantically, transformation rules have no effect, but they typically result in better performance and better resource allocation or usage. For example, if a path contains a sequence of interfaces for which there is optimized code is available, then the function pointers in the interfaces can be updated to point to this optimized code. More details on such code-related path-transformations can be found in a companion paper [23]. Section 4 discusses some transformations that improve resource management.

When a Scout system boots, there are typically a few routers that create a handful of paths, e.g., to receive key strokes or network packets. All other paths are either directly or indirectly created by these initial paths. In other words, path creation and destruction is under control of the routers that are present in a given system. The Scout infrastructure never creates or destroys paths implicitly.

3.4 Path Execution

Path Execution

Paths are executed by threads---the active entities in Scout. A router starts execution of a path by dequeuing data from the input queue and invoking an interface-type dependent data-delivery function.

Since threads are independent objects and since path queues can often be optimized away, it is possible for a thread to execute a path, enter a router, and then continue execution in another path without any context switches. This is important because degenerate paths can be short, so forcing context switches at every path/router crossing could result in an excessive number of context switches, and therefore, less than optimal performance.

In Scout, threads are scheduled non-preemptively according to some scheduling policy and priority. Scout supports an arbitrary number of scheduling policies, and allocates a percentage of CPU time to each. The minimum share that each policy gets is determined by a system-tunable parameter. Two scheduling policies have been implemented to date: (1) fixed-priority round-robin, and (2) earliest-deadline first (EDF) [18]. The reason for implementing the EDF policy is that for many soft-realtime applications, it is most natural to express a path's ``priority'' in terms of a deadline. We present an example of this in the next section.

Scout uses a non-preemptive scheduler because it meets our needs and is easy to use. In the future, Scout will allow for uncooperative ``threads,'' but since it is not a good idea to share any resource with uncooperative threads in an uncontrolled manner, those threads will not share memory either. That is, uncooperative threads will be isolated from each other in some manner (e.g., through separate address spaces, fault isolation, or a safe language). If uncooperative threads do not share memory, using a preemptive scheduler among them is trivial. Thus, scheduling is split into domains---within a domain, there is trust and hence a non-preemptive scheduler can be used. Across domains, there is no trust and a preemptive scheduler is necessary. This is not unlike what many traditional UNIX kernels do---the kernel ``threads'' are scheduled non-preemptively whereas the user-level processes are scheduled preemptively.

Once a thread executes on behalf of a path, it can trivially adjust its own priority as necessary. However, there also needs to be a mechanism that allows a newly awakened thread to inherit a path's scheduling requirements. For this purpose, a path can set the wakeup function pointer in its path object to a function that selects the appropriate scheduling policy and priority for a newly awakened thread.

3.5 Finding Paths

Finding Paths

In many cases, knowing the path that should be used for a given set of data is trivial. For example, an application might create a path to a graphics window and then use that path to draw lines and paint text. In some cases, however, the path to be used is determined implicitly by the data itself. For example, when a packet arrives at a network adapter, it is not immediately known which path that packet belongs to. For this reason, each Scout router provides a demux operation that maps the data into a path that can be used to process that data. This problem is identical to what is referred to a ``packet classification'' in the networking literature. Since Scout uses packet classification in a context that is somewhat unusual, it is worth enumerating the specific requirements that it places on this process:

Efficient enough to handle peak-loads. Classification must take a short amount of time relative to the typical path execution time. Otherwise, the advantage of improved resource management due to paths would be lost.
Provide relaxed (best-effort) classification accuracy. Unlike traditional classifiers, the Scout classifier just has to find a path that is ``good enough'' to process the given data. This is best illustrated with an example: suppose the data to be classified is an IP fragment. Traditional classifiers defer classification until the entire IP datagram has been reassembled. For the purposes of Scout, it is acceptable to hand off IP fragments to a path that knows how to reassemble the fragments. Once the full datagram is available, the IP protocol can rerun the classifier to find the next path.

Many packet classifiers have been proposed (e.g., [31, 20, 2, 8]), but none of them address all of the Scout requirements satisfactorily. For this reason, Scout adopted the simple solution of requiring each router to provide a function that performs a classification. Any given router typically implements only a small portion of the entire classification process. If a router cannot make a unique classification decision, it may ask the next router to refine that decision. This continues until either a unique path is found or until it is determined that no appropriate path exists. In the latter case the offending data is simply discarded.

3.6 Remarks

Remarks

Figure 8 summarizes the Scout timeline. At the earliest time (top), individual routers and path transformations are implemented. Later on, a system is configured by specifying a router graph and selecting appropriate transformation rules. The kernel is then built and booted. During runtime, paths are created, executed, and eventually destroyed when no longer needed.

Figure 8:Scout Development Timeline.

As implemented in Scout, paths are light-weight. For example, a path to transmit and receive UDP packets consists of six stages. Creating such a path on a 300MHz Alpha takes on the order of 200 micro-s. This time does not include the application of any transformations. The path object itself is about 300 bytes long and each stage is on the order of 150 bytes in size (including all the interfaces). Also, packet classification is reasonably efficient. The first (unoptimized) implementation of the Scout classification scheme is already able to demultiplex a UDP packet in less than 5 micro-s.

There are many other aspects of Scout that space does not permit us to describe; most of them are orthogonal to paths. For example, we believe software-based fault isolation (SFI) [30] could be imposed on top of paths by defining each router to be in a separate fault domain. Similarly, hardware-enforced protection could be imposed between paths. Note that the horizontal partitioning (SFI) is possible because Scout routers have well-defined interfaces, while the vertical partitioning (hardware protection) is enabled by explicit paths.

Also, the Scout router graph is configured at build time, and as currently defined, it is not possible to extend the graph at runtime. However, it is possible to configure an interpreter into the router graph, thereby supporting extensibility. For example, we are currently implementing the Java API (and interpreter) in Scout [10]. This will make it possible to download Java applications into Scout at runtime.