1 Introduction

Layering is a fundamental structuring technique with a long history in system design. From early work on layered operating systems and network architectures [12, 32], to more recent advances in stackable systems [27, 15, 14, 26], layering has played a central role in managing complexity, isolating failure, and enhancing configurability. This paper describes a complementary, but equally fundamental structuring technique, which we call paths. Whereas layering is typically used to manage complexity, paths are applied to layered systems to improve their performance and to solve problems that require global context.

We begin by developing some intuition about paths. A path can be viewed as a logical channel through a multi-layered system over which I/O data flows, as illustrated in Figure 1. In this way, a path is analogous to a virtual circuit that cuts through the nodes of a packet-switched network. The only difference is that paths are within a single host, while virtual circuits run between hosts.¹

Figure 1: Two Paths Through a Layered System
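To make the channel analogy concrete, the following is a minimal C sketch of a path as a fixed, pre-selected sequence of processing stages through which a message flows. It is not Scout's actual interface. The stage signature, `struct path`, `path_deliver`, and the header-stripping stages `eth_stage` and `ip_stage` are all hypothetical, and real protocol modules do far more than strip a fixed-size header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stage interface (not Scout's API): a stage processes the
 * message in place, may shrink it, and returns 0 on success. */
typedef int (*stage_fn)(unsigned char *buf, size_t *len);

#define MAX_STAGES 8

/* A path: the sequence of stages, fixed in advance, that a message
 * traverses from I/O source to sink. */
struct path {
    stage_fn stages[MAX_STAGES];
    size_t   nstages;
};

/* Hypothetical helper: strip a fixed-size header from the front of the buffer. */
static int strip(unsigned char *buf, size_t *len, size_t hlen)
{
    if (*len < hlen)
        return -1;                      /* message too short for this layer */
    memmove(buf, buf + hlen, *len - hlen);
    *len -= hlen;
    return 0;
}

/* Hypothetical stages: a 14-byte link header, then a 20-byte network header. */
static int eth_stage(unsigned char *buf, size_t *len) { return strip(buf, len, 14); }
static int ip_stage(unsigned char *buf, size_t *len)  { return strip(buf, len, 20); }

/* Run a message down the path, stage by stage, stopping at the first error. */
static int path_deliver(const struct path *p, unsigned char *buf, size_t *len)
{
    assert(p != NULL && p->nstages <= MAX_STAGES);
    for (size_t i = 0; i < p->nstages; i++) {
        int rc = p->stages[i](buf, len);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

A path such as `struct path p = { { eth_stage, ip_stage }, 2 };` is built once; every message bound to it then crosses the same sequence of layers, which is what lets the mechanisms discussed next exploit knowledge of that sequence.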

The term "path" is also well entrenched in our vocabulary. For example, we often refer to the "fast path" through a system, implying that the most commonly executed sequence of instructions has been optimized. As another example, we sometimes talk about optimizing the "end-to-end path," meaning we are focused on the global performance of the system (e.g., from I/O source to sink), rather than on the local performance of a single component. As a final example, we sometimes distinguish between a system's "control path" and its "data path," with the former being more relevant to latency and the latter more concerned with throughput.

Finally, paths can be loosely understood by considering specific OS mechanisms that have been proposed over the last few years. Consider the following examples.

Fbufs [6] are a path-oriented buffer management mechanism designed to efficiently move data across a sequence of protection domains.² Fbufs depend on being able to identify the path through the system over which the data will flow.
Integrated layer processing (ILP) [4, 1] is a technique for fusing the data manipulation loops of multiple protocol layers. It depends on knowing exactly what sequence of protocol modules a network packet will traverse.
Packet classifiers [31, 20, 2, 8] distinguish among incoming network packets based on certain fields found in their headers. In a sense, a packet classifier pre-computes the path that a given message will follow.
Specialization is sometimes used to optimize common-path code sequences [24, 23]. Specialization, in turn, depends on the existence of invariants that constrain the path through the code that is likely to be executed.
The Alpha OS allows threads to migrate across a sequence of protection domains [5]; others have defined similar mechanisms [13, 9]. Such mechanisms recognize that tasks often span multiple domains, and so account for resource usage on a path basis rather than a domain basis.
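As an illustration of the packet-classifier example, the following is a minimal sketch, assuming a classifier that matches only on two hypothetical header fields (IP protocol number and destination port) and uses a linear scan over a rule table. The structures, `classify`, and `PATH_DEFAULT` are hypothetical; the classifiers cited above use considerably more sophisticated matching. The point is only that the result of classification is a path identifier, computed before any protocol module runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header fields a classifier might inspect. */
struct pkt_hdr {
    uint8_t  ip_proto;   /* e.g., 17 for UDP, 6 for TCP */
    uint16_t dst_port;
};

/* One classifier rule: packets whose fields match are bound to path_id. */
struct rule {
    uint8_t  ip_proto;
    uint16_t dst_port;
    int      path_id;
};

#define PATH_DEFAULT (-1) /* no rule matched: fall back to generic processing */

/* Linear-scan classifier: returns the path the packet will follow. */
static int classify(const struct rule *rules, size_t nrules,
                    const struct pkt_hdr *h)
{
    assert(h != NULL);
    for (size_t i = 0; i < nrules; i++) {
        if (rules[i].ip_proto == h->ip_proto &&
            rules[i].dst_port == h->dst_port)
            return rules[i].path_id;
    }
    return PATH_DEFAULT;
}
```

With rules `{17, 5004, 1}` and `{6, 80, 2}` (arbitrary values chosen for illustration), a UDP packet to port 5004 is bound to path 1 and a TCP packet to port 80 to path 2, while anything else takes the default route.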

The thesis of this paper is that these mechanisms are not isolated optimizations but can instead be unified and explained by the path abstraction. In a nutshell, these mechanisms all share the following fundamental idea: they expose and exploit non-local context.

Consider a layered system like the one illustrated in Figure 1. While the advantage of layering and modularity is to hide information, there are many situations in which it would be beneficial for a given layer to have access to non-local context. For example, suppose one of the modules is processing an Ethernet packet. With only local context, the module knows nothing about the packet's relative importance compared to other packets. However, if it is known that the packet is part of a particular video stream, then it is easy to determine its processing deadline, what modules need to be executed to process it, how many CPU cycles this processing will require, where its data should be placed in memory, and so on. In other words, by knowing a certain set of invariants (e.g., that the packet is part of some video stream), the module is able to access and exploit global context that is not available within any one module or layer. Abstractly, then, a path is defined by these invariants and provides access to the corresponding context.

Having access to non-local context leads to two kinds of advantages: (1) improved resource allocation and scheduling decisions, and (2) improved code quality. In the former case, work is segregated early, facilitating the following benefits:

The system can place data in a memory buffer that is already accessible to all the modules along the path. This is essentially what fbufs do. In a conventional system, by contrast, data often has to be copied (either logically or physically) from one buffer to another at each module or layer boundary.
The system can know that a particular path needs to be scheduled for execution in order to meet a deadline (e.g., to display a video frame). This is critical to being able to offer different Qualities of Service (QoS). Without segregating work into paths, the system may have to perform low-priority work just to discover high-priority work that needs attention.
If scheduling deadlines for a particular path are such that it is impossible to make use of a particular piece of work (e.g., a network packet or a video frame), then the system can discard unnecessary work early, that is, before executing the path. A conventional system often has to process several layers before knowing that continuing is of no value.
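The scheduling and early-discard benefits can be sketched as follows, assuming each path carries one pending item with an absolute deadline and an estimated cost for running the whole path. The earliest-deadline-first (EDF) policy is a choice made for this illustration, not something the text prescribes, and `struct sched_path`, `should_discard`, and `pick_next` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-path scheduling state; not Scout's actual structures. */
struct sched_path {
    int      id;
    uint64_t deadline_us;  /* absolute deadline of the pending item */
    uint64_t cost_us;      /* estimated time to run the entire path */
    bool     pending;      /* does this path have an item waiting? */
};

/* Early discard: if the item cannot finish before its deadline,
 * drop it before executing any module on the path. */
static bool should_discard(const struct sched_path *p, uint64_t now_us)
{
    return now_us + p->cost_us > p->deadline_us;
}

/* EDF choice among paths with pending work. Returns the index of the
 * chosen path, or -1 if none is runnable. Items that would miss their
 * deadline are discarded along the way without being processed. */
static int pick_next(struct sched_path *paths, size_t n, uint64_t now_us)
{
    assert(paths != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        struct sched_path *p = &paths[i];
        if (!p->pending)
            continue;
        if (should_discard(p, now_us)) {
            p->pending = false;   /* useless work dropped early */
            continue;
        }
        if (best < 0 || p->deadline_us < paths[best].deadline_us)
            best = (int)i;
    }
    return best;
}
```

Because work is segregated by path before any layer runs, the scheduler can compare deadlines directly and discard a hopeless item at cost proportional to one comparison, rather than after several layers of processing.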

In the latter case (improved code quality), the system has more information available to it, making more aggressive code optimizations possible. Examples of such optimizations include the following:

The more invariants the system knows about code to be executed, the more opportunities the system has to specialize the code path. For example, the system can do constant folding and propagation, dead-code elimination, and interprocedural register allocation.
The more layers across which the system is able to optimize, the more opportunities there are to eliminate redundant work. For example, the more protocol layers that are available, the more loads and stores integrated layer processing can remove. Similarly, it is sometimes possible to merge per-layer operations. For example, instead of having each layer check for the appropriate header length, it is possible to check for the sum of all header lengths at the beginning of packet processing.
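The merged header-length check can be illustrated with a minimal sketch, assuming a hypothetical path Ethernet, then IPv4 without options, then UDP, with fixed header sizes of 14, 20, and 8 bytes. Function names are hypothetical. The layered version performs one comparison per layer; because the path fixes the sequence of layers in advance, the path version replaces them with a single comparison against the 42-byte total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed header sizes along an Ethernet -> IPv4 -> UDP path,
 * assuming no IP options. */
enum { ETH_HLEN = 14, IP_HLEN = 20, UDP_HLEN = 8 };

/* Compile-time sanity check on the summed header length (C11). */
static_assert(ETH_HLEN + IP_HLEN + UDP_HLEN == 42, "unexpected header total");

/* Layered style: each layer checks only its own header length. */
static bool layered_check(size_t len)
{
    if (len < ETH_HLEN) return false;   /* Ethernet layer */
    len -= ETH_HLEN;
    if (len < IP_HLEN) return false;    /* IP layer */
    len -= IP_HLEN;
    if (len < UDP_HLEN) return false;   /* UDP layer */
    return true;
}

/* Path style: the layers are known in advance, so one comparison
 * against the summed header length replaces the three above. */
static bool path_check(size_t len)
{
    return len >= ETH_HLEN + IP_HLEN + UDP_HLEN;
}
```

Both functions accept exactly the same lengths; the path version is simply the result of constant-folding the per-layer checks, which is possible only once the sequence of layers is known.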

This paper makes two contributions. First, it develops an explicit path abstraction; Section 2 explores the design space for paths, and Section 3 describes an implementation of paths in the Scout operating system. Second, the paper demonstrates how having a path abstraction leads to the first set of advantages outlined above, i.e., those that have to do with improvements in resource allocation and scheduling. In particular, Section 4 describes an application that receives MPEG-compressed video over a network and then decodes and displays it. A companion paper demonstrates some of the code-related improvements attributable to paths [23].