1 Introduction

Communication latency is often as important as throughput in distributed systems, and for this reason researchers have analyzed the latency characteristics of common networking protocols such as TCP/IP [15, 6, 14] and RPC [32]. This paper revisits the issue of protocol latency. Our goal is not to optimize a particular protocol stack, but rather to understand the fundamental limitations on processing overhead. In doing so, this paper goes beyond earlier work in three important ways:

Updated Study: It studies protocol latency on a modern RISC architecture, the 64-bit DEC Alpha, and in doing so updates earlier studies that were performed on the x86 architecture. This is important because the different design tradeoffs applied to CISC and RISC designs typically lead to qualitative differences in processing behavior.
New Techniques: It describes and evaluates a new set of techniques designed to improve protocol latency. These techniques are aimed less at reducing the number of instructions executed to process each packet than at reducing the number of cycles that each instruction takes.
Detailed Analysis: It contains a level of detail not found in other studies. In particular, it reports on instruction-cache (i-cache) effectiveness as well as on processor stall rates. The bottom line is that we evaluate protocol latency in terms of memory cycles per instruction (mCPI), a metric that will become increasingly important as improvements in memory speed lag farther behind improvements in processor speed [5, 29].

It should be clear from these three points that memory bandwidth, and in particular the memory cycles required by each instruction, is a central focus of this paper. In fact, the experimental results presented in this paper show that the worst-case mCPI we were able to measure is a factor of 3.9 higher than the best case for the TCP/IP stack, and a factor of 5.8 higher for an RPC stack. The techniques we propose primarily target the mCPI, although some also reduce the instruction count.
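As a rough guide to how mCPI enters latency, one can use the standard decomposition of execution time below. The split of cycles per instruction into a processor component $\mathrm{CPI}_{\mathrm{proc}}$ and the memory component mCPI, and the symbols $N$ (instructions executed on the path), $C_{\mathrm{mem}}$ (cycles spent stalled on memory), and $t_{\mathrm{cycle}}$ (cycle time), are our notation rather than definitions taken from this section.

$$
T = N \cdot \left(\mathrm{CPI}_{\mathrm{proc}} + \mathrm{mCPI}\right) \cdot t_{\mathrm{cycle}},
\qquad
\mathrm{mCPI} = \frac{C_{\mathrm{mem}}}{N}
$$

In this form, lowering mCPI shortens $T$ even when $N$ is unchanged, and the factor of 3.9 reported for TCP/IP reads as $\mathrm{mCPI}_{\mathrm{worst}} / \mathrm{mCPI}_{\mathrm{best}} = 3.9$. How much of that ratio carries over to total latency depends on how large $\mathrm{CPI}_{\mathrm{proc}}$ is relative to mCPI.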

Because these techniques are aimed at improving the mCPI of networking software, they are necessarily fine-grained. More precisely, they can all be characterized as compiler-based techniques. One might therefore ask whether they are specific to networking code or applicable to general applications (e.g., SPECmark code). The answer is that while these techniques are likely to be of some benefit to application programs, they are motivated by the unique characteristics of networking software specifically, and of low-level systems code more generally. For example, exception handling and other infrequently executed code often makes up a large portion of the critical execution paths in networking software; one of our techniques, outlining, exploits this fact. Also, layered networking software often produces deep call chains, and since each function call is typically an optimization barrier, the compiler's optimizer sees only a limited context. A technique called path-inlining attacks these two problems. As a final example, networking software is designed to handle a wide range of situations, but once a connection is established it is often possible to specialize the code for that particular connection. A technique called cloning addresses this issue.
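The following minimal C sketch shows where each of the three techniques applies along a send path. It is only a source-level approximation: the paper characterizes these as compiler-based techniques, whereas here outlining is mimicked with the GCC/Clang `cold` and `noinline` attributes plus `__builtin_expect`, path-inlining with `always_inline`, and cloning with a hand-written copy specialized on constant connection parameters. The two-layer "stack", `struct conn`, and every function name are hypothetical and chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; field names are illustrative only. */
struct conn {
    size_t mss;       /* largest payload accepted on this connection */
    int do_checksum;  /* nonzero if the payload must be checksummed */
};

static size_t last_bad_len; /* length of the most recently rejected packet */

/* Outlining (approximation): the rare error path lives in its own cold,
   non-inlined function, so the compiler may place it away from the hot
   path instead of interleaving it with frequently executed code. */
static __attribute__((cold, noinline)) int reject_oversize(size_t len)
{
    last_bad_len = len;
    return -1;
}

/* Lower layer: a trivial additive checksum standing in for real work. */
static inline __attribute__((always_inline))
uint32_t lower_layer(const uint8_t *buf, size_t len, int do_checksum)
{
    uint32_t sum = 0;
    if (do_checksum)
        for (size_t i = 0; i < len; i++)
            sum += buf[i];
    return sum;
}

/* Upper layer: validates the length, then calls down. Forcing both layers
   inline (path-inlining, approximated) gives the optimizer a single body
   for the whole path instead of an optimization barrier at each call. */
static inline __attribute__((always_inline))
int64_t upper_layer(const struct conn *c, const uint8_t *buf, size_t len)
{
    if (__builtin_expect(len > c->mss, 0))  /* error path marked unlikely */
        return reject_oversize(len);
    return (int64_t)lower_layer(buf, len, c->do_checksum);
}

/* Generic entry point: works for any connection. */
int64_t send_generic(const struct conn *c, const uint8_t *buf, size_t len)
{
    return upper_layer(c, buf, len);
}

/* Clone (approximation): a copy of the path specialized for one hypothetical
   connection whose parameters are compile-time constants, so the compiler
   can fold the mss comparison and drop the checksum loop entirely. */
int64_t send_clone_mss1460_nocsum(const uint8_t *buf, size_t len)
{
    static const struct conn fixed = { 1460, 0 };
    return upper_layer(&fixed, buf, len);
}
```

Compiling with `gcc -O2 -S` and comparing the code generated for `send_generic` and `send_clone_mss1460_nocsum` is one way to check whether the compiler actually folded the constants and removed the checksum loop in the clone; whether it does so depends on the compiler and optimization level.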

Note that this work focuses on networking code as currently deployed, that is, code written in C. We do not propose a new programming language or paradigm for protocol implementation, although we observe that some of the proposed techniques have also proven useful in alternative protocol implementation languages [4].

The paper is organized as follows. Section 2 sets the context in which this research was performed; in doing so, it extends earlier studies of TCP/IP latency with results for a modern RISC machine. Section 3 describes and discusses the latency improvement techniques, which are then evaluated in Section 4. Section 5 offers some concluding remarks.