
5 Concluding Remarks

Networking system designers have long known that memory bandwidth plays a critical role in end-to-end throughput. This paper argues that memory bandwidth is also a major factor in protocol processing latency, and it demonstrates this with measurements performed on a modern 64-bit workstation. While the quantitative results are certainly machine-specific, it is reasonable to expect the qualitative conclusions to generalize to most other high-performance RISC-based systems.

Beyond this basic result, the paper describes three techniques that can be applied to networking code to reduce protocol processing latency. These techniques have two benefits. First, they significantly improve execution speed by reducing the mCPI, the number of memory stall cycles incurred per instruction. Fundamentally, this reduction is achieved by (a) increasing the dynamic instruction-stream density, (b) reducing the number of cache conflicts, and (c) reducing the critical-path code size. The impact of these techniques will grow rapidly as the gap between processor and memory speeds widens: this research was conducted on a machine with a 175MHz Alpha processor, a 100MB/s memory system, and a 10Mbps Ethernet, yet we already have in our lab low-cost machines with a 300MHz processor, an 80MB/s memory system, and 100Mbps Ethernet.

Second, although case BAD reported in Section 4 was constructed artificially, sub-optimal configurations do occur in practice. For example, the measured mCPI for the DEC Unix v3.2c TCP/IP stack is 2.3, significantly worse than the 1.58 mCPI measured for the standard x-kernel. The proposed techniques make it relatively easy to avoid such bad cache behavior; that is, they help improve the predictability of a system.
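To make these conclusions concrete, consider two illustrations, neither taken from the paper's measured code. On the widening processor-memory gap: assuming, purely for illustration, a 32-byte cache line whose fill time is limited only by memory bandwidth, a miss on the 175MHz/100MB/s machine costs about 32B / 100MB/s = 320ns, or roughly 56 processor cycles, while on the 300MHz/80MB/s machine the same miss costs about 400ns, or roughly 120 cycles; the per-miss penalty more than doubles. On the techniques themselves, the minimal C sketch below shows the kind of transformation that increases dynamic instruction-stream density and shrinks the critical-path code size: rarely executed error handling is moved out of the straight-line path into a separate, non-inlined function, so the cache lines fetched along the common path hold mostly instructions that actually execute. The struct and function names are hypothetical, and the noinline and __builtin_expect annotations are GCC-style extensions used here only to make the intent explicit.

    #include <stdio.h>

    /* Hypothetical packet header, for illustration only. */
    struct hdr {
        unsigned short len;
        unsigned short cksum;
    };

    /* Outlined error path: the rarely executed code lives in a
     * separate, non-inlined function, so it occupies none of the
     * cache lines fetched along the common path. */
    __attribute__((noinline))
    static void handle_bad_packet(const struct hdr *h)
    {
        fprintf(stderr, "dropping packet: len=%d cksum=%d\n",
                h->len, h->cksum);
    }

    /* Common path: straight-line code, with the error branch
     * hinted as unlikely, so a larger fraction of the fetched
     * instructions actually execute. */
    static int process_packet(const struct hdr *h)
    {
        if (__builtin_expect(h->len == 0, 0)) {
            handle_bad_packet(h);
            return -1;
        }
        /* ... protocol processing on the critical path ... */
        return 0;
    }

    int main(void)
    {
        struct hdr good = { 64, 0 }, bad = { 0, 0 };
        process_packet(&good);
        return process_packet(&bad) ? 1 : 0;
    }

The same layout can be produced by a compiler or by hand restructuring; the essential point is that the hot path occupies fewer cache lines and therefore suffers fewer instruction-cache misses.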

