Locality and phases have come to play a central role in understanding and improving the dynamic behavior of large computer programs. While intuitively appealing, these concepts have proven difficult to apply in practice: they suggest the need to define, measure, and verify patterns across billions of operations, and to cope with behavior that changes with program input.
In this talk I will present behavior-based program analysis. By profiling a few runs of an application, the analysis finds common patterns in program behavior and predicts how those patterns will change in other runs, in a manner very much analogous to prediction in the physical and biological sciences. I will describe three models of whole-program behavior. The first redefines temporal locality quantitatively as the miss rate of a program across inputs and cache sizes. The second is a new notion called reference affinity, which reveals the hierarchical locality structure in program data. The third is program phases, which are recurring execution patterns involving large sections of program code. These temporal and spatial models bridge between traditional program analysis and program profiling, and between popular heuristics (frequency, distance, and topology) and the seemingly insurmountable theoretical complexity. In practice, they enable new techniques in static and dynamic data, cache, and memory management. These techniques uncover emergent behaviors that are not obvious from analyzing individual program components, program inputs, or machine environments. I will discuss the implication of these findings for compilers, programming languages, and systems design.