CSc 352: Profiling and Performance Tuning - The Rules

This document specifies the rules that must be followed for assignments involving profiling and performance tuning.

1. Compiler Flags

The particular compiler being used, and the level of optimization chosen, affect the quality and speed of the code generated by the compiler. To make meaningful comparisons in performance we have to keep these parameters fixed. We'll use the GNU C compiler gcc as our compiler, and optimize at optimization level -O2. Also, it turns out that specifying the flag -mcpu=ultrasparc allows gcc to generate code optimized for the processor used in Lectura.

In summary: for performance comparisons, compile your programs using

gcc -Wall -O2 -mcpu=ultrasparc ...

2. Transformations Allowed

Pretty much any transformation is permitted (subject to some constraints, see below), provided that the program has the same observable behavior (i.e., produces the same output as before for any given input).

The transformations applied to improve performance must satisfy the following criteria:

  1. Justification: The transformation must be justified with profile data (obtained either using gprof or gcc -a) that identifies specific performance problems at specific points in the program, and the transformation must address the particular problems raised. In other words, you can't just arbitrarily replace one program by a completely different one without any explanation.

    Typically, the justification for a series of performance-improving transformations will be turned in along with the improved programs, e.g., as a README file (or as required by the assignment spec). Note that a particular transformation may cause changes to more than one place in the program, e.g., replacing a linked list data structure by a hash table; the point is that for a particular transformation, all of the changes are related to a single conceptual change you are making to your code. For each program, the README file should document the entire sequence of changes you made, in each case giving profile data to identify the performance problem, describing the change you made to address the problem, and describing the extent of performance improvement achieved as a result. Thus, if the final version of your program required five different transformations to your code, your README file will contain five different sets of justification; each of them will contain profile data, a description of the transformation, and its effect on performance.

  2. Space-Time Tradeoffs: It is often the case that you can trade time for space, i.e., reduce the execution time of the program by increasing the amount of space it takes.

    Transformations that involve space-time tradeoffs are OK, provided that the additional space requirements are kept ``reasonable.'' For the sake of simplicity, I will consider any space increase of upto 4 KB for any particular program ``reasonable.'' It is possible that, depending on the program, larger space increases may be reasonable: please check with me ahead of time if you need more than 4 KB of extra space.

3. Timing Your Programs

We will use user time figure from the shell command time to time your programs. For the official timings, each program will be run 5 times, the highest and lowest run times will be discarded, and the average of the remaining three times will be used as the execution time of the program.

Since the actual runtime obtained can vary depending on the system load, we will try to ensure that your program is running under similar conditions to the ``baseline'' program it is being tested against as follows: the programs will be run alternately, i.e., your program, then the baseline, then your program, then the baseline, and so on. The idea is that if the system load happens to be high, then both your program and the baseline will be affected similarly by this.