CSc 352, Fall 2002: Programming Assignment 10: Pointers, Performance Tuning

Out: Thu Nov 14

Due: 12:00 Noon Thu Nov 21


You are to create a number of files, as directed below, and then submit them electronically on host lectura.cs.arizona.edu using the command

turnin cs352_hw10 files

Please follow the directions below carefully: submissions that don't follow directions will be penalized heavily.

Note: As before, your C code should compile with no errors or warnings when compiled with gcc -Wall. Moreover, each program should exit with a status of 0 if execution proceeds normally, and a status of -1 if any sort of error occurs.

In addition, your programs will be expected to follow the coding guidelines for this class. However, for performance tuning problems, I will be willing to suspend this requirement, provided that the documentation supporting your performance-enhancing transformations justifies the breaking of any such guidelines.

Because this assignment involves a lot of different files that have to be turned in, I have provided a script that will check to see whether all of your files have been turned in as expected.

After you have checked in all of your files, please execute the script

/home/cs352/FALL02/hw10_stuff/chk_turnin

This will tell you whether you have turned in all of the files that you're supposed to turn in, in a form expected by our grading scripts.


The Problems

  1. [ 50 points ] This program is conceptually related to the graph problem from Homework 6, though the input file format and the computational issues involved are somewhat different. For this assignment, your program is to behave as follows:

    Invocation:

    The program is called mymake, and takes an optional argument. If an argument is specified, it must be the name of a readable file. More than one argument is not permitted.

    Building the Executable:

    You should turn in, along with your source code files, a make file that supports the following commands:

    • make mymake : causes the C files to be compiled to produce an executable file mymake.
    • make clean : causes the deletion of all object files *.o, as well as any file named mymake, from the current directory.

    This makefile should be careful to avoid unnecessary dependencies between files: a file should be recompiled only when necessary. In other words, a trivial makefile, where every file is always recompiled whenever any file changes, is not acceptable.

    Program Structure:

    You should structure this program into files as follows:

    • mymake.h
      Contains typedefs, macro definitions, etc., for the data structure representing the dependence graph, as well as any other definitions (but not code!) you may want.

    • read_input.c
      Contains code to read in the input.

    • dep_graph.c
      Contains code to construct and traverse the dependence graph.

    • mymake.c
      Contains the main() function.
    Turn in the following files for this problem: mymake.h, read_input.c, dep_graph.c, mymake.c, and Makefile.

    Input:

    If a file is specified, the program its input from that file. Otherwise, if no command line argument is given, it reads from stdin.

    The input consists of three parts, in this order:

    1. A specification of a directed graph, consisting of a series of vertices and edges. The fact that the graph is directed means that the edges have a specific direction, and one can follow an edge only in that direction (think of it as a one-way street; by contrast, edges in an undirected graph, which we considered previously in the reachability and shortest-path problems, correspond to two-way streets).

      Each vertex or edge in the specification of the directed graph occurs on a line by itself, as follows:

      • A vertex now has two things associated with it: a name, which is a string of printable characters containing no whitespace characters (e.g., James.R.Bond_007.c); and an action, which is an arbitrary string, upto the end of the line, possibly containing whitespace (e.g., gcc -O2 foo.c).

        A vertex is specified as

        v vertex_name string

        This specifies a vertex whose label is the string vertex_name, and whose associated action is string.

      • An edge is specified as follows:

        e src_vertex dst_vertex

        This specifies a directed edge from src_vertex to dst_vertex. If there is an edge from a vertex X to a vertex Y, we say that X depends on Y.

      You may assume that each vertex name and action is at most 1024 characters in length.

      Note : The graph may contain cycles.

    2. A delimiter, consisting of the two characters "%%" on a line by themselves.

    3. A single query, consisting of a vertex name.

    You may assume that each non-EOF line ends with a newline, and that there is no whitespace before the newline character at the end of each line. You may also assume that the input has the format described above.

    Behavior:
    Your program should read in the specification of the graph in the first part of the input, construct a representation of this graph, and then process the query in the third part of the input in order as follows (an algorithm for processing these dependence graphs is given here).

    • Starting at the vertex specified in the query, your program will traverse the graph in a depth-first manner, making sure at each stage that the file named by each vertex exists and is up to date with respect to the vertices it depends on (a vertex p is up to date with respect to the vertices it depends on if, for every vertex q such that there is an edge from p to q, the time that file q was last modified is earlier than the time when file p was last modified).
    • If the file corresponding to a vertex does not exist, or is not up to date, the program will execute the action associated with that vertex. Before an action is executed, it is printed on stdout.
    • If all actions are executed successfully, execution stops with a return value of 0. Execution stops if any action signals an error by returning a non-zero error code, or if an infinite loop is detected (a vertex is visited more than once and its associated action executed); in this case the return value is -1.

    To simplify grading, we will adopt the following convention: when processing a vertex X, the vertices that X depends on will be visited and processed in sorted order w.r.t. the ordering determined by strcmp() on the vertex names.

    Restrictions:
    There should not be any fixed hard limits on the input, e.g., the number of vertices or edges allowed. Moreover, the amount of memory used by your program should be proportional to the size of the input. Please do not use realloc for dynamic memory allocation: I would like you to use malloc.

    Example input:
    v readgraph.o gcc -c -Wall readgraph.c
    e readgraph.o readgraph.c
    e readgraph.o graph.h
    v main.o gcc -c -Wall main.c
    v graph.o gcc -c -Wall graph.c
    e main.o main.c
    e main.o graph.h
    e graph.o graph.c
    e graph.o graph.h
    v mygraphgcc readgraph.o main.o graph.o -o mygraph
    %%
    mygraph

    Miscellaneous:
    The following system calls may be useful: see the respective man pages for details.

    • stat: information about when a file was last modified.

    • access: information about whether a file exists (use the mode F_OK).

    • system: to execute a string.

  2. [50 points ] The file /home/cs352/FALL02/hw10_stuff/revpairs.c is the C source code for a program that reads in a list of words, in a file specified as a command line argument, and prints out all the pairs of words
    x y
    such that the word x is the reverse of y (and vice versa), ignoring case distinctions. Unfortunately, this code is not very efficient.

    You are to improve the performance of this program using profiling tools (gprof) to identify hot spots. The rules for this performance tuning process are given here. Some general hints and techniques for performance tuning are given here.

    The file /home/cs352/FALL02/hw10_stuff/fast_revpairs is (the executable for) a faster version of your program. I will use this executable as the baseline for evaluating your program.

    Turn in your modified file, together with a README file that provides the following information for each separate code transformation you carry out:

    1. profile data (only the flat profiles portion from gprof output);
    2. the hot spot(s) identified for optimization based on this;
    3. the performance overheads identified from examining this profile, as well as the changes you made to your code carried out in order to eliminate such overheads; and
    4. the performance improvement obtained.

    The optimized program must still be correct, i.e., for the same input, it should compute the same output as the original program. Moreover, it must do this without in any way "special casing" the particular input being considered (e.g., by skipping all the computation and simply printing out a particular value if the input happens to have some special value).

    Given these constraints, your score on this problem will be based on the amount of speedup you are able to achieve (relative to the execution time of the the executable /home/cs352/FALL02/hw10_stuff/fast_revpairs, on lectura), for the input file /home/cs352/FALL02/hw10_stuff/WORDS:

    Your execution time
    (relative to fast_revpairs)
    Your score
    > 1000 0%
    500+ to 1000 10%
    100+ to 500 30%
    50+ to 100 50%
    20+ to 50 60%
    10+ to 20 70%
    5+ to 10 80%
    2+ to 5 90%
    1.0 to 2 100%
    < 1.0 125%
    Important Note on Timing Fast Programs:
    Because of the granularity of the system clock, execution times for a programs that run for less than a second are difficult to measure with a reasonable amount of accuracy. One way to get around this problem is to run the program repeatedly, and then divide the total runtime by the number of times the program was run. For example, we may run a program 100 times, and if the resulting total time is 3.2 secs, compute the average runtime for one run of the program as 3.2/100 = 0.032 secs.

    The shell script /home/cs352/FALL02/hw10_stuff/iter.csh provides some support in this regard.