University of Arizona, Department of Computer Science

CSc 453 : Programming Assignment 2 (Lexical and Syntax Analysis)

Start Date: Mon Sept 17, 2012

Due Date: 11:59 PM, Sun Sept 30, 2012


1. General

This assignment involves implementing a scanner and parser for C--, using a scanner generator such as lex or flex and a parser generator as such as yacc or bison for this purpose.

For this assignment, your code should deal with only the lexical and syntax rules of C--. In other words, anything that requires semantic information-i.e., information involving declarations-should be ignored.

2. The Scanner

2.1. General

The scanner should be implemented as a function that returns, each time it is called, either a positive integer indicating what kind of token was found on the input stream, or the value 0 ("end of file") indicating that no further input is available. Note that keywords cannot be used as identifiers.

The values of different kinds of tokens should be defined as macros to simplify the interface between the scanner and parser. For this purpose, it is simplest to define single-character tokens such as ( and ; to have the value of the corresponding character constant, e.g., the value of a "left-parenthesis" token will be that of the character constant '('. (The simplest way to do this is to use yacc -d to generate a file y.tab.h that contains the macro definitions, then #include this file into the scanner. Your make file will have to be set up carefully to make this work right.)

2.2. Comments and Whitespace

Comments and whitespace are to be skipped silently. It is an error to encounter an end-of-file inside a comment.

2.3. Errors

The simplest way to deal with lexical errors is to let the parser worry about them. This can be done by simply returning the value of any unrecognized character to the parser.

3. The Parser

3.1. General

You should transform your grammar, as necessary, to eliminate conflicts. The "dangling else" shift/reduce conflict will be tolerated, as will shift/reduce conflicts between error productions, but you will be penalized for any other conflicts. If you encounter conflicts, you may consult the file y.output generated by yacc (invoked with the -v option) for more information.

3.2. Errors

You are to implement error handling and recovery for syntax errors. This does not include errors involving semantic checking (i.e., anything that demands information from declarations), which will be dealt with in the next assignment.

Your program will be expected to deal with errors in a "reasonable" way. Error messages should be specific and should contain enough information (with at least a line number) to allow the user to locate syntax problems. Error recovery should allow your parser to recover gracefully and continue processing the input even after syntax errors are encountered.

4. Invoking Your Program

Your program will be called compile. It will read from stdin and send all output to stdout. Error messages will be sent to stderr. E.g.:
cat foo.c | compile

At this point, your program will act simply as a syntax checker: syntactically correct input will be accepted silently, while syntax errors will give rise to error messages that will be reported to stderr.

Sometimes students find it difficult to resist the the urge to "improve" on this spec by having their compilers read from a file rather than from stdin. Please resist this urge (or, if you can't, be prepared to sacrifice 10 points off your score). I know you can write code to do file I/O; show me that you can read, understand, and follow a spec. :-)

5. Turnin

You should turn in the sources to your code on lectura. These should include:
  1. The sources and headers for your scanner and parser. If you use tools such as lex, yacc, etc., you should turn in the specifications given to these tools.
  2. A main routine that calls your parser;
  3. A make file called Makefile: this should support at least the command "make compile", whose effect is to create an executable file in the current directory called compile that implements your parser together with auxiliary routines such as the lexical analyser, error handling routines, etc. This command should result in the creation of your scanner and parser from scratch, by invoking the appropriate tools on the input specifications.
  4. Any additional material you wish to turn in. Any documentation or comments may be turned in in a file README.
To turn in your files, use the command
turnin   cs453f12-assg2   files
For more information on the turnin command, try man turnin.

Note: The turnin command copies the files submitted into another directory. Because of this, programs that compile and execute without problems in your directory may not work correctly once they are turned in, because of problems with relative path names in include files and make files. Such problems are considered to be sloppiness inappropriate in a graduate course, and are liable to be penalized heavily.

The output of your program will be compared with our output using diff utility (see diff(1)). With the exception of error messages (where the requirements are given above), your output must follow the specification exactly. For this reason it is recommended that you follow the specification, and instructions for turnin, closely.