University of Arizona, Department of Computer Science

CSc 453 : Programming Assignment 2 (Lexical and Syntax Analysis)

Start Date: Mon Jan 25, 2016

Due Date: 11:59 PM, Tue Feb 9, 2016


1. General

This assignment involves implementing a scanner and parser for μC, using a scanner generator such as lex or flex and a parser generator such as yacc or bison.

For this assignment, your code should deal with only the lexical and syntax rules of μC. In other words, anything that requires semantic information (i.e., information involving declarations) should be ignored.

At this point, your program will act simply as a syntax checker: syntactically correct input will be accepted silently, while syntax errors will produce error messages on stderr.

2. The Scanner

2.1. General

The scanner should be implemented as a function that returns, each time it is called, either a positive integer indicating what kind of token was found on the input stream, or the value 0 ("end of file") indicating that no further input is available. Note that keywords cannot be used as identifiers.

The values of different kinds of tokens should be defined as macros to simplify the interface between the scanner and parser. For this purpose, it is simplest to define single-character tokens such as ( and ; to have the value of the corresponding character constant, e.g., the value of a "left-parenthesis" token will be that of the character constant '('. (The simplest way to do this is to use yacc -d to generate a file y.tab.h that contains the macro definitions, then #include this file into the scanner. Your make file will have to be set up carefully to make this work right.)
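As a rough illustration of this convention, the sketch below shows what such macro definitions might look like. The token names and numeric values are hypothetical: the real ones are chosen by yacc when it generates y.tab.h from your grammar. The helper is_char_token is likewise only an illustration of how single-character tokens and named tokens can share one integer space.

```c
/* Hypothetical multi-character token codes, in the style of a yacc-generated
   y.tab.h. The values lie above 255 so they cannot collide with the codes of
   single-character tokens. */
#define ID      257
#define INTCON  258
#define IF      259
#define ELSE    260

/* Nonzero if tok is a single-character token, i.e. its code is the
   character constant itself (for example '(' or ';'). Zero means end of file. */
int is_char_token(int tok)
{
    return tok > 0 && tok < 256;
}
```

With this layout the scanner can return '(' directly for a left parenthesis and ID for an identifier, and the parser can tell the two kinds apart by value alone.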

2.2. Comments and Whitespace

Comments and whitespace are to be skipped silently. It is an error to encounter an end-of-file inside a comment.
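The logic involved can be sketched in plain C as follows. This is only an illustration, not the required implementation: in a lex or flex scanner the same job is normally done with rules or start conditions. The sketch assumes that comments are delimited as in C (slash-star to open, star-slash to close), which this handout does not itself state, and the function and constant names are made up for the example.

```c
#include <stddef.h>

/* Result codes (illustrative names). */
enum { SKIP_OK = 0, SKIP_EOF_IN_COMMENT = -1 };

/*
 * Advance *pos past whitespace and comments in the NUL-terminated buffer src,
 * incrementing *line for every newline consumed.
 * ASSUMPTION: comments open with slash-star and close with star-slash, as in C.
 * Returns SKIP_OK, or SKIP_EOF_IN_COMMENT if the input ends inside a comment.
 */
int skip_blanks_and_comments(const char *src, size_t *pos, int *line)
{
    for (;;) {
        char c = src[*pos];
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            if (c == '\n')
                (*line)++;
            (*pos)++;
        } else if (c == '/' && src[*pos + 1] == '*') {
            *pos += 2;                           /* consume the opener */
            while (!(src[*pos] == '*' && src[*pos + 1] == '/')) {
                if (src[*pos] == '\0')
                    return SKIP_EOF_IN_COMMENT;  /* input ended inside the comment */
                if (src[*pos] == '\n')
                    (*line)++;
                (*pos)++;
            }
            *pos += 2;                           /* consume the closer */
        } else {
            return SKIP_OK;                      /* next token or end of input starts here */
        }
    }
}
```

Counting newlines while skipping keeps line numbers accurate for later error messages, even when a comment spans several lines.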

2.3. Errors

The simplest way to deal with lexical errors is to let the parser worry about them. This can be done by simply returning the value of any unrecognized character to the parser.

3. The Parser

3.1. General

You should transform your grammar, as necessary, to eliminate conflicts. The "dangling else" shift/reduce conflict will be tolerated, as will shift/reduce conflicts between error productions, but you will be penalized for any other conflicts. If you encounter conflicts, you may consult the file y.output generated by yacc (invoked with the -v option) for more information.

3.2. Errors

You are to implement error handling and recovery for syntax errors. This does not include errors involving semantic checking (i.e., anything that demands information from declarations), which will be dealt with in the next assignment.

Your program will be expected to deal with errors in a "reasonable" way. Error messages should be specific and should contain enough information (with at least a line number) to allow the user to locate syntax problems. Error recovery should allow your parser to recover gracefully and continue processing the input even after syntax errors are encountered.

3.3. Exit Status

The exit status of your program should be 0 if no errors are encountered during processing, and 1 if any errors (including syntax errors) are encountered at any point.
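One possible way to tie error reporting to the exit status is to count errors as they are reported, as in the minimal sketch below. The names error_count and exit_status are hypothetical, and the message format is only an example. The variable yylineno is defined here as a stand-in so the fragment compiles on its own; a flex-generated scanner defines it itself (and keeps it up to date under %option yylineno), in which case this definition should be dropped.

```c
#include <stdio.h>

int yylineno = 1;             /* stand-in; normally supplied by the flex scanner */
static int error_count = 0;   /* hypothetical counter of errors reported so far */

/* Called by a yacc or bison parser when it detects a syntax error. */
void yyerror(const char *msg)
{
    fprintf(stderr, "line %d: %s\n", yylineno, msg);
    error_count++;
}

/* Value for main to return once yyparse() has finished:
   0 if nothing was reported, 1 otherwise. */
int exit_status(void)
{
    return error_count > 0 ? 1 : 0;
}
```

Deriving the status from a counter, rather than from the return value of yyparse() alone, also covers errors from which the parser recovered and continued.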

4. Invoking Your Program

Your program will be called compile. It will read from stdin and send all output to stdout. Error messages will be sent to stderr. E.g.:
cat foo.c | ./compile
or
./compile  <   foo.c

5. Turnin

You should turn in the sources to your code on lectura. These should include:
  1. The sources and headers for your scanner and parser, in particular the input specifications to the tools flex, bison, etc.;
  2. A main routine that calls your parser;
  3. A make file called Makefile that should support at least the following targets:

  4. Any additional material you wish to turn in. Any documentation or comments may be submitted in a file named README.
To turn in your files, use the command
turnin   cs453s16-assg2   file1 file2 ... filen
Turn in the files you want to submit just as they are: don't zip them up or turn in a directory containing your files. For more information on the turnin command, see man turnin.

Note: The turnin command copies the files submitted into another directory. Because of this, programs that use relative path names in include files and make files (e.g., #include "../../foo/bar/baz.h") may not compile and execute correctly once they are turned in. Please avoid using relative pathnames.

The output of your program will be compared with the "expected" output using the diff utility (see diff(1)). With the exception of error messages (where the requirements are given above), your output must follow the specification exactly. For this reason it is recommended that you follow the specification, and the instructions for turnin, closely.