CSc 453 : Programming Assignment 1 (html2txt)
Start Date: Fri Jan 15, 2016
Due Date: 11:59 PM, Sun Jan 24, 2016
1. General
This assignment involves writing a simple HTML-to-TXT translator.
The primary goal of this assignment is to get you acquainted with the compiler
front-end tools lex and yacc, which we'll
be using for the rest of the project. A secondary goal is to point out that
compiler ideas and tools are applicable to non-compiler problems as well.
Since the main focus of
this assignment is to get you started with these compiler tools, we'll keep things
simple and not try to handle all of the subtleties of HTML the way they
should really be handled (this means that if you compare your output with
the results of a commercial HTML-to-text translator, there may very well be
some differences).
Documentation on flex/lex and yacc/bison is available
here.
2. Functionality
Use flex (or lex) and yacc (or bison) to
write a program that translates HTML to text.
Your program should have the following functionality.
- It should read its input from stdin, discard all HTML tags
(including comments: see below), recognize and handle a small set of
"special entities" appropriately, and write the remaining
text to stdout.
- It should enforce some simple grammar rules (e.g., that the <li>
tag for list items can occur only within lists, or that tags specifying
boldface <b>...</b> and italics
<i>...</i> should be properly nested), giving an
appropriate error message if any grammar rule is violated.
- The exit status of your program should be 0 if no errors
are encountered during processing, and 1 if any errors are
encountered at any point.
3. HTML Specification
The lexical and syntactic structure of our subset of HTML is given
here.
4. Invoking Your Program
Your executable program will be called myhtml2txt. It will read
input from stdin and write its output to stdout. Thus, to
translate an HTML file foo.html to a text file bar.txt,
invoke your program as
myhtml2txt < foo.html > bar.txt
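As a rough illustration of this invocation together with the exit-status convention from Section 2, the following shell sketch runs a translator on one input file and reports the status it returned. The helper name run_translator is hypothetical and not part of the assignment; the sketch assumes a POSIX shell.

```shell
# Hypothetical helper (not provided with the assignment):
# run a translator on one input and report its exit status.
#   $1 = path to the executable
#   $2 = input HTML file
#   $3 = output text file
run_translator() {
    status=0
    # Feed the input on stdin and capture stdout, as in the command above.
    "$1" < "$2" > "$3" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok: no errors reported" >&2
    else
        echo "failed: exit status $status" >&2
    fi
    return "$status"
}
```

For example, assuming the executable is in the current directory, run_translator ./myhtml2txt foo.html bar.txt should return 0 for an input with no errors and 1 if your program reports any error.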
5. Getting Started
To help you get started, I have placed the following files in the directory
/home/cs453/spring16/assignments/html2txt on lectura:
- Makefile: a sample makefile that shows how flex might be
invoked.
- myhtml2txt: an executable that (supposedly) implements the behavior
expected of your program. This is an x86-64/Linux executable, and
will run on lectura but may not run on other machines (e.g.,
Macs or Windows machines).
- test.html: a test input.
Please note that this is one of possibly many inputs your
program may be tested with; it is made available to be helpful, but does not
make any pretense of being exhaustive.
It is your responsibility to understand
the assignment spec, implement your program accordingly, and test it
thoroughly.
6. Turnin
Turn in your files on host lectura.cs.arizona.edu. You should turn
in all of your source files, as well as a Makefile that supports the
following targets:
- clean: Executing the command make clean should delete the *.o files,
as well as the executable myhtml2txt, from the current directory.
- myhtml2txt: Executing the command make myhtml2txt should create, in the current
directory, an executable file myhtml2txt that implements your
HTML-to-text translator from scratch, by invoking the appropriate tools
(flex/lex and yacc/bison) on the input specifications.
To turn in your files, use the command
turnin cs453s16-html2txt file1 file2 ... filen
Please submit your files just as they are: do not submit a directory containing your files,
zip them up into a single file, or do anything else that requires additional
manual intervention.
For more information on the turnin command, try man turnin.
Note: The turnin command copies the files submitted
into another directory. Because of this, programs that compile and execute
without problems in your directory may not work correctly once they are
turned in, because of problems with relative path names in include files
and make files. Such problems are considered to be sloppiness inappropriate
in an upper division course, and are liable to be penalized heavily.
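One way to catch such path problems before submitting is to copy exactly the files you plan to turn in into an empty directory and build there. The sketch below is only an illustration: the helper name stage_submission is hypothetical, and it assumes a POSIX shell with mktemp available.

```shell
# Hypothetical helper (not provided with the assignment):
# copy the named files, flat, into a fresh temporary directory
# and print that directory's path on stdout.
stage_submission() {
    dir=$(mktemp -d) || return 1
    for f in "$@"; do
        # Copy each file by itself; no subdirectories are recreated,
        # mirroring a submission of individual files.
        cp "$f" "$dir/" || return 1
    done
    printf '%s\n' "$dir"
}
```

A possible use, with scan.l and parse.y standing in for whatever your source files are actually called: dir=$(stage_submission Makefile scan.l parse.y) && (cd "$dir" && make myhtml2txt). If the build succeeds there, it is less likely to depend on files or relative paths that exist only in your working directory.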
The output of your program will be compared with our output using
the diff utility (see diff(1)), so it is recommended that you follow
the specification, and instructions for turnin, closely.
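A similar comparison can be run locally against the reference executable from Section 5. The following is a minimal sketch under stated assumptions: the helper name compare_outputs is hypothetical, only stdout is compared, and the exact diff options used in grading are not specified here.

```shell
# Hypothetical helper (not provided with the assignment):
# diff the stdout of two translators run on the same input.
#   $1 = reference executable
#   $2 = your executable
#   $3 = input HTML file
# Returns diff's status: 0 if the outputs match, 1 if they differ.
compare_outputs() {
    expected=$(mktemp) || return 2
    actual=$(mktemp) || return 2
    # Exit statuses of the translators are ignored here; only stdout is compared.
    "$1" < "$3" > "$expected" || true
    "$2" < "$3" > "$actual" || true
    result=0
    diff "$expected" "$actual" || result=$?
    rm -f "$expected" "$actual"
    return "$result"
}
```

For example, on lectura, compare_outputs /home/cs453/spring16/assignments/html2txt/myhtml2txt ./myhtml2txt test.html prints nothing and returns 0 when the two outputs are identical.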