ipp.icn: Program to preprocess Icon programs

March 26, 2002; Robert C. Wieland, revised by Frank J. Lhota
This file is in the public domain.
   Ipp is a preprocessor for the Icon language.  Many of its operations and
features are specific to the Icon environment, so it should not be used as
a generic preprocessor (such as m4).  Its output, when written to a file,
is intended to serve as the source for icont, the command processor for
Icon programs.
____________________________________________________________

Ipp may be invoked from the command line as:

  ipp [option  ...] [ifile [ofile]]

   Two file names may be specified as arguments.  'ifile' and 'ofile' are
respectively the input and output files for the preprocessor.  By default
these are standard input and standard output.  If the output file is to be
specified while the input file remains standard input, a dash ('-') should
be given as 'ifile'.  For example, 'ipp - test' makes test the output file
while retaining standard input as the input file.
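
   As a rough illustration of the file-name rules above, the following
Python sketch maps the positional arguments to an input and an output name.
It is not taken from ipp itself: the function name resolve_files, the use
of None for a standard stream, and the error raised for extra arguments are
all assumptions made for this example.

```python
def resolve_files(args):
    """Map positional arguments to (ifile, ofile) names.

    None stands for the default stream (standard input or output).
    Hypothetical helper, not part of ipp.
    """
    if len(args) > 2:
        # Assumption: anything beyond 'ifile' and 'ofile' is rejected.
        raise ValueError("at most two file names may be given")
    ifile = args[0] if len(args) >= 1 else None
    ofile = args[1] if len(args) == 2 else None
    if ifile == "-":
        # A dash keeps standard input as the input file.
        ifile = None
    return ifile, ofile
```

   Under these assumptions, resolve_files(["-", "test"]) mirrors the
'ipp - test' example: input stays on standard input and output goes to test.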

   The following special names are predefined by ipp and may not be
redefined or undefined.  The name _LINE_ is defined as the line number
(as an integer) of the line of the source file currently processed.  The
name _FILE_ is defined as the name of the current source file
(as a string).  If the source is standard input then it has the value
'stdin'.

   Ipp will also set _LINE_ and _FILE_ from the "#line" directives it
encounters, and will insert line directives to indicate source origins.

   Also predefined are names corresponding to the features supported by the
implementation of Icon at the location the preprocessor is run.  This allows
conditional translations using the 'if' commands, depending on what features
are available.  Given below is a list of the features on a 4.nbsd UNIX
implementation and the corresponding predefined names:

     Feature                         Name
     -----------------------------------------------------
     UNIX                            UNIX
     co-expressions                  co_expressions
     overflow checking               overflow_checking
     direct execution                direct_execution
     environment variables           environment_variables
     error traceback                 error_traceback
     executable images               executable_images
     string invocation               string_invocation
     expandable regions              expandable_regions


Command-Line Options:
---------------------

  The following options to ipp are recognized:

 -C          By default ipp strips Icon-style comments.  If this option
             is specified all comments are passed along except those
             found on ipp command lines (lines starting with a '$'
             command).

 -D name
 -D name=def Allows the user to define a name on the command line instead
             of using a $define command in a source file.  In the first
             form the name is defined as '1'.  In the second form name is
             defined as the text following the equal sign.  This is less
             powerful than the $define command, since def cannot contain
             any white space (spaces or tabs).

 -d depth    By default ipp allows include files to be nested to a depth
             of ten.  This allows the preprocessor to detect infinitely
             recursive include sequences.  If a different limit for the
             nesting depth is needed it may be changed by using this option
             with an integer argument greater than zero. Also, if a file
             is found to already be in a nested include sequence an
             error message is written regardless of the limit.

 -I dir      The following algorithm is normally used in searching for
             $include files.  On a UNIX system names enclosed in "" are
             searched for by trying in order the directories specified by the
             PATH environment variable, and names enclosed in <> are always
             expected to be in the /usr/icon/src directory.  On other systems
             names enclosed in <> are searched for by trying in order the
             directories specified by the IPATH environment variable; names
             in "" are searched for in a similar fashion, except that the
             current directory is tried first.  If the -I option is given the
             directory specified is searched before the 'standard'
             directories.  If this option is specified more than once the
             directories specified are tried in the order that they appear on
             the command line, then followed by the 'standard' directories.
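
   The search order described under -I can be pictured with the Python
sketch below.  It is only an illustration under stated assumptions: the
function name include_search_order and its parameters are invented here,
the caller is assumed to have already split PATH and IPATH into lists, and
"." is used to stand for the current directory.  It does not claim to
reproduce ipp's own code.

```python
def include_search_order(quoted, dash_i_dirs, unix, path_dirs, ipath_dirs):
    """Return the directories to try, in order, for one $include.

    quoted:      True for "name", False for <name>
    dash_i_dirs: directories from -I options, in command-line order
    unix:        True to apply the UNIX rule described above
    path_dirs:   directories from PATH (already split by the caller)
    ipath_dirs:  directories from IPATH (already split by the caller)
    Hypothetical helper, not part of ipp.
    """
    if unix:
        standard = list(path_dirs) if quoted else ["/usr/icon/src"]
    else:
        # Assumption: "." stands for the current directory, which is
        # tried first for quoted names only.
        standard = (["."] if quoted else []) + list(ipath_dirs)
    # -I directories always come before the 'standard' ones.
    return list(dash_i_dirs) + standard
```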

Preprocessor commands:
----------------------

   All ipp commands start with a line that has '$' as its first non-space
character.  The name of the command must follow the '$'.  White space
(any number of spaces or tabs) may be used to separate the '$' and the
command name.  Any line beginning with a '$' and not followed by a valid
name will cause an error message to be sent to standard error and
termination of the preprocessor.  If the command requires an argument then
it must be separated from the command name by white space; otherwise the
argument will be considered part of the name and the result will likely
be an error.  In processing the $ commands ipp responds to exceptional
conditions in one of two ways.  It may produce a warning and continue
processing or produce an error message and terminate.  In both cases the
message is sent to standard error.  With the exception of error conditions
encountered during the processing of the command line, the messages normally
include the name and line number of the source file at the point the
condition was encountered.  Ipp was designed so that most exception
conditions encountered will produce errors and terminate.  This protects the
user since warnings could simply be overlooked or misinterpreted.

   Many ipp commands require names as arguments.  Names must begin with a
letter or an underscore, which may be followed by any number of letters,
underscores, and digits.  Icon-style comments may appear on ipp command
lines; however, they must be separated from the normal end of the command
by white space.  If any extraneous characters appear on a command line a
warning is issued.  This occurs when characters other than white space or a
comment follow the normal end of a command.
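
   The rule for names can be written as a regular expression.  The Python
sketch below is one way to express it; the identifiers NAME and is_name are
made up for this example, and "letter" is assumed to mean an ASCII letter.

```python
import re

# Letter or underscore first, then any number of letters, underscores,
# and digits.  Assumption: only ASCII letters count as letters.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_name(text):
    """Return True if text is a valid name under the rule above."""
    return NAME.fullmatch(text) is not None
```

   For instance, _LINE_ and co_expressions pass this check, while a string
starting with a digit does not.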

   The following commands are implemented:

  $define:  This command may be used in one of two forms.  The first form
         only allows simple textual substitution.  It would be invoked as
         '$define name text'.  Subsequent occurrences of name are replaced
         with text.  Name and text must be separated by one white space
         character which is not considered to be part of the replacement
         text.  Normally the replacement text ends at the end of the line.
         The text however may be continued on the next line if the backslash
         character '\' is the last character on the line.  If name occurs
         in the replacement text an error message (recursive textual substi-
         tution) is written.

         The second form is '$define name(arg,...,arg) text' which defines
         a macro with arguments.  There may be no white space between the
         name and the '('.  Each occurrence of arg in the replacement text
         is replaced by the actual argument supplied when the macro is
         encountered.   When a macro with arguments is expanded the arguments
         are placed into the expanded replacement text unchanged.  After the
         entire replacement text is expanded, ipp restarts its scan for names
         to expand at the beginning of the newly formed replacement text.
         As with the first form above, the replacement text may be continued
         on following lines.  The replacement text starts immediately after
         the ')'.
         The names of arguments must comply with the convention for regular
         names.  See the section below on Macro processing for more
         information on the replacement process.

  $undef:   Invoked as '$undef name'.   Removes the definition of name.  If
         name is not a valid name or if name is one of the reserved names
         _FILE_ or _LINE_ a message is issued.

  $include: Invoked as '$include <filename>' or '$include "filename"'.  This
         causes the preprocessor to make filename the new source until
         end of file is reached upon which input is again taken from the
         original source.  See the -I option above for more detail.

  $dump:    This command, which has no arguments, causes the preprocessor to
         write to standard error all names which are currently defined.
         See '$ifdef' below for a definition of 'defined'.

  $warning:
            This command issues a warning, with the text coming from the
         argument field of the command.

  $error:   This command issues an error, with the text coming from the
         argument field of the command.  As with all errors, processing
         is terminated.

  $ifdef:   Invoked as '$ifdef name'.  The lines following this command appear
         in the output only if the name given is defined.  'Defined' means
           1.  The name is a predefined name and was not undefined using
               $undef, or
           2.  The name was defined using $define and has not been undefined
               by an intervening $undef.

  $ifndef:  Invoked as '$ifndef name'.  The lines following this command
         appear in the output only if the name is not defined.

  $if:      Invoked as '$if constant-expression'.  Lines following this
         command are processed only if the constant-expression produces a
         result. The following arithmetic operators may be applied to
         integer arguments: + - * / % ^

         If an argument to one of the above operators is not an integer an
         error is produced.

            The following functions are provided: def(name), ndef(name)
         This allows the utility of $ifdef and $ifndef in a $if command.
         def produces a result if name is defined and ndef produces a
         result if name is not defined.

            The following comparison operators may be used on integer
         operands:

         > >= = < <= ~=

            Also provided are alternation (|), conjunction (&), and
         negation (not).  The following table lists all operators in
         order of decreasing precedence:

             not + - (unary)
             ^ (associates right to left)
             * / %
             + - (binary)
             > >= = < <= ~=
             |
             &

         The precedence of '|' and '&' is the same as that of their
         Icon counterparts.  Parentheses may be used for grouping.
         Backtracking is performed, so that the expression

             FOO = (1|2)

         will produce a result precisely when FOO is either 1 or 2.

  $elif:    Invoked as '$elif constant-expression'.  If the lines preceding
         this command were processed, this command and the lines following
         it up to the matching $endif command are ignored.  Otherwise,
         the constant-expression is evaluated, and the lines following this
         command are processed only if it produces a result.

  $else:    This command has no arguments and reverses the notion of the
         test command which matches this directive.  If the lines preceding
         this command were ignored the lines following are processed, and
         vice versa.

  $endif:   This command has no arguments and ends the section of lines
         begun by a test command ($ifdef, $ifndef, or $if).  Each test
         command must have a matching $endif.
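
   The backtracking behaviour of $if and $elif expressions, as in the
FOO = (1|2) example above, can be modelled with generators.  The Python
sketch below is only a model of the idea, not ipp's evaluator: every name
in it (const, alt, eq, succeeds, foo_is_1_or_2) is invented for this
example.  Each expression is represented as a function returning a fresh
iterator of results, and a comparison retries its operands until a pair
matches or the alternatives run out.

```python
def const(value):
    """Expression that generates a single value."""
    return lambda: iter([value])

def alt(*exprs):
    """Alternation (|): generate the results of each operand in turn."""
    def results():
        for expr in exprs:
            yield from expr()
    return results

def eq(left, right):
    """Comparison (=): generate the right operand for each equal pair."""
    def results():
        for a in left():
            for b in right():
                if a == b:
                    yield b
    return results

def succeeds(expr):
    """True if the expression produces at least one result."""
    for _ in expr():
        return True
    return False

def foo_is_1_or_2(foo):
    # Models the constant expression  FOO = (1|2)
    return succeeds(eq(const(foo), alt(const(1), const(2))))
```

   In this model the test succeeds when foo is 1 or 2 and fails otherwise,
because the comparison is retried against each alternative.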

Macro Processing and Textual Substitution
-----------------------------------------
   No substitution is performed on text inside single quotes (cset literals)
and double quotes (strings) when a line is processed.   The preprocessor
will detect unclosed cset literals or strings on a line and issue an
error message unless the underscore character is the last character on the
line.  The output from

     $define foo bar
     write("foo")

is

     write("foo")

   Unless the -C option is specified comments are stripped from the source.
Even if the option is given the text after the '#' is never expanded.
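
   The rule that quoted text is left alone can be sketched in Python as
follows.  This is a simplified illustration, not ipp's algorithm: the names
TOKEN and substitute and the defs dictionary are invented here, only
argument-less names are handled, comments are ignored, backslash escapes
inside quoted text are assumed, and the line is scanned once rather than
rescanned after each replacement.

```python
import re

# A double-quoted string, a single-quoted cset literal, or a name.
# Quoted tokens are matched first so names inside them are never seen.
TOKEN = re.compile(
    r'''"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|(?<![A-Za-z0-9_])[A-Za-z_][A-Za-z0-9_]*'''
)

def substitute(line, defs):
    """Replace defined names in line, skipping strings and cset literals.

    defs maps each defined name to its replacement text (hypothetical).
    """
    def replace(match):
        token = match.group(0)
        if token[0] in "\"'":
            return token              # quoted text passes through unchanged
        return defs.get(token, token)
    return TOKEN.sub(replace, line)
```

   With defs set to {"foo": "bar"}, this sketch leaves write("foo")
unchanged, as in the example above, while write(foo) becomes write(bar).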

   Macro formal parameters are recognized in $define bodies even inside cset
constants and strings.  The output from

     $define test(a)         "a"
     test(processed)

is the following sequence of characters: "processed".
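
   A minimal Python sketch of this parameter replacement is given below.
It assumes the actual arguments have already been collected as strings, and
the names NAME and expand_body are invented for the example.  Formal names
are replaced wherever they appear in the body, quoted or not, and the
inserted text is not scanned again for formals, which matches the statement
that arguments are placed into the replacement text unchanged.

```python
import re

# A name that is not the tail of a longer name or number.
NAME = re.compile(r"(?<![A-Za-z0-9_])[A-Za-z_][A-Za-z0-9_]*")

def expand_body(formals, body, actuals):
    """Replace each formal name in body with the matching actual text.

    Hypothetical helper; assumes len(formals) == len(actuals).
    """
    bindings = dict(zip(formals, actuals))
    # Names that are not formals are returned unchanged.
    return NAME.sub(lambda m: bindings.get(m.group(0), m.group(0)), body)
```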

   Macros are not expanded while processing a $define or $undef.  Thus:

     $define off invalid
     $define bar off
     $undef off
     bar

produces off.  The name argument to $ifdef or $ifndef is also not expanded.

   Mismatches between the number of formal and actual parameters in a macro
call are caught by ipp.  If the number of actual parameters is greater than
the number of formal parameters an error is produced.  If the number of
actual parameters is less than the number of formal parameters a warning is
issued and the missing actual parameters are turned into null strings.
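
   The mismatch rules above can be sketched in Python as follows.  The
function name bind_actuals, the exception type, and the warning wording are
assumptions made for this example; ipp's own messages may differ.

```python
def bind_actuals(formals, actuals):
    """Pair formals with actuals; return (bindings, warning_or_None).

    Too many actuals is an error; too few yields a warning and the
    missing actuals become null (empty) strings.  Hypothetical helper.
    """
    if len(actuals) > len(formals):
        raise ValueError("too many actual parameters in macro call")
    missing = len(formals) - len(actuals)
    padded = list(actuals) + [""] * missing
    warning = None
    if missing:
        warning = "%d actual parameter(s) missing" % missing
    return dict(zip(formals, padded)), warning
```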
____________________________________________________________

  The records and global variables used by ipp are described below:

Src_desc             Record which holds the 'file descriptor' and name
                     of the corresponding file.  Used in a stack to keep
                     track of the source files when $includes are used.
Opt_rec              Record returned by the get_args() routine which returns
                     the options and arguments on the command line.  options
                     is a cset containing options that have no arguments.
                     pairs is a list of [option,  argument] pairs. ifile and
                     ofile are set if the input or output files have been
                     specified.
Defs_rec             Record stored in a table keyed by names.  Holds the
                     names of formal arguments, if any, and the replacement
                     text for that name.
Expr_node            Node of a parse tree for $if / $elif expressions.
                     Holds the operator, or a string representing the
                     control structure.  Also, holds a list of the args for
                     the operation / control structure, which are either
                     scalars or other Expr_node records.
Chars                Cset of all characters that may appear in the input.
Defs                 The table holding the definition data for each name.
Depth                The maximum depth of the input source stack.
Ifile                Descriptor for the input file.
Ifile_name           Name of the input file.
Init_name_char       Cset of valid initial characters for names.
Line_no              The current line number.
Name_char            Cset of valid characters for names.
Non_name_char        The complement of the above cset.
Ofile                The descriptor of the output file.
Options              Cset of no-argument options specified on the command
                     line.
Path_list            List of directories to search in for "" include files.
Src_stack            The stack of input source records.
Std_include_paths    List of directories to search in for <> include files.
White_space          Cset for white-space characters.
TRUE                 Defined as 1.
