itokens.icn: Procedures for tokenizing Icon code

link itokens
March 3, 1996; Richard L. Goerwitz
Requires: coexpressions
This file is in the public domain.

This file contains itokens() - a utility for breaking Icon source
files up into individual tokens.  This is the sort of routine one
needs to have around when implementing things like pretty printers,
preprocessors, code obfuscators, etc.  It would also be useful for
writing cut-down implementations of Icon in Icon - the sort of
thing one might use in an interactive tutorial.

itokens(f, x) takes an open file, f, as its first argument and
suspends successive TOK records.  Each TOK record has two fields.
The first field, sym, contains a string that represents the name of
the next token (e.g. "CSET", "STRING", etc.).  The second field,
str, gives that token's literal value.  E.g. the TOK for a literal
semicolon is TOK("SEMICOL", ";").  For a mandatory newline, itokens
would suspend TOK("SEMICOL", "\n").
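
As an illustration of the interface described above, the following
is a minimal sketch of a driver that prints each token's name and
literal value.  The program name "showtoks" and the output layout
are assumptions made for this example; only itokens() and the sym
and str fields of TOK come from this file.

```icon
link itokens

# Hypothetical driver: print one line per token, name first, then
# the image of its literal value.
procedure main(args)
    local infile, tok

    infile := open(args[1]) | stop("usage: showtoks file.icn")
    # itokens() is a generator; "every" drives it until it fails
    # at end-of-file.
    every tok := itokens(infile) do
        write(tok.sym, "\t", image(tok.str))
    close(infile)
end
```

Using image() on the str field keeps newlines visible as "\n"
instead of breaking the output line.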

Unlike Icon's own tokenizer, itokens() does not return an EOFX
token on end-of-file; it simply fails.  It can also be instructed
to return syntactically meaningless newlines if it is passed a
nonnull second argument (e.g. itokens(infile, 1)).  These
meaningless newlines are returned as TOK records with a null sym
field (i.e. TOK(&null, "\n")).
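
The sketch below shows one way a caller might tell the two kinds of
record apart when the second argument is nonnull.  The program name
"counttoks" and the counters are hypothetical; the only behavior
relied on is the one stated above, namely that meaningless newlines
arrive with a null sym field.

```icon
link itokens

# Hypothetical driver: count real tokens and syntactically
# meaningless newlines separately.
procedure main(args)
    local infile, tok, ntokens, nblank

    infile := open(args[1]) | stop("usage: counttoks file.icn")
    ntokens := nblank := 0
    # The nonnull second argument asks for meaningless newlines too.
    every tok := itokens(infile, 1) do
        if /(tok.sym) then nblank +:= 1 else ntokens +:= 1
    close(infile)
    write(ntokens, " tokens, ", nblank, " meaningless newlines")
end
```

A pretty printer could use the same test to preserve the blank
lines and line breaks of the original source.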

NOTE WELL: If new reserved words or operators are added to a given
implementation, the tables below will have to be altered.  Note
also that &keywords should be implemented on the syntactic level -
not on the lexical one.  As a result, a keyword like &features will
be suspended as TOK("CONJUNC", "&") and TOK("IDENT", "features").
