link itokens
March 3, 1996; Richard L. Goerwitz
Requires: coexpressions
This file is in the public domain.
This file contains itokens() - a utility for breaking Icon source files up into individual tokens. This is the sort of routine one needs to have around when implementing things like pretty printers, preprocessors, code obfuscators, etc. It would also be useful for writing cut-down implementations of Icon in Icon - the sort of thing one might use in an interactive tutorial.

itokens(f, x) takes, as its first argument, f, an open file, and suspends successive TOK records. TOK records contain two fields. The first field, sym, contains a string that represents the name of the next token (e.g. "CSET", "STRING", etc.). The second field, str, gives that token's literal value. E.g. the TOK for a literal semicolon is TOK("SEMICOL", ";"). For a mandatory newline, itokens would suspend TOK("SEMICOL", "\n").

Unlike Icon's own tokenizer, itokens() does not return an EOFX token on end-of-file, but rather simply fails. It can also be instructed to return syntactically meaningless newlines by passing it a nonnull second argument (e.g. itokens(infile, 1)). These meaningless newlines are returned as TOK records with a null sym field (i.e. TOK(&null, "\n")).

NOTE WELL: If new reserved words or operators are added to a given implementation, the tables below will have to be altered. Note also that &keywords should be implemented on the syntactic level - not on the lexical one. As a result, a keyword like &features will be suspended as two records: TOK("CONJUNC", "&") followed by TOK("IDENT", "features").
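
The calling convention described above can be illustrated with a short driver. This is only a sketch, not part of the library: the program and its usage message are hypothetical, and it assumes that linking itokens makes the TOK record (with fields sym and str) visible to the caller and that the Icon implementation in use supports co-expressions.

```icon
link itokens

# Hypothetical driver: list the tokens of the Icon source file named
# on the command line, one token per line.
procedure main(args)
    local f, tok

    # Fails over to stop() if no argument is given or the file
    # cannot be opened.
    f := open(args[1], "r") | stop("usage: showtoks file.icn")

    # The nonnull second argument asks itokens to suspend
    # syntactically meaningless newlines as well.
    every tok := itokens(f, 1) do {
        if /tok.sym then
            write("(newline)")      # null sym: meaningless newline
        else
            write(left(tok.sym, 12), image(tok.str))
    }

    # itokens simply fails at end-of-file, so the loop above ends
    # without any EOFX token being seen.
    close(f)
end
```

Because keywords are not recognized on the lexical level, a driver like this would show &features as two separate lines, one for the CONJUNC token and one for the IDENT token.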