Department of Computer
Science The University of Arizona Tucson, Arizona IPD245b April 11, 1996 http://www.cs.arizona.edu/icon/docs/ipd245.htm |
A
to target language B
,
is a popular and effective means of implementingA -> B
A
, given an
implementation of B
. Ratfor [1] is perhaps the best known and
most widely used example of this technique, although there are many others.A
is a variant of B
. An example
is Cg [2], a variant of C that includes a generator facility similar to
Icon's [3]. Cg consists of C and some additional syntax that a preprocessor
translates into standard C. A run-time system provides the necessary semantic
support for generators. Note that the Cg preprocessor is a source-to-source
translator:
where Cg differs from C only in the addition of a few syntactic constructs. This can be viewed as an instance of a more general paradigm:Cg -> C
The term "translator" is used here in the general sense, and includes both source-to-source translators, such as preprocessors, and source-to-object translators, such as compilers. In practice, the application of a source-to-source translator (preprocessor) may be followed by the application of a source-to-object translator (compiler). The combination is, of course, also a translator.A+ -> A
The input text and the output text may be different, but they are both inA -> A
A
. Both the input and the output of the variant translator
can be processed by a standard translator for the target language A
.h/grammar.h
in the directory
in which variant translators are installed. Many variant translators can
be constructed without modifying this grammar, and minor modifications can
be made to it without a detailed knowledge of its structure. Knowledge of
a few aspects of this grammar are important, however, for understanding
the translation process.The non-terminalsdecl : record {Recdcl($1);} ; | proc {Procdcl($1);} ; | global {Globdcl($1);} ; | link {Linkdcl($1);} ; | invocable {Invocdcl($1);} ;
record
, proc
, global
,
link
, and invocable
each produce a string and
the corresponding macro Recdcl
, Procdcl
, Globdcl
,
Linkdcl
, or Invocdcl
prints the string.The macroglobal : GLOBAL {Global0($1);} idlist {Global1($1, $2, $3);} ;
Global0
is needed in the regular translator, but
performs no operation in the identity translator. The macro Global1
does the work in the identity translator; it concatenates "global "
with the string produced by idlist
, and this new string becomes
the result of this production. The macro Global1
is passed
$1
, $2
, and $3
even though it only
uses $3
. This is done for generality.Global
for the identity translator is:
The variable#define Globdcl(x) if (!nocode) treeprt(x); treefree(x)
nocode
is set when an error is detected during
parsing. This helps prevent the variant translator from generating a program
with syntax errors. The reason for doing this is that the output of a variant
translator is usually piped directly into the regular Icon translator. If
syntax errors were propagated, two error messages would result: one from
the variant translator and one from the Icon translator. The message from
the variant translator is the one that is wanted because it references the
line number of the original source whereas the message from the Icon translator
references the line number of the generated source.treeprt()
prints a string and the function treefree()
reclaims storage. See the Section 5 for details of string representation.
While1
is used in the rule
A specification for this macro to produce an identity translation is:WHILE expr DO expr {While1($1, $2, $3, $4);} ;
Tabs separate the components of the specification. The first component is the prototype for the macro call, which may include optional arguments enclosed in parentheses as illustrated by the example above. The remaining components are the strings to be concatenated with the result being assigned to the Yacc pseudo-variableWhile1(w, x, y, z) "while " x " do " z
$$
.#
or which are empty are
treated as comments. A set of lines delineated by %{
and %}
is copied unchanged. The "braces" %{
and %}
must each occur alone on a separate line; these two delimiting lines are
not copied. This feature allows the inclusion of actual macro definitions,
as opposed to specifications, and the inclusion of C definitions. The standard
macro definitions supplied for the identity translator include examples
of these features. These definitions are the file ident.defs
.deletes the definition forWhile1()
While1
. This is a convenient way
to insure a macro is undefined. It is usually used along with the copy feature
to introduce macro definitions that cannot be generated by the specification
system. For example, the following specifications eliminate reclamation
of storage, preserving strings between declarations:
Globdcl() Linkdcl() Procdcl() Recdcl() %{ #define Globdcl(x) if (!nocode) treeprt(x); #define Linkdcl(x) if (!nocode) treeprt(x); #define Procdcl(x) if (!nocode) treeprt(x); #define Recdcl(x) if (!nocode) treeprt(x); %}
Blim(x,
y, z)
is the macro for a limitation expression,
Note that the parameterexpr1 \ expr2
y
is the operator symbol itself. To
avoid having to know the names of the macros for the operators, specifications
allow the use of operator symbols in prototypes. The symbols are automatically
replaced by the appropriate names. Thus
can be used in a specification in place of\(x, y, z)
Unary operators are similar. For example,Blim(x, y, z)
Uqmark(x, y)
, which
is the macro for ?expr
, can be specified as ?(x, y
).
In this case the parameter x
is the operator symbol.<type>
indicates a category of operators. The categories are:
The category<uop> unary operators, except as follows <ucs> control structures in unary operator format <bop> binary operators, except as follows <aop> assignment operators <bcs> control structures in binary operator format
<ucs>
consists only of |
. The
category <bcs>
consists of ?
, |
,
!
, and \
.This specification results in the definition for every binary operator:<bop>(x, y, z) x " <bop> " z
+(x, y, z)
, -(x, y, z)
, and so on. In such a specification,
every occurrence of <bop>
is replaced by the corresponding
operator symbol. Note that blanks are necessary to separate the binary operator
from its operands. Otherwise,
would be translated intoi * *s
which is equivalent toi**s
The division of operators into categories is based on their semantic properties. For example, a preprocessor may translate all unary operators in the same way, but translate the repeated alternation control structure into a programmer-defined control operation [9].i ** s
IDENT
,
STRINGLIT
, INTLIT
, REALIT
, and CSETLIT
.
These nodes contain pointers to the strings recognized. (The actual strings
are stored in a string space and remain there throughout execution of the
translator.) These nodes can be used directly as a tree (of one node) of
strings. Other nodes produced by the lexical analyzer, for example those
for operators, do not contain strings. However, all of these nodes contain
line and column numbers referring to the location of the token in the source
text. This line and column information can be useful in variant translators
that need to produce output that contains position information from the
input.This is handled automatically when macros are produced from specifications. For example, the specificationq(s)
is translated into the macroFail(x) "fail"
Most semantic actions concatenate two or more strings and produce a string. They use the C function#define Fail(x) $$ = q("fail")
which takes a variable number of arguments and returns a pointer to the concatenated result. The first argument is the number of strings to be concatenated. The other arguments are the strings in tree format. The result is also in tree format.cat(n, t1, t2, ..., tn)
produces the definitionWhile1(w, x, y, z) "while " x " do " z
Another function,#define While1(w, x, y, z) $$ = cat(4, q("while "), x, q(" do "), z)
item(t, n)
, returns the n
th
node in the "list" t
. For example, the name of a
procedure is contained in the second node in the list for the procedure
declaration. Thus, if the procedure heading list is the value of head, item(head,
2)
produces the procedure name.
Str0()
produces the string. For example, code conditional on the
main procedure could be written as follows:
As this example illustrates, semantic actions may be too complicated to be represented conveniently by macros. In such cases parser functions can be used. A file is provided for such functions. See Section 9 for an example.if (strcmp(Str0(item(head, 2)), "main") == 0) { . . . }
Line
and Col
produce the source-file
line number and column, respectively, of the place where the text for the
node begins. The use of these attributes is illustrated in Section 9.AppChar(lex
sbuf, c)
appends a character to the lexical analyzer's string buffer
and the function str_install(&lex_sbuf)
saves the string
in a string table. The use of such facilities requires more knowledge of
the translator system than it is practical to provide here. Persons with
special needs should study the translator in more detail.
common/tokens.txt
contains a list of
all reserved words and operator symbols. Each symbol has associated flags
that indicate whether it can begin or end an expression. These flags are
used for semicolon insertion.tokens.txt
and give it a new token
name. To add a new operator, insert it in the list of operators in tokens.txt
(order there is not important) and give it a new token name. The new token
names must be added to the grammar.comon/op.txt
. Its structure is
straightforward.
icon_vt
supplied with Version
9 of Icon
xtran
and Icon is installed in /usr/icon/bin
, the following commands
will build the variant translator:
The shell scriptmkdir xtran cd xtran /usr/icon/bin/icon_vt
icon_vt
creates a number of files in the new
directory and in three sub-directories: common
, itran
,
and h
. Unless changes to the lexical analyzer are needed, at
most three files need to be modified to produce a new translator:
A non-emptyvariant.defs variant macro definitions (initially empty) variant.c parser functions (initially empty) h/grammar.h Yacc grammar for Icon
variant.c
usually requires #include
files to provide needed declarations and definitions. See the example that
follows.Makefile
in the main translator directory just insures
that the program define
has be compiled and then does a make
in the itran
directory. Performing a make
in the
itran
directory first combines variant.defs
with
the standard macro definitions (in ident.defs
) and processes
them to produce the definition file, itran/ident.h
. The C preprocessor
is then used to expand the macros in h/grammar.h
using these
definitions and the result, after some "house keeping", is put
in itran/vgram.g
. Next, Yacc uses the grammar in itran/vgram.g
to build a new parser, tparse.c
. There are over 200 shift/reduce
conflicts in the identity translator. All of these conflicts are resolved
properly. More conflicts should be expected if additions are made to the
grammar. Reduce/reduce conflicts usually indicate errors in the grammar.
Finally, all the components of the system are compiled, including variant.c
,
and linked to produce vitran
, the variant translator.itran/tparse.c
refer to line numbers in itran/vgram.g
. These errors must be
related back to variant.defs
or h/grammar.h
by
inspection of itran/vgram.g
.
vitran
, takes an input file on the command
line and translates it. The specification -
in place of an
input file indicates standard input. The output of vitran
is
written to standard output. For example,
translates the filevitran pre.icn >post.icn
pre.icn
and produces the output in
post.icn
. The suffix .icn
on the argument to vitran
is optional; the example above can be written as:
Assuming the variant translator produces Icon source language,vitran pre >post.icn
post.icn
can be translated into object code by
whereicont post.icn
icont
is the standard Icon command processor.The-m process input file with the macro processor m4 before translation -s suppress informative messages -P do not generate #line directives
-P
option may be necessary to prevent the insertion of
#line
directives at places that result in syntactically erroneous
output.
The procedureexpr1 || expr2 -> Cat(expr1, expr2)
Cat()
might have the form:
Such a procedure could be added to a preprocessed program (procedure Cat(s1, s2) write(&errout, "concatenation: ", *s1 + *s2, " characters") return s1 || s2 end
Cat()
is not preprocessed itself) in order to produce the desired information
when the program is run.variant.defs
suffices:
Note, however, that Icon also has an augmented assignment operator for string concatenation:||(x, y, z) "Cat(" x "," z ")"
This operation can be handled by the definitionexpr1 ||:= expr2
Observe that this definition is not precisely faithful to the semantics of Icon, since it causes||:=(x, y, z) x " := Cat(" x "," z ")"
expr1
to be evaluated twice,
while expr1
is evaluated only once in the true augmented
assignment operation. This problem cannot be avoided here, since all arguments
are passed by value in Icon, but in practice, this discrepancy is unlikely
to cause problems.Adding a new operator to the syntax of Icon requires modifying the grammar inexpr1 ~ expr2 -> expr1 || expr2
h/grammar.h
. Since this alternative concatenation operator
should have the same precedence and associativity as the regular concatenation
operator, it can be added to the definition of expr5
:
whereexpr5 : expr6 ; | expr5 CONCAT expr6 {Bcat($1,$2,$3);} ; | expr5 TILDE expr6 {Bacat($1,$2,$3);} ; | expr5 LCONCAT expr6 {Blcat($1,$2,$3);} ;
TILDE
is the token name for ~
. Then the
definition of Bacat()
can be added to variant.defs
:
Such changes toBacat(x, y, z) x " || " z
grammar.h
usually increase the number of shift/reduce
conflicts encountered by Yacc.Cat(
) must be added to the translated program.
This can be accomplished automatically by arranging to have the code for
Cat()
written out when the variant translator encounters the
main procedure. This is a case where a parser function, as mentioned in
Section 5, is more appropriate than a macro definition.Proc1
, that produces procedure declarations is replaced by
a call to a parser function. The changes to variant.defs
are:
The C declaration for%{ nodeptr proc(); %} Proc1(u, v , w, x, y, z) proc(u, w, x, y)
proc()
is included in the file vgram.g
and subsequently incorporated by Yacc into tparse.c
where the
call to proc()
is compiled. Note that proc()
returns
a nodeptr
.variant.c
. It might have the form
Thus, when the main procedure is encountered, the text for#include "h/config.h" #include "itran/tree.h" #include "itran/tproto.h" nodeptr item(), cat(), q(); nodeptr proc(u, w, x, y) nodeptr u, w, x, y; { static char *catproc = "procedure Cat(s1, s2)\n\ write(&errout,\"concatenation: \", *s1 + *s2,\" characters\")\n\ return s1 || s2\n\ end\n"; if (strcmp(Str0(item(u, 2)), "main") == 0) return cat(7, q(catproc), u, q(";\n"), w, x, y, q("end\n")); else return cat(6, u, q(";\n"), w, x, y, q("end\n")); }
Cat()
is written out before the text for the main procedure, but all other procedures
are written out as they would be in the absence of this function.Cat()
is that the literal string is long, complicated, and difficult to change.
In addition, it is necessary to rebuild the variant translator in order
to change Cat()
. Since monitoring of this kind is likely to
suggest changes to the format or nature of the data being written, it is
useful to be able to change Cat()
more easily. One solution
to this problem is to produce a link declaration for the file containing
the translated procedure rather than the text of the procedure. With this
change, the parser function might have the form
The monitoring facility described above produces information about all string concatenation operations, but it is not possible to distinguish among them. It might be more useful to know the amount of concatenation performed by each concatenation operation. This can be done if the location of the operator in the source program can be identified. As mentioned in Section 5, tree nodes contain line and column information provided by the lexical analyzer. Thus, the translation for the concatenation operations could provide this addition information as extra arguments tonodeptr proc(u, w, x, y) nodeptr u, w, x, y; { if (strcmp(Str0(item(u, 2)), "main") == 0) return cat(7, q("link cat\n\n"), u, q(";\n"), w, x, y, q("end\n")); else return cat(6, u, q(";\n"), w, x, y, q("end\n")); }
Cat(
), which then
could print out the locations along with information about the amount of
concatenation.
The specifications for the translation of the concatenation operations might be changed toprocedure Cat(s1, s2, i, j) write(&errout, "concatenation: ", *s1 + *s2, " characters at [", i, ",", j, "]") return s1 || s2 end
where%{ nodeptr proc(), Locargs(); %} Proc1(u, v, w, x, y, z) proc(u, w, x, y) ||(x, y, z) "Cat(" x "," z Locargs(y) ")" ||:=(x, y, z) x " := Cat(" x "," z Locargs(y) ")" Bacat(x, y, z) x " || " z
Locargs()
is a parser function that produces a string
consisting of the line and column numbers between commas. This function
might have the form
The C functionnodeptr Locargs(x) nodeptr x; { char buf[25]; char *s; sprintf(buf, ",%d,%d", Col(x), Line(x)); for (s = buf, s != 0, ++s) AppChar(lex sbuf, *s); return q(str_install(&lex sbuf)); }
sprintf()
is used to do the formatting. The
resulting string is copied into the translator's string buffer as mentioned
in Section 5. The string is installed by str_install()
, which
adds '\0'
to null-terminate the string.