November 19, 1997; Ralph E. Griswold
This file is in the public domain.
This program accepts a context-sensitive production grammar
and generates randomly selected sentences from the corresponding
language.
Uppercase letters stand for nonterminal symbols and -> indicates
that the left-hand side can be rewritten by the right-hand side.
Other characters are considered to be terminal symbols. Lines
beginning with # are considered to be comments and are ignored.
A line consisting of a nonterminal symbol followed by a colon and
a nonnegative integer i is a generation specification for i
instances of sentences for the language defined by the nonterminal
(goal) symbol. An example of input to csgen is:
# a(n)b(n)c(n)
# Salomaa, p. 11.
# Attributed to M. Soittola.
#
X->abc
X->aYbc
Yb->bY
Yc->Zbcc
bZ->Zb
aZ->aaY
aZ->aa
X:10
Because sentences are selected at random, output varies from run
to run. Typical output of csgen for this example is:
aaabbbccc
aaaaaaaaabbbbbbbbbccccccccc
abc
aabbcc
aabbcc
aaabbbccc
aabbcc
abc
aaaabbbbcccc
aaabbbccc
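The rewriting process behind this output can be sketched in Python. This is a minimal illustration, not the csgen source: the names RULES, applicable, and derive are hypothetical, the step cap max_steps is an assumption of the sketch, and choosing uniformly among all (production, position) matches is only one plausible selection policy.

```python
import random

# Productions from the example grammar, as (left side, right side) pairs.
RULES = [
    ("X", "abc"), ("X", "aYbc"), ("Yb", "bY"), ("Yc", "Zbcc"),
    ("bZ", "Zb"), ("aZ", "aaY"), ("aZ", "aa"),
]

def applicable(form, rules):
    """List every (position, lhs, rhs) where a production's left side occurs."""
    sites = []
    for lhs, rhs in rules:
        start = form.find(lhs)
        while start != -1:
            sites.append((start, lhs, rhs))
            start = form.find(lhs, start + 1)
    return sites

def derive(goal, rules, rng, max_steps=10_000):
    """Rewrite the goal until no uppercase (nonterminal) letters remain.

    Returns None if no production applies (deadlock) or if the
    step cap, an assumption of this sketch, is reached.
    """
    form = goal
    for _ in range(max_steps):
        if not any(ch.isupper() for ch in form):
            return form
        sites = applicable(form, rules)
        if not sites:
            return None
        pos, lhs, rhs = rng.choice(sites)
        form = form[:pos] + rhs + form[pos + len(lhs):]
    return None

if __name__ == "__main__":
    rng = random.Random()
    for _ in range(10):
        print(derive("X", RULES, rng))
```

Every sentence this sketch produces for the example grammar has the form a(n)b(n)c(n), since the sentential form always holds exactly one nonterminal and the only exits are X->abc and aZ->aa.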
A positive integer followed by a colon can be prefixed to a
production to replicate that production, making its selection
more likely. For example,
3:X->abc
is equivalent to
X->abc
X->abc
X->abc
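A replication prefix of this kind could be expanded while reading the grammar, as in the following Python sketch. The regular expression and the name expand_production are hypothetical, not taken from csgen, and rejecting a count of zero is this sketch's own reading of "positive integer".

```python
import re

# Optional "count:" prefix, then the left side, "->", and the right side.
PRODUCTION = re.compile(r"^(?:(\d+):)?(.+?)->(.*)$")

def expand_production(line):
    """Return the production as a list of (lhs, rhs) pairs, one per replica."""
    match = PRODUCTION.match(line)
    if match is None:
        raise ValueError(f"not a production: {line!r}")
    count = int(match.group(1)) if match.group(1) else 1
    if count < 1:
        raise ValueError(f"replication count must be positive: {line!r}")
    return [(match.group(2), match.group(3))] * count
```

If productions are then picked uniformly from the expanded list, a production replicated three times is three times as likely to be chosen as an unreplicated one with the same left side.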
One option is supported:
    -g i    number of derivations; overrides the number specified
            in the grammar
Limitations: Nonterminal symbols can only be represented by single
uppercase letters, and there is no way to represent uppercase
letters as terminal symbols.
There can be only one generation specification and it must
appear as the last line of input.
Comments: Generation of context-sensitive strings is a slow process.
It may not terminate, either because of a loop in the rewriting
rules or because of the progressive accumulation of nonterminal
symbols. The program avoids deadlock, the situation in which a
string in the derivation still contains nonterminal symbols but no
production can rewrite it.
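How csgen avoids deadlock is not described here. As an illustration only, the deadlock condition itself can be tested as in the Python sketch below. The names RULES and is_deadlocked are hypothetical, and the check only detects a deadlocked string; it does not reproduce whatever avoidance strategy the program uses.

```python
# Productions from the example grammar, as (left side, right side) pairs.
RULES = [
    ("X", "abc"), ("X", "aYbc"), ("Yb", "bY"), ("Yc", "Zbcc"),
    ("bZ", "Zb"), ("aZ", "aaY"), ("aZ", "aa"),
]

def is_deadlocked(form, rules):
    """True if form still has a nonterminal but no left side occurs in it."""
    has_nonterminal = any(ch.isupper() for ch in form)
    can_rewrite = any(lhs in form for lhs, _ in rules)
    return has_nonterminal and not can_rewrite
```

For instance, a hypothetical string such as aaY would be deadlocked under the example grammar, because Y can only be rewritten when it is followed by b or c.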
This program would be improved if the specification of nonterminal
symbols were more general, as in rsg.