csgen.icn: Program to generate context-sensitive sentences

November 19, 1997; Ralph E. Griswold
This file is in the public domain.

   This program accepts a context-sensitive production grammar
and generates randomly selected sentences from the corresponding
language.

   Uppercase letters stand for nonterminal symbols, and -> indicates
that the left-hand side can be rewritten as the right-hand side.
Other characters are considered to be terminal symbols. Lines
beginning with # are considered to be comments and are ignored.
A line consisting of a nonterminal symbol followed by a colon and
a nonnegative integer i is a generation specification: it requests
i sentences of the language defined by that nonterminal (goal)
symbol.  An example of input to csgen is:

        #   a(n)b(n)c(n)
        #   Salomaa, p. 11.
        #   Attributed to M. Soittola.
        #
        X->abc
        X->aYbc
        Yb->bY
        Yc->Zbcc
        bZ->Zb
        aZ->aaY
        aZ->aa
        X:10

The output of csgen for this example is

        aaabbbccc
        aaaaaaaaabbbbbbbbbccccccccc
        abc
        aabbcc
        aabbcc
        aaabbbccc
        aabbcc
        abc
        aaaabbbbcccc
        aaabbbccc
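
The input format and the rewriting process described above can be
sketched as follows. This is an illustration in Python, not the Icon
source of csgen: the names parse_grammar, derive, and EXAMPLE_INPUT,
the step limit, and the rule of choosing uniformly among all
applicable rewrites are assumptions made for the sketch. It also
assumes that whitespace around a line is not significant.

```python
import random
import re

# The example grammar from the text (a(n)b(n)c(n)).
EXAMPLE_INPUT = """\
# a(n)b(n)c(n)
X->abc
X->aYbc
Yb->bY
Yc->Zbcc
bZ->Zb
aZ->aaY
aZ->aa
X:10
"""

def parse_grammar(lines):
    """Return (rules, goal, count) parsed from csgen-style input lines."""
    rules, goal, count = [], None, 0
    for line in lines:
        line = line.strip()               # assumption: outer whitespace is ignored
        if not line or line.startswith("#"):
            continue                      # comment or blank line
        spec = re.fullmatch(r"([A-Z]):(\d+)", line)
        if spec:                          # generation specification, e.g. X:10
            goal, count = spec.group(1), int(spec.group(2))
            continue
        lhs, arrow, rhs = line.partition("->")
        if arrow and lhs:                 # keep only well-formed productions
            rules.append((lhs, rhs))
    return rules, goal, count

def derive(rules, goal, rng, max_steps=10_000):
    """Rewrite from the goal symbol until no uppercase letters remain.

    Returns the sentence, or None if the derivation gets stuck or
    exceeds max_steps (a hypothetical safety limit).
    """
    s = goal
    for _ in range(max_steps):
        if not any("A" <= ch <= "Z" for ch in s):
            return s                      # only terminal symbols are left
        # Every (position, rule) pair whose left-hand side matches there.
        choices = [(i, lhs, rhs) for lhs, rhs in rules
                   for i in range(len(s)) if s.startswith(lhs, i)]
        if not choices:
            return None                   # nonterminals remain, nothing applies
        i, lhs, rhs = rng.choice(choices)
        s = s[:i] + rhs + s[i + len(lhs):]
    return None
```

Calling parse_grammar(EXAMPLE_INPUT.splitlines()) and then derive
once per requested sentence, with a random.Random instance as rng,
yields strings of the same shape as the output shown above; the
particular strings depend on the random choices.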


   A positive integer followed by a colon can be prefixed to a
production to replicate that production, making its selection
more likely. For example,

        3:X->abc

is equivalent to

        X->abc
        X->abc
        X->abc
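
The replication prefix can be handled by expanding a line into
repeated copies before the production is stored. The following is a
minimal sketch in Python, not the csgen source; the function name
expand_weighted is hypothetical.

```python
import re

def expand_weighted(line):
    """Return the list of productions that one input line stands for.

    A line such as "3:X->abc" yields three copies of "X->abc"; a line
    without a positive-integer prefix is returned once, unchanged.
    """
    m = re.match(r"([1-9]\d*):(.*->.*)$", line)
    if m:
        return [m.group(2)] * int(m.group(1))
    return [line]
```

If productions are then chosen uniformly from the expanded list, a
production that appears three times is three times as likely to be
selected as one that appears once.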

One option is supported:

     -g i    number of derivations; overrides the number specified
             in the grammar

Limitations: Nonterminal symbols can only be represented by single
uppercase letters, and there is no way to represent uppercase
letters as terminal symbols.

   There can be only one generation specification and it must
appear as the last line of input.

Comments: Generation of context-sensitive strings is a slow process.
It may not terminate, either because of a loop in the rewriting
rules or because of the progressive accumulation of nonterminal
symbols.  The program avoids deadlock, in which there are no
possible rewrites for a string in the derivation.
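
One possible way to avoid deadlock is to back up and try a different
rewrite whenever a string still contains nonterminal symbols but no
production applies. The sketch below, in Python, illustrates that
idea only; it is an assumption and is not taken from the csgen
source. The name derive_backtracking and the depth limit are
hypothetical. The depth limit bounds the search, since a derivation
may otherwise never terminate.

```python
import random

def derive_backtracking(rules, s, rng, depth=200):
    """Return a terminal string derivable from s, or None if none is found."""
    if not any("A" <= ch <= "Z" for ch in s):
        return s                          # no nonterminals left: a sentence
    if depth == 0:
        return None                       # give up on this branch
    # Every (position, rule) pair whose left-hand side matches there.
    choices = [(i, lhs, rhs) for lhs, rhs in rules
               for i in range(len(s)) if s.startswith(lhs, i)]
    rng.shuffle(choices)                  # random order, but all are tried
    for i, lhs, rhs in choices:
        rewritten = s[:i] + rhs + s[i + len(lhs):]
        result = derive_backtracking(rules, rewritten, rng, depth - 1)
        if result is not None:
            return result
    return None                           # dead end: the caller tries another rewrite
```

With the rules [("S", "aA"), ("S", "b")], where A has no productions,
choosing S->aA leads to a dead end, so the search backs up and
returns "b" regardless of the random order.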

   This program would be improved if the specification of
nonterminal symbols were more general, as in rsg.
