
Programming Corner from Icon Newsletter 20
January 24, 1986 ; Icon Version 5
String Scanning
The string scanning expression in Icon,
expr1 ? expr2
is often regarded by programmers as somewhat of a mystery. This expression
is technically a control structure, since it cannot be cast as a procedure
call. The reason for this is that actions must be taken after expr1
is evaluated but before expr2
is evaluated, while in a procedure
call, all arguments are evaluated before the procedure gains control.
The actions taken between the evaluation of expr1
and expr2
relate to the global variables &subject
and &pos
.
These "state variables" for scanning are set to the string value
of expr1
and 1, respectively. If these variables were not set
before expr2
was evaluated, expr2
could not "operate
on" the value of expr1
.
As mentioned above, &subject
and &pos
are global variables and their values are accessible throughout the program.
They are not affected by procedure calls. If they were, it would not be
possible to write "matching procedures" to extend the built-in
repertoire of matching functions (tab
and move
).
A string scanning expression, however, cannot just set the values of these
variables. In order for nested scanning and scanning in mutual evaluation
to work properly, the current values of the state variables must be saved
before new values are set and then restored when the scanning expression
is complete. Thus, the scope of the scanning variables can be thought of
as being dynamic with respect to scanning expressions.
Although the string scanning expression cannot be cast as a procedure call,
it can be cast as nested procedure calls:
expr1 ? expr2 -> Escan(Bscan(expr1),expr2)
In the nested call, expr1
is evaluated first. Bscan
is then called before expr2
is evaluated. Thus, Bscan
can manipulate the state variables before expr2
is evaluated,
first saving the current "outer" values, then setting the new
"inner" values for expr2
to operate on. If expr2
produces a result, Escan
is called. It restores the outer values
before producing the result provided by expr2
. On the other
hand, if expr2
fails, Bscan is resumed and can restore the
outer values before it, too, fails.
Note that Escan
must have access to the outer values saved
by Bscan
. This is accomplished by passing these values as the
result produced by Bscan
. Since there are two values, a record
can be used. Procedures to model the string scanning expression in this
fashion are:
record ScanEnvir(subject,pos)
procedure Bscan(e1)
local OuterEnvir
OuterEnvir := ScanEnvir(&subject,&pos)
&subject := e1
&pos := 1
suspend OuterEnvir
&subject := OuterEnvir.subject &pos := OuterEnvir.pos
fail
end
procedure Escan(OuterEnvir,e2)
local InnerEnvir
InnerEnvir := ScanEnvir(&subject,&pos)
&subject := OuterEnvir.subject
&pos := OuterEnvir.pos
suspend e2
&subject := InnerEnvir.subject
&pos := InnerEnvir.pos
fail
end
Note that if expr2
produces a result, Escan
suspends.
If it is resumed, Escan
restores the inner values and fails,
forcing expr2
to be resumed. This allows expr2
to produce a sequence of results.
This is all there is to the string scanning expression -- the maintenance
of state variables. All of string analysis and "pattern matching"
comes from matching functions in expr2
that examine &subject
and change &pos
. These functions are simple. For example,
tab(i)
can be modeled as a procedure as follows:
procedure tab(i)
suspend .&subject[.&pos:&pos <- i]
end
This leaves the question of where all of the power of string scanning comes
from. It derives from the expression evaluation mechanism of Icon: generators
and goal-directed evaluation.
Exercises:
1. There is a slight flaw in the procedures given above as a model for string
scanning. What is it? Hint: It has nothing to do with the maintenance of
state variables.
2. Write a model of string scanning that uses co-expressions and a programmer-defined
control operation Scan{expr1,expr2}
in place of expr1
? expr2
.
3. Why are the explicit dereferencing operations necessary in the procedure
for tab
?
A Programming Idiom
Version 5.10 of Icon has a new sort option for tables that produces a single
list of alternating entry and assigned values. For example,
a := sort(t,3)
assigns such a list to a
. This option is more convenient and
much more efficient in many cases than the standard sort option
a := sort(t,1)
that assigns to a a list of two-element lists.
Consider the problem of writing the entry and assigned values of a table,
side-by-side in two columns, using the new option. The obvious approach
is
a := sort(t,3)
every i := 1 to *a - 1 by 2 do
write(a[i],"\t",a[i + 1])
Students in our class on string and list processing techniques came up with
a different approach:
a := sort(t,3)
while write(get(a),"\t",get(a))
To be sure, this approach "destroys" the list a
,
but that normally makes no difference. And not only is this approach simpler
than the "obvious" one, it is faster too!
Exercise: The elements produced by sort(t,3)
are in
increasing order of the entry values. How could the approach above be modified
to write the output in descending order of the entry values?
Teasers
Steve Wampler provides two more teasers. Suppose that
t := table([])
Also suppose that getword is a procedure that produces a word from a line
of text taken from a file in which lineno
is the current line
number.
1. Why does the following program segment never increase the size of the
table t
?
while word := getword() do put(t[word]),lineno)
2. Why does the following program segment increase the size of the table
when the previous one does not?
while word := getword() do t[word] |||:= [lineno]
Other Exercises
1. What is the upper bound on the number of results that find(s1,s2)
can produce?
2. The two code segments
return x
and
suspend x
fail
are often thought of as being operationally equivalent. Is this really true?
If not, explain the difference.
3. It is frequently claimed that the outcome of looping expressions such
as
while expr1 do expr2
is failure -- that is, that the loop itself produces no results. Is this
always true? If not, give a counter example.
Trivia Corner
Write the shortest possible Icon program whose translation produces at least
one instance of every different ucode instruction in Icon's intermediate
language. Interpret "shortest possible" to mean the fewest number
of characters. (See the Icon implementation book for a discussion of ucode.)
Icon home page