Programming Corner from Icon Newsletter 16
November 12, 1984 ; Version 5
Old Business
Consider the program at the end of the last Newsletter. If line
is not empty, tally(line)
is called and line
is
written. However, since tall
y has no explicit return, it fails
by flowing off the end of the procedure body. Since the call fails, the
argument
("" ~=== line) | "<empty line>")
is resumed, resulting in the call tally("<empty line>")
.
Consequently, <empty line>
is written after every nonempty
line, as well as in place of empty lines.
The cure is simple -- insert a return at the end of the procedure body for
tally
. (What is another way of fixing the problem?) The lesson
is a more general one -- be careful to provide a return
at
the end of a procedure body unless calls of the procedure are supposed to
fail.
Chosing Programming Techniques in Icon
There often are several ways of doing the same thing in Icon. While this
is true of most programming languages, it is exaggerated in Icon, since
its expression evaluation mechanism is more general than the expression
evaluation mechanisms of "Algol-like" languages, such as Pascal
and C. Thus, in Icon, there is often an Algol-like solution and also a solution
that makes use of generators. The fact that Icon has both low-level string
operations and higher-level string scanning complicates the situation.
The novice Icon programmer (and even the more advanced one) is faced with
choices that may be confusing or even bewildering. Most programmers develop
a fixed set of techniques and often fail to use the full potential of the
language.
In the discussion that follows, a simple text processing problem is approached
from a variety of ways to illustrate and compare different programming techniques
in Icon. The problem is to count the number of times the string s
occurs in the file f
, which is formulated in terms of a procedure
scount(s,f)
.
The first attempt at such a procedure might be
# scount1
procedure scount(s,f)
count := 0
while line := read(f) do
while i := find(s,line) do {
count +:= 1 line := line[i + 1:0]
}
return count
end
This solution is very Algol-like in nature and explicitly examines successive
portions of each input line. It does not take advantage of the third argument
of find
, which allows the starting position for the examination
to be specified. Using this feature, the procedure becomes
# scount2
procedure scount(s,f)
count := 0
while line := read(f) do {
i := 1
while i := find(s,line,i) + 1 do
count +:= 1
}
return count
end
While this solution is shorter than the previous one, it is still Algol-like
and uses only low-level string processing. Using string scanning, an alternative
solution is
# scount3
procedure scount(s,f)
count := 0
while line := read(f) do
line ? while tab(find(s) + 1) do
count +:= 1
return count
end
This approach eliminates the auxiliary identifier i
and uses
scanning to move through the string. None of the solutions above uses the
capacity of find
to generate the positions of successive instances
of s
, however. If this capacity is used, it is not necessary
to tab through the string, and the following solution will do:
# scount4
procedure scount(s,f)
count := 0
while line := read(f) do
line ? every find(s) do
count +:= 1
return count
end
At this point, it becomes clear that string scanning provides no advantage
and it can be eliminated in favor of the following solution:
# scount5
procedure scount(s,f)
count := 0
while line := read(f) do
every find(s,line) do
count +:= 1
return count
end
Finally, the hard-core Icon programmer may want to get rid of the while
loop, making use of the fact that the expression !f
generates
the input lines from f
. The solution then becomes
# scount6
procedure scount(s,f)
count := 0
every find(s,!f) do
count +:= 1
return count
end
The relative merits of these different solutions are arguable on stylistic
grounds. Certainly the last (scount6
) is the most concise,
but scount5
is probably easier to understand.
But what about efficiency. Is scount6
more efficient than scount5
?
In fact, how much difference in performance is there among all the solutions?
The relative efficiency varies considerably, depending on the data -- how
many lines there are in f
, how long the lines are, how many
times s
occurs in f
, and so forth. The following
figures are typical, however. The figures are normalized so that the fastest
solution has the value 1.
scount1 2.96
scount2 2.14
scount3 2.03
scount4 1.14
scount5 1.04
scount6 1.00
The fact that the last solution is the fastest may not be surprising --
it is the shortest and uses the features of Icon that are most effective
in internalizing computation. Nor should it be surprising that scount2
is significantly faster than scount1
, since scount2
avoids the formation of substrings. (It is worth noting, however, that substring
formation is relatively efficient in Icon -- no new strings are constructed,
only pointers to portions of old ones.)
It might be surprising, however, to discover that the use of string scanning
in scount3
provides a significant advantage over scount2
.
Evidently, the internalization of the string and position that scanning
provides more than overcomes the fact that scount3
produces
substrings (by tab
).
The real gain in efficiency comes with the use of generators in scount4
,
where the state of the computation is maintained in find for all the positions
in any one line.
Getting rid of string scanning, which serves no useful purpose in scount5
,
produces an expected improvement, although perhaps not as much as might
be expected. The last step of reducing the nested loops to a single loop
in scount6
also produces a slight improvement in performance.
What might be learned from these examples is that there may be a very substantial
difference in performance in Icon, depending on the technique used -- a
factor of nearly 3 between the naive solution of scount1
and
the sophisticated one of scount6
. The value of using the capabilities
of generators is also evident, both in performance and in the conciseness
of the solutions. It is notable that string scanning is not as expensive
as one might imagine. It can be used without the fear that it will degrade
performance substantially.
Different Ways of Looking at Things
Steve Wampler contributes the following interesting note on programming
in Icon:
How do you test to see if the value of i
is not
between 1 and the length of the string s
? I would write:
if not (1 <= i <= *s) then ...
but my students wrote:
if *s < i < 1 then ...
Icon home page