Programming Corner from Icon Newsletter 25
November 1, 1987 ; Icon Version 6
Pattern Words
In the last newsletter, we posed the problem of writing a procedure patwords(s)
that returns the pattern word for s
. (A pa-tern word is obtained
by replacing all occurrences of the first letter of a word by A
,
the next different letter by B
, and so on.)
Experienced Icon programmers naturally turn to map(s1,s2,s3)
when there is a hint of character substitution or rearrangement in the air.
The problem above is a natural for map
-- the difficulty is
finding the unique characters on which to base a substitution.
Most solutions we received were based on removing duplicate characters in
the word by "conventional" string processing, followed by a straightforward
use of map. Here's one based on a submission by Gregg Townsend:
procedure patword(s) # Solution 1
static letters
initial letters := string(&lcase)
out := "" every c := !s do
if not find(c,out) then out ||:= c
return map(s, out, letters[1+:*out]))
end
The static identifier is used to avoid cset-to-string conversion every time
the procedure is called. This makes a noticeable difference, as does the
use of find
instead of upto
-- the latter requires
a string-to-cset conversion in the loop.
Ardent Icon programmers try to find a way to do it entirely with map
-- both because of the challenge and because of the knowledge that map
operates on all characters of its arguments in each call, avoiding loops
over the characters at the source level.
Here is such a solution, based on a submission by Ken Walker:
procedure patword(s) # Solution 2
local numbering, orderS, orderset, patlbls
static labels, revnum
initial {
labels := &lcase || &lcase
revnum := reverse(&cset)
}
# 1: Map each character of s into another character, such
# that the new characters are in increasing order left
# to right (note that the map function chooses the
# rightmost character of its second argument so things
# must be reversed).
# 2: Map each of these new characters into contiguous
# letters.
(numbering := revnum[1 : *s + 1]) |
stop("word too long")
orderS := map(s, reverse(s), numbering)
rderset := string(cset(orderS))
(patlbls := labels[1 : *orderset + 1]) |
stop("too many characters")
return map(orderS, orderset, patlbls)
end
Yet another all-map
solution is:
procedure patword(s) # Solution 3
static backwards, letters
initial {
backwards := reverse(&ascii)
letters := string(&lcase)
}
z := reverse(s)
z := map(z,z,backwards[1:*z + 1])
cz := cset(z)
return reverse(map(z,cz,map(cz,cz,
letters[1:*cz + 1])))
end
We'll leave you to figure this one out.
The concept of "good style" is more controversial in Icon than
in many other programming languages. We personally prefer the first solution
for clarity and the second for cleverness. Timing is more objective. Solution
2 is fastest. Here are comparative timings from processing 10,000 words
from the word list from Webster's 2nd. The results are normalized to Solution
2:
Solution 1 1.15
Solution 2 1.00
Solution 3 1.51
It's worth noting that the timing is very sensitive to the method used.
Solutions that use table-lookup instead of mapping typically are three to
five times slower than Solution 2. Even putting the cset-to-string conversion
in the body of the procedure in Solution 1 adds about 20% to its running
time.
Environment Variables
The last Newsletter also asked for a procedure getenv(s)
for
UNIX that returns the value of the environment variable s
if
it's set but fails otherwise.
The solutions we got varied somewhat, depending on the flavor of UNIX involved.
The approach we liked best, used by Mike Beede and Dave Hanson, is to read
in all of the environment variables the first time getenv
is
called. Here's a combination of their submissions for Berkeley UNIX:
procedure main()
while s := read() do
write(getenv(s))
end
procedure getenv(s)
local pipe, line
static environment
initial {
environment := table()
pipe := open("printenv","pr")
while line := read(pipe) do
line ? environment[tab(upto('='))] :=
(move(1),tab(0))
close(pipe)
}
return \environment[s]
end
Benchmarking Icon Expressions
In the last Newsletter, we talked a little about efficient programming in
Icon. There are several problems in writing efficient programs in Icon.
One problem is that Icon often provides many different ways of accomplishing
a task, and it's often difficult to tell which is the most efficient. Another
problem is that many of Icon's features do not have any direct counterpart
in the architecture of the computers on which Icon runs. You can guess how
a feature like table-lookup is implemented, but you may guess wrong. Even
if you know, you may be mistaken about its speed. In other cases, features
may be implemented in ways that you'd not expect. Even if you are an expert
on the implementation, you may be surprised by how relatively fast or slow
some operations are. Even those of us who did the implementation frequently
are wrong about speed.
In theory, if you knew enough about the implementation, you could come up
with analytic results -- formulas, bounds, and so forth. In practice, this
approach has limited usefulness because of the complexity of the problems
and the sensitivity of the operations to the kinds of data on which they
operate. An alter-native is an empirical approach: measurement of different
kinds of expressions or small program segments. Such measurements can help
answer questions, pinpoint potential problems, and suggest the most efficient
approach to a particular problem.
Benchmarking expression evaluation isn't difficult. We have developed a
few tools for translating expressions of interest into programs that time
their evaluation in loops and report the results in a convenient format.
To see how helpful benchmarking can be, consider the subscripting of a list:
a[i]
. This expression is simple enough, and in a more conventional
programming language you probably would have a good idea of how it's implemented.
But lcon supports stack and queue access as well as positional access, so
you might guess that positional access is not as simple as it appears.
In fact, how fast positional access is depends on how the list is constructed.
This is a case where benchmarking is useful. Consider two 1,000-element
lists, a1
and a2
, constructed as follows:
a1 := list(l000)
a2 := []
every 1 to 1000 do put(a2,&null)
Benchmarking shows the following comparative times for referencing the first
and last elements of the two lists:
a1[1] 0.1084
a1[1000] 0.1084
a2[1] 0.1084
a2[1000] 0.2914
The time for referencing the first and last elements of a1
is the same, as might be expected. But why should it take longer to access
the last element of a2
? Both lists have the same size and the
same values, but because of the way lists are implemented in lcon, they
do not have the same structure. The first consists of a single block of
1,000 elements, while the second consists of a doubly-linked list of smaller
blocks. The extra time to access the last element of a2
is
a consequence of chaining through the blocks to get to the last one. (See
the Icon implementation book for details.)
Question: How long do you think a1[1001]
and a2[1001]
take? Suppose you build a list by adding elements in the fashion above but
later need to access the elements by position? Is there anything you can
do to eliminate the referencing overhead that results from the piecemeal
construction of the list?
Primes
Andrew Appel's "what does it do?" submission in the last newsletter
provoked this response from Bill Griswold:
Here is a sequence of modifications of the prime program in
the last Newsletter. The first program is the original submission. After
these are modifications that attempt to restrict the range of iteration
for checking if the current number is composite. Although the latter programs
look more cumbersome (a lot of co-expression creation and invocation), they
are actually competitive in speed for large numbers of primes. This is probably
due to the fact that the sieve is more discriminating than the simpler programs.
(You can also contrast these programs with the sieve in sample programs
distributed with UNIX Icon systems, which isn't lazy. It is much faster
than any of these.)
procedure main()
every write((i := 2) | (|i := i + 1) &
(not(i = (2 to i) * (2 to i))) & i)
end
procedure main()
every write(i := seq(2) &
(not(i = (2 to i) * (2 to i))) & i)
end
procedure main()
every write(i := seq(2) &
(not(i = (l := (2 to i^0.5)) *
(l to i))) & i)
end
procedure main()
every write(i := seq(2) &
(not (i = (k := 2 to i/(2))*
(i/(k)))) & i)
end
This program produces the primes lazily using the basic sieve technique
and co-expressions. Nested creation of co-expressions eats space and causes
(co-expression?) stack overflow. This is a modified version of a program
by Robert Henry, and is a translation of a Miranda program.
procedure main(arglist)
every write(sieve(create seq(2)))
end
procedure modcheck(s, p)
repeat if ((x := @s) % p) ~= 0
then suspend x
end
procedure sieve(e)
suspend (p := @e)
every suspend sieve(create modcheck(e, p)) end
procedure main(arglist)
every write(sieve(create seq(2)))
end
Here's a modified version of the sieve program. Elimination of tail recursion
in the sieve lets the program run longer than the more straightforward version
of this program.
procedure main()
every write(sieve(create seq(2)))
end
# Check that every value in s is relatively
# prime to p
procedure modcheck(s, p)
while x := @s do if (x % p) ~= 0
then suspend x
end
# Sieve of Eratothsenes, sans tail recursion
procedure sieve(e)
repeat {
suspend p := @e
e := create modcheck(e, p)
}
end
And here's the one from the distributed sample programs:
# This program illustrates the use of sets in
# implementing the classical sieve algorithm
# for computing prime numbers.
procedure main(arglist)
local limit, s, i
limit := arglist[1] | 100
s := set([])
every member(s,i := 2 to limit) do
every delete(s,i + i to limit by i)
primes := sort(s)
write("There are ",*primes,
" primes in the first ",limit,"
integers.")
write("The primes are:")
every write(right(!primes,*limit + 1))
end
Queens Never Die
The non-attacking n-queens problem continues to fascinate persons who are
interested in program structure. Here's a recent contribution by Paul Abrahams,
based on an earlier program by Steve Wampler:
global n, solution
procedure main(args)
local i
n := args[1] | 8 # 8 queens by default
if not(0 < integer(n)) then stop("usage [ n ]")
solution := list(n) # a list of column solutions
write(n,"-Queens:")
every show(q(1)) # show the result
end
# q(c) -- place queens in columns c through n in all
# possible ways. Suspend with a list of row positions
# for columns c through n
procedure q(c)
local r
static up, down, rows
initial {
up := list(2*n-1,0)
down := list(2*n-1,0)
rows := list(n,0)
}
every (r := 1 to n,if 0 = rows[r] = up[n+(r-c)] =
down[r+c-1] then rows[r] <- up[n+(r-c)] <-
down[r+c-1] <- 1) do suspend {
if c = n then [r] else [r] ||| q(c + 1)
}
end
# Show the solution on a chess board. The argument is
# a list of row positions for columns 1 through n.
procedure show(solution)
static count, line, border
initial {
count := 0
line := repl("| ",n) || "|"
border := repl("----",n) || "-"
}
write("solution: ", count+:=1)
write(" ", border)
every line[4*(!solution -- 1) + 3] <- "Q" do {
write(" ", line)
write(" ", border)
}
write()
end
Icon home page