Programming Corner from Icon Newsletter 25

November 1, 1987 ; Icon Version 6

Pattern Words

In the last newsletter, we posed the problem of writing a procedure patwords(s) that returns the pattern word for s. (A pa-tern word is obtained by replacing all occurrences of the first letter of a word by A, the next different letter by B, and so on.)

Experienced Icon programmers naturally turn to map(s1,s2,s3) when there is a hint of character substitution or rearrangement in the air. The problem above is a natural for map -- the difficulty is finding the unique characters on which to base a substitution.

Most solutions we received were based on removing duplicate characters in the word by "conventional" string processing, followed by a straightforward use of map. Here's one based on a submission by Gregg Townsend:

procedure patword(s)   # Solution 1
   static letters

   initial letters := string(&lcase)

   out := "" every c := !s do
      if not find(c,out) then out ||:= c
   return map(s, out, letters[1+:*out]))
end

The static identifier is used to avoid cset-to-string conversion every time the procedure is called. This makes a noticeable difference, as does the use of find instead of upto -- the latter requires a string-to-cset conversion in the loop.

Ardent Icon programmers try to find a way to do it entirely with map -- both because of the challenge and because of the knowledge that map operates on all characters of its arguments in each call, avoiding loops over the characters at the source level.

Here is such a solution, based on a submission by Ken Walker:

procedure patword(s)   # Solution 2
   local numbering, orderS, orderset, patlbls
   static labels, revnum
   initial {
      labels := &lcase || &lcase
      revnum := reverse(&cset)
      }

# 1: Map each character of s into another character, such
# that the  new characters are in increasing order left
# to right (note that the map function chooses the
# rightmost character of its second argument so things
# must be reversed).

# 2: Map each of these new characters into contiguous
# letters.

   (numbering := revnum[1 : *s + 1]) |
      stop("word too long")
   orderS := map(s, reverse(s), numbering)
   rderset := string(cset(orderS))
   (patlbls := labels[1 : *orderset + 1]) |
      stop("too many characters")
   return map(orderS, orderset, patlbls)
end

Yet another all-map solution is:

procedure patword(s)   # Solution 3
   static backwards, letters

   initial {
      backwards := reverse(&ascii)
      letters := string(&lcase)
      }

   z := reverse(s)
   z := map(z,z,backwards[1:*z + 1])
   cz := cset(z)
   return reverse(map(z,cz,map(cz,cz,
     letters[1:*cz + 1])))
end

We'll leave you to figure this one out.

The concept of "good style" is more controversial in Icon than in many other programming languages. We personally prefer the first solution for clarity and the second for cleverness. Timing is more objective. Solution 2 is fastest. Here are comparative timings from processing 10,000 words from the word list from Webster's 2nd. The results are normalized to Solution 2:

Solution 1    1.15
Solution 2    1.00
Solution 3    1.51

It's worth noting that the timing is very sensitive to the method used. Solutions that use table-lookup instead of mapping typically are three to five times slower than Solution 2. Even putting the cset-to-string conversion in the body of the procedure in Solution 1 adds about 20% to its running time.

Environment Variables

The last Newsletter also asked for a procedure getenv(s) for UNIX that returns the value of the environment variable s if it's set but fails otherwise.

The solutions we got varied somewhat, depending on the flavor of UNIX involved. The approach we liked best, used by Mike Beede and Dave Hanson, is to read in all of the environment variables the first time getenv is called. Here's a combination of their submissions for Berkeley UNIX:

procedure main()
   while s := read() do
      write(getenv(s))
end
procedure getenv(s)
   local pipe, line
   static environment

   initial {
      environment := table()
      pipe := open("printenv","pr")
      while line := read(pipe) do
         line ? environment[tab(upto('='))] :=
            (move(1),tab(0))
      close(pipe)
      }

   return \environment[s]
end

Benchmarking Icon Expressions

In the last Newsletter, we talked a little about efficient programming in Icon. There are several problems in writing efficient programs in Icon. One problem is that Icon often provides many different ways of accomplishing a task, and it's often difficult to tell which is the most efficient. Another problem is that many of Icon's features do not have any direct counterpart in the architecture of the computers on which Icon runs. You can guess how a feature like table-lookup is implemented, but you may guess wrong. Even if you know, you may be mistaken about its speed. In other cases, features may be implemented in ways that you'd not expect. Even if you are an expert on the implementation, you may be surprised by how relatively fast or slow some operations are. Even those of us who did the implementation frequently are wrong about speed.

In theory, if you knew enough about the implementation, you could come up with analytic results -- formulas, bounds, and so forth. In practice, this approach has limited usefulness because of the complexity of the problems and the sensitivity of the operations to the kinds of data on which they operate. An alter-native is an empirical approach: measurement of different kinds of expressions or small program segments. Such measurements can help answer questions, pinpoint potential problems, and suggest the most efficient approach to a particular problem.

Benchmarking expression evaluation isn't difficult. We have developed a few tools for translating expressions of interest into programs that time their evaluation in loops and report the results in a convenient format.

To see how helpful benchmarking can be, consider the subscripting of a list: a[i]. This expression is simple enough, and in a more conventional programming language you probably would have a good idea of how it's implemented. But lcon supports stack and queue access as well as positional access, so you might guess that positional access is not as simple as it appears.

In fact, how fast positional access is depends on how the list is constructed. This is a case where benchmarking is useful. Consider two 1,000-element lists, a1 and a2, constructed as follows:

a1 := list(l000)
a2 := []
every 1 to 1000 do put(a2,&null)

Benchmarking shows the following comparative times for referencing the first and last elements of the two lists:

a1[1]       0.1084
a1[1000]    0.1084
a2[1]       0.1084
a2[1000]    0.2914

The time for referencing the first and last elements of a1 is the same, as might be expected. But why should it take longer to access the last element of a2? Both lists have the same size and the same values, but because of the way lists are implemented in lcon, they do not have the same structure. The first consists of a single block of 1,000 elements, while the second consists of a doubly-linked list of smaller blocks. The extra time to access the last element of a2 is a consequence of chaining through the blocks to get to the last one. (See the Icon implementation book for details.)

Question: How long do you think a1[1001] and a2[1001] take? Suppose you build a list by adding elements in the fashion above but later need to access the elements by position? Is there anything you can do to eliminate the referencing overhead that results from the piecemeal construction of the list?

Primes

Andrew Appel's "what does it do?" submission in the last newsletter provoked this response from Bill Griswold:

Here is a sequence of modifications of the prime program in the last Newsletter. The first program is the original submission. After these are modifications that attempt to restrict the range of iteration for checking if the current number is composite. Although the latter programs look more cumbersome (a lot of co-expression creation and invocation), they are actually competitive in speed for large numbers of primes. This is probably due to the fact that the sieve is more discriminating than the simpler programs. (You can also contrast these programs with the sieve in sample programs distributed with UNIX Icon systems, which isn't lazy. It is much faster than any of these.)

procedure main()
   every write((i := 2) | (|i := i + 1) &
     (not(i = (2 to i) * (2 to i))) & i)
end

procedure main()
   every write(i := seq(2) &
      (not(i = (2 to i) * (2 to i))) & i)
end

procedure main()
   every write(i := seq(2) &
      (not(i = (l := (2 to i^0.5)) *
         (l to i))) & i)
end

procedure main()
   every write(i := seq(2) &
      (not (i = (k := 2 to i/(2))*
         (i/(k)))) & i)
end
This program produces the primes lazily using the basic sieve technique and co-expressions. Nested creation of co-expressions eats space and causes (co-expression?) stack overflow. This is a modified version of a program by Robert Henry, and is a translation of a Miranda program.

procedure main(arglist)
   every write(sieve(create seq(2)))
end

procedure modcheck(s, p)
   repeat if ((x := @s) % p) ~= 0
   then suspend x
end

procedure sieve(e)
   suspend (p := @e)
   every suspend sieve(create modcheck(e, p)) end

procedure main(arglist)
   every write(sieve(create seq(2)))
end
Here's a modified version of the sieve program. Elimination of tail recursion in the sieve lets the program run longer than the more straightforward version of this program.

procedure main()
   every write(sieve(create seq(2)))
end

# Check that every value in s is relatively
# prime to p

procedure modcheck(s, p)
   while x := @s do if (x % p) ~= 0
   then suspend x
end

# Sieve of Eratothsenes, sans tail recursion

procedure sieve(e)
   repeat {
      suspend p := @e
      e := create modcheck(e, p)
      }
end

And here's the one from the distributed sample programs:

# This program illustrates the use of sets in
# implementing the classical sieve algorithm
# for computing prime numbers.

procedure main(arglist)
   local limit, s, i

   limit := arglist[1] | 100
   s := set([])
   every member(s,i := 2 to limit) do
      every delete(s,i + i to limit by i)
   primes := sort(s)
   write("There are ",*primes,
     " primes in the first ",limit,"
        integers.")
   write("The primes are:")
   every write(right(!primes,*limit + 1))
end

Queens Never Die

The non-attacking n-queens problem continues to fascinate persons who are interested in program structure. Here's a recent contribution by Paul Abrahams, based on an earlier program by Steve Wampler:

global n, solution

procedure main(args)
   local i

   n := args[1] | 8      # 8 queens by default
   if not(0 < integer(n)) then stop("usage [ n ]")
   solution := list(n)   # a list of column solutions
   write(n,"-Queens:")
   every show(q(1))      # show the result
end

# q(c) -- place queens in columns c through n in all
# possible ways. Suspend with a list of row positions
# for columns c through n

procedure q(c)
   local r
   static up, down, rows

   initial {
      up := list(2*n-1,0)
      down := list(2*n-1,0)
      rows := list(n,0)
      }
   every (r := 1 to n,if 0 = rows[r] = up[n+(r-c)] =
      down[r+c-1] then rows[r] <- up[n+(r-c)] <-
      down[r+c-1] <- 1) do suspend {
         if c = n then [r] else [r] ||| q(c + 1)
      }
end
# Show the solution on a chess board. The argument is
# a list of row positions for columns 1 through n.

procedure show(solution)
   static count, line, border
   initial {
      count := 0
      line := repl("| ",n) || "|"
      border := repl("----",n) || "-"
      }

   write("solution: ", count+:=1)
   write(" ", border)
   every line[4*(!solution -- 1) + 3] <- "Q" do {
      write(" ", line)
      write(" ", border)
      }
   write()
end

Icon home page