Programming Corner from Icon Newsletter 24

Programming Corner from Newsletter 24

June 13, 1987; Icon Version 6
Processing Command-Line Options

In the last Newsletter, there was a discussion of using command-line arguments to Icon programs to provide options that could be used to control program execution. For example, the program for dealing bridge hands might have options for specifying the number of hands to deal as well as the seed for random-number generation. Using the UNIX style for specifying options, the program might be called as follows to produce 10 hands with an initial seed of 17:

deal -h 10 -s 17

Of course, there is nothing sacred about the UNIX style for communicating options, although we've found it handy in the Icon program library to have a relatively uniform method.

In the present Icon program library, every program that has command-line options handles them in its own way. Bob Alexander has provided a procedure that avoids this duplicate code and provides more uniformity to the handling of options. It is patterned after the UNIX getopt facility and is called as getopt(arg,optstr), where arg is the argument list as passed to the main procedure and optstr is a string of allowable option letters.

If a letter is followed by ":", the corresponding option is assumed to be followed by a string of data, optionally separated from the letter by space. If instead of ":" the letter is followed by a "+", the parameter is converted to an integer or if a ".", to a real. If optstr is omitted, any letter is assumed to be valid and to require no data.

The procedure getopt returns a list consisting of two items. The first is a table of the options specified, where the entry values are the specified option letters. The assigned values are the data following the options, if any, or 1 if the option has no data. The table's default value is null. The second item in the returned list is a list of remaining parameters on the command line (usually file names). A "-" that is not followed by a letter is taken as a file name rather than an option.

If an error is detected, stop() is called with an appropriate error message. After calling getopt() the original argument list, arg, is empty.

Not every program that has command-line options needs all the generality that this procedure provides. For example, the program for dealing bridge hands in the last Newsletter needs only the table in the first element in the list returned by getopt:

link getopt
    . . .
procedure main(args)
   local s, hands, opts
      . . .
   hands := 5
   opts := getopt(args,"h+s+")[1]
   hands := \opts["h"]
   &random := \opts["s"]
      . . .

The option string indicates that the allowable option names are h and s and that both require integer arguments. The default values of hands and &random are changed only if the corresponding values are nonnull (that is, they are specified on the command line).

Here's the code for the procedure itself:

procedure getopt(arg,optstr)
   local x, i, c, otab, flist, o, p

   /optstr := string(&lcase ++ &ucase)
   otab := table()
   flist := []
   while x := get(arg) do
      x ? if ="-" & not pos(0) then
         while c := move(1) do
            if i := find(c,optstr) + 1 then otab[c] := 
              if any(':+.',o := optstr[i]) then {
                 p := "" ~== tab(0) | get(arg) |
                    stop(...)
              case o of {
                 ":":
                 "+": integer(p) | stop(...)
                 ".": real(p) | stop(...)
                 }
              }
           else 1
        else stop(...) else put(flist,x)
   return [otab,flist]
end.

In the actual procedure, the calls to stop contain appropriate error messages. The text was elided here to make the program fit into the space available. With apologies to Bob, we also shortened an identifier here and there and rearranged things somewhat toward the same cause.

Efficient Programming in Icon

Since many of Icon's features have no direct correspondence in the underlying architecture of the computers on which Icon is implemented, there often is no way of knowing, from the language itself, whether a particular operation is efficient or inefficient. For example, we have been asked several times whether computing the size of a string (*s) is fast or slow.

The ultimate source of such information is in the implementation itself. Persons who are interested in implementation techniques may want to study the implementation book or even the source code. Of course, an Icon programmer should not have to be an expert in the implementation of the language in order to know how to program efficiently. While a general treatment of this subject is complicated and extensive, a few specific suggestions go a long way toward providing the information needed most frequently.

This section of the Programming Corner is devoted to such issues. More material on efficient coding techniques will appear on a more or less regular basis in future Newsletters.

The size of an object: Let's start with the answer to the question above: "Is determining the size of a string fast or slow?". The answer to this one is simple: It is fast. The reason is that the size of a string is stored as part of the string value and is immediately accessible. (This is in contrast to C, where it is necessary to count the characters every time the size of a string is needed.)

In fact, the sizes of all objects are stored with them and are quickly available. For objects that may change in size, such as lists, sets, and tables, the size is updated whenever it is changed. So *x is fast, regardless of the type of x.

Don't get carried away with this, however:

*s = 0

is slower than

s == ""

simply because the former requires two operations.

Augmented assignment: Icon provides augmented assignment operations such as

i +:= 1

as shorthand for

i := i + 1

In fact, there are augmented assignment versions of all binary operations except assignments themselves, although many are rarely used.

Augmented assignments are not just abbreviations; they are more efficient that the non-augmented forms, since the variable to which the assignment is made is only referenced once. With just a simple variable like an identifier, the difference is minor. With a computed variable, such as a table reference, the difference may be quite significant.

For example, if t is a table, it is better to write

t[x] +:= n

than to write

t[x] := t[x] + n

While the amount of time it takes to look something up in a table is hardly obvious on the face of it (but will be discussed in a subsequent Newsletter), it does not take a lot of imagination to what's in it -- and if the table is huge, the time to find something in it may be considerable.

Operations on lists: Since lists can be accessed both by position and as deques (stacks and queues in combination), it is worth thinking about alternative possibilities when performing list operations. Consider, for example, the problem of circularly rotating the elements in a list left by one. The typical code for this for a Pascal-like language, cast in Icon is:

first := a[1]
every i := 1 to *a -- 1 do
    a[i] := a[i + 1]
a[*a] := first

Looping though the list like this is certainly straightforward, if a bit tedious. If you want to get more idiomatic, it can be rewritten more concisely. But what about thinking of the list as a deque and writing

put(a,get(a))

That is, take an element off the left end of the list and put it on the right end.

Certainly the second approach is more concise, but what about efficiency? To begin with, it seems plausible that the looping approach takes an amount of time that is approximately proportional to the size of the list. (This is true if the list is constructed all at once. The situation is more complicated if the list is built by push() or put(). See the Icon implementation book.) But what about taking an element off one end of a list and putting it on the other end? Does Icon go through the looping approach internally? Or does it do something cleverer and more efficient? (This question might lead another: How would you go about implementing lists with both positional and deque access?)

The answers to some of these questions are given in some detail in the Icon implementation book. Suffice it to say here that Icon uses a fairly sophisticated method for implementing lists that balances the costs of the different kinds of access. Both put and get are fast. In fact, get(a) is slightly faster than a[i]. Furthermore, the rotation above, using put and get, is essentially independent of the size of the list.

To get an idea of the difference between the looping and deque methods, here are comparative timings for rotating a list created by list(n), where n has the values 100 and 1000:

n       100      1000

loop  35.76    356.81
deque  0.16      0.16

The moral should be clear: When you're programming in Icon, think in Icon.

Pattern Words

We recently received a list of available publications from Aegean Park Press, which publishes a wide variety of material related to cryptography. Among their current offerings are two books of "pattern words" The pattern word for a word is obtained by replacing all instances of the first letter of the word by A, all instances of the second letter by B, and so on. For example, proposals has the pattern word ABCACDEFD. (Can you find another word with the same pattern word?)

So, here's a small problem: Write an Icon procedure patword(s) that returns the pattern word corresponding to the value of s. For simplicity, you can assume that s may contain characters other than letters, that upper- and lower-case letters are distinct, and that there are (say) no more than 26 different characters in s. Think about different programming techniques, keeping in mind the moral given at the end of the preceding section. Strive for a solution that is both efficient and elegant. We'll leave the definition of elegant to you; our ideas on this will be included with solutions, which will appear in the next Newsletter.

Using Pipes -- for UNIX Users

One of the most elegant and powerful features of UNIX is the pipe, which allows the output from one program to be fed as input to another program. Some other operating systems "fake" this by writing all the output of the first program to a temporary file, and then using that temporary file as input to the second program. That's not the real thing (suppose the first program produces an enormous amount of output or even does not terminate), but this is not a forum for arguing about operating systems. If you are not familiar with pipes, however, you might want to study this feature.

UNIX implementations of Icon support the reading and writing of pipes. The real power of pipes in Icon lies in being able not only to access UNIX commands (which can be done with the system function) but also to pass data between the program and the commands. The technique is very similar to opening a file for reading or writing, except a command string instead of a file is opened. In the case of reading, output from the pipe becomes input to the Icon program, while in the case of writing, output from the Icon program becomes input to the pipe.

Here's an example:

names := open("ls", "pr")

The first argument is the command, and the p in the second argument indicates the first argument is a command, not the name of a file. The file assigned to names can then be read like any other file; each read produces a line of output from ls. For example, the following loop writes out the names of the files as produced by ls:

while write(read(names))

The loop terminates when the output from ls terminates, exactly as if a file had been read.

The command that is opened as a pipe can be any UNIX command. For example,

path := read(open("pwd","p"))

assigns to path the path to the present working directory. Notice that since opening for reading is the default, the r need not be specified in the option.

Pipes can be opened for writing also, as in

sortout := open("sort >out.sort", "pw")

which causes output written to sortout to be sorted and written to the file out.sort.

Like other files, pipes should be closed when they are no longer needed.

Here's a problem for UNIX gurus: Write an Icon procedure getenv(s) that returns the value of the environment variable s if it is set but fails if it is not. If you manage to distinguish between environment variables that are not set and those that are set to the null value, we'd like to see how you did it.

Puzzles and Such

Andrew Appel at Princeton University sent us this program and asked us if we could guess what it does:

procedure main()
   every write((i := 2 | (|i := i + 1)) &
    (not(i = (2 to i) * (2 to i))) & i)
end

Ken Walker contributes the shortest self-reproducing Icon program that we know of:

procedure main();x:="procedure main();x:= \nx[21]:=image(x);write(x);end" x[21]:=image(x);write(x);end

The program is a single line; we divided it into two lines that are short enough to space.