
Programming Corner from Newsletter 24
June 13, 1987; Icon Version 6
Processing Command-Line Options
In the last Newsletter, there was a discussion of using command-line arguments
to Icon programs to provide options that could be used to control program
execution. For example, the program for dealing bridge hands might have
options for specifying the number of hands to deal as well as the seed for
random-number generation. Using the UNIX style for specifying options, the
program might be called as follows to produce 10 hands with an initial seed
of 17:
deal -h 10 -s 17
Of course, there is nothing sacred about the UNIX style for communicating
options, although we've found it handy in the Icon program library to have
a relatively uniform method.
In the present Icon program library, every program that has command-line
options handles them in its own way. Bob Alexander has provided a procedure
that avoids this duplicate code and provides more uniformity to the handling
of options. It is patterned after the UNIX getopt facility and is
called as getopt(arg,optstr)
, where arg
is the
argument list as passed to the main procedure and optstr
is
a string of allowable option letters.
If a letter is followed by ":"
, the corresponding
option is assumed to be followed by a string of data, optionally separated
from the letter by space. If instead of ":" the letter is followed
by a "+"
, the parameter is converted to an integer
or if a "."
, to a real. If optstr
is
omitted, any letter is assumed to be valid and to require no data.
The procedure getopt
returns a list consisting of two items.
The first is a table of the options specified, where the entry values are
the specified option letters. The assigned values are the data following
the options, if any, or 1 if the option has no data. The table's default
value is null. The second item in the returned list is a list of remaining
parameters on the command line (usually file names). A "-"
that is not followed by a letter is taken as a file name rather than an
option.
If an error is detected, stop()
is called with an appropriate
error message. After calling getopt()
the original argument
list, arg
, is empty.
Not every program that has command-line options needs all the generality
that this procedure provides. For example, the program for dealing bridge
hands in the last Newsletter needs only the table in the first element in
the list returned by getopt:
link getopt
. . .
procedure main(args)
local s, hands, opts
. . .
hands := 5
opts := getopt(args,"h+s+")[1]
hands := \opts["h"]
&random := \opts["s"]
. . .
The option string indicates that the allowable option names are h
and s
and that both require integer arguments. The default
values of hands and &random
are changed only if the corresponding
values are nonnull (that is, they are specified on the command line).
Here's the code for the procedure itself:
procedure getopt(arg,optstr)
local x, i, c, otab, flist, o, p
/optstr := string(&lcase ++ &ucase)
otab := table()
flist := []
while x := get(arg) do
x ? if ="-" & not pos(0) then
while c := move(1) do
if i := find(c,optstr) + 1 then otab[c] :=
if any(':+.',o := optstr[i]) then {
p := "" ~== tab(0) | get(arg) |
stop(...)
case o of {
":":
"+": integer(p) | stop(...)
".": real(p) | stop(...)
}
}
else 1
else stop(...) else put(flist,x)
return [otab,flist]
end.
In the actual procedure, the calls to stop contain appropriate error messages.
The text was elided here to make the program fit into the space available.
With apologies to Bob, we also shortened an identifier here and there and
rearranged things somewhat toward the same cause.
Efficient Programming in Icon
Since many of Icon's features have no direct correspondence in the underlying
architecture of the computers on which Icon is implemented, there often
is no way of knowing, from the language itself, whether a particular operation
is efficient or inefficient. For example, we have been asked several times
whether computing the size of a string (*s
) is fast or slow.
The ultimate source of such information is in the implementation itself.
Persons who are interested in implementation techniques may want to study
the implementation book or even the source code. Of course, an Icon programmer
should not have to be an expert in the implementation of the language in
order to know how to program efficiently. While a general treatment of this
subject is complicated and extensive, a few specific suggestions go a long
way toward providing the information needed most frequently.
This section of the Programming Corner is devoted to such issues.
More material on efficient coding techniques will appear on a more or less
regular basis in future Newsletters.
The size of an object: Let's start with the answer to the question
above: "Is determining the size of a string fast or slow?". The
answer to this one is simple: It is fast. The reason is that the size of
a string is stored as part of the string value and is immediately accessible.
(This is in contrast to C, where it is necessary to count the characters
every time the size of a string is needed.)
In fact, the sizes of all objects are stored with them and are quickly available.
For objects that may change in size, such as lists, sets, and tables, the
size is updated whenever it is changed. So *x
is fast, regardless
of the type of x
.
Don't get carried away with this, however:
*s = 0
is slower than
s == ""
simply because the former requires two operations.
Augmented assignment: Icon provides augmented assignment operations
such as
i +:= 1
as shorthand for
i := i + 1
In fact, there are augmented assignment versions of all binary operations
except assignments themselves, although many are rarely used.
Augmented assignments are not just abbreviations; they are more efficient
that the non-augmented forms, since the variable to which the assignment
is made is only referenced once. With just a simple variable like an identifier,
the difference is minor. With a computed variable, such as a table reference,
the difference may be quite significant.
For example, if t
is a table, it is better to write
t[x] +:= n
than to write
t[x] := t[x] + n
While the amount of time it takes to look something up in a table is hardly
obvious on the face of it (but will be discussed in a subsequent Newsletter),
it does not take a lot of imagination to what's in it -- and if the table
is huge, the time to find something in it may be considerable.
Operations on lists: Since lists can be accessed both by position
and as deques (stacks and queues in combination), it is worth thinking about
alternative possibilities when performing list operations. Consider, for
example, the problem of circularly rotating the elements in a list left
by one. The typical code for this for a Pascal-like language, cast in Icon
is:
first := a[1]
every i := 1 to *a -- 1 do
a[i] := a[i + 1]
a[*a] := first
Looping though the list like this is certainly straightforward, if a bit
tedious. If you want to get more idiomatic, it can be rewritten more concisely.
But what about thinking of the list as a deque and writing
put(a,get(a))
That is, take an element off the left end of the list and put it on the
right end.
Certainly the second approach is more concise, but what about efficiency?
To begin with, it seems plausible that the looping approach takes an amount
of time that is approximately proportional to the size of the list. (This
is true if the list is constructed all at once. The situation is more complicated
if the list is built by push()
or put()
. See the
Icon implementation book.) But what about taking an element off one end
of a list and putting it on the other end? Does Icon go through the looping
approach internally? Or does it do something cleverer and more efficient?
(This question might lead another: How would you go about implementing lists
with both positional and deque access?)
The answers to some of these questions are given in some detail in the Icon
implementation book. Suffice it to say here that Icon uses a fairly sophisticated
method for implementing lists that balances the costs of the different kinds
of access. Both put
and get
are fast. In fact,
get(a)
is slightly faster than a[i]
. Furthermore,
the rotation above, using put
and get
, is essentially
independent of the size of the list.
To get an idea of the difference between the looping and deque methods,
here are comparative timings for rotating a list created by list(n)
,
where n
has the values 100 and 1000:
n 100 1000
loop 35.76 356.81
deque 0.16 0.16
The moral should be clear: When you're programming in Icon, think in Icon.
Pattern Words
We recently received a list of available publications from Aegean Park Press,
which publishes a wide variety of material related to cryptography. Among
their current offerings are two books of "pattern words" The pattern
word for a word is obtained by replacing all instances of the first letter
of the word by A
, all instances of the second letter by B
,
and so on. For example, proposals
has the pattern word ABCACDEFD
.
(Can you find another word with the same pattern word?)
So, here's a small problem: Write an Icon procedure patword(s)
that returns the pattern word corresponding to the value of s
.
For simplicity, you can assume that s
may contain characters
other than letters, that upper- and lower-case letters are distinct, and
that there are (say) no more than 26 different characters in s
.
Think about different programming techniques, keeping in mind the moral
given at the end of the preceding section. Strive for a solution that is
both efficient and elegant. We'll leave the definition of elegant to you;
our ideas on this will be included with solutions, which will appear in
the next Newsletter.
Using Pipes -- for UNIX Users
One of the most elegant and powerful features of UNIX is the pipe, which
allows the output from one program to be fed as input to another program.
Some other operating systems "fake" this by writing all the output
of the first program to a temporary file, and then using that temporary
file as input to the second program. That's not the real thing (suppose
the first program produces an enormous amount of output or even does not
terminate), but this is not a forum for arguing about operating systems.
If you are not familiar with pipes, however, you might want to study this
feature.
UNIX implementations of Icon support the reading and writing of pipes. The
real power of pipes in Icon lies in being able not only to access UNIX commands
(which can be done with the system function) but also to pass data between
the program and the commands. The technique is very similar to opening a
file for reading or writing, except a command string instead of a file is
opened. In the case of reading, output from the pipe becomes input to the
Icon program, while in the case of writing, output from the Icon program
becomes input to the pipe.
Here's an example:
names := open("ls", "pr")
The first argument is the command, and the p
in the second
argument indicates the first argument is a command, not the name of a file.
The file assigned to names can then be read like any other file; each read
produces a line of output from ls
. For example, the following
loop writes out the names of the files as produced by ls
:
while write(read(names))
The loop terminates when the output from ls
terminates, exactly
as if a file had been read.
The command that is opened as a pipe can be any UNIX command. For example,
path := read(open("pwd","p"))
assigns to path
the path to the present working directory.
Notice that since opening for reading is the default, the r
need not be specified in the option.
Pipes can be opened for writing also, as in
sortout := open("sort >out.sort", "pw")
which causes output written to sortout to be sorted and written to the file
out.sort
.
Like other files, pipes should be closed when they are no longer needed.
Here's a problem for UNIX gurus: Write an Icon procedure getenv(s)
that returns the value of the environment variable s
if it
is set but fails if it is not. If you manage to distinguish between environment
variables that are not set and those that are set to the null value, we'd
like to see how you did it.
Puzzles and Such
Andrew Appel at Princeton University sent us this program and asked us if
we could guess what it does:
procedure main()
every write((i := 2 | (|i := i + 1)) &
(not(i = (2 to i) * (2 to i))) & i)
end
Ken Walker contributes the shortest self-reproducing Icon program that we
know of:
procedure main();x:="procedure main();x:= \nx[21]:=image(x);write(x);end" x[21]:=image(x);write(x);end
The program is a single line; we divided it into two lines that are short
enough to space.
Icon home page