Programming Corner from Icon Newsletter 40
December 21, 1992; Icon Version 8
Sets, in the generality found in Icon, are rare in programming languages,
and persons who are used to programming in a language like Pascal or C tend
to overlook the usefulness of sets in Icon.
The idea of a set is very simple: A set is simply a collection of distinct
values. The word distinct is significant; a value can appear only once in
a set, as opposed to a list, which can contain many instances of the same
value. And, unlike lists, sets are unordered. Consequently, sets embody
the concept of membership and are useful for holding values that share a
common property, such as a set of all the (distinct) words in a file.
The nice thing about sets is that they are easy to use. A set starts out
empty, with no values. Values can be inserted into a set or deleted from
a set without having to worry about the space they occupy; sets grow and
shrink automatically.
To see how easy it is to use sets, suppose you have a procedure genword()
that generates successive words from the input file. If you're willing to
accept the definition of a word as a string of consecutive letters, something
as simple as the following will do:
procedure genword()
while line := read() do
line ? {
while tab(upto(&letters)) do
suspend(tab(many(&letters))) \ 1
}
end
Most text files contain multiple instances of the same words. To find all
the distinct words, it's necessary to keep track of the words that have
occurred. The distinct nature of values in sets takes care of this automatically.
The way a value x
is added to a set S
is by
insert(S, x)
If x
is not in S
, it is added to S
.
If x
already is in S
, it is not added. It's that
simple.
So to create a set of the distinct words produced by genword()
,
all that's needed is
words := set() # start empty
every insert(words, genword())
The every
control structure keeps resuming genword(),
causing it to produce all the words in the input file. Each new word is
added to words
, while words that have occurred before are ignored.
The end result is that words
contains all the distinct words
in the input file.
The resulting set could be used in many ways. For example, to write a list
of all the distinct words, all that's necessary is
every write(!words)
The element-generation operator !
generates the values of words
and every causes all of them to be produced.
That's fine if you don't care about the order in which the words are listed.
Since a set is (conceptually) unordered, the words may come out in any order.
In theory, the last word generated by genword()
might be the
first one to be listed. Of course, there's some order to the set inside
the computer; it's just that it's not one you can predict or that is useful.
It's simple enough, however, to list the words in alphabetical order. Sorting
a set produces a list with the set values in order. In the case of strings,
this is alphabetical order.
The element-generation operator works on lists as well as sets, but for
lists it starts at the beginning and progresses to the end. Consequently,
every write(!sort(words))
lists the words in alphabetical order.
We've taken a bit of a shortcut here. The function sort()
produces
a list and we've applied the element-generation operation to this list without
bothering to save the list. We could have written
wordlist := sort(words)
every write(!wordlist)
In fact, that would have been the thing to do if we needed the sorted list
after listing its contents.
There's another order for listing the distinct words that you might want:
in order of first occurrence. This is just a little more difficult.
What's necessary is to know when a new word is being added to words. The
function insert(S,
x)
doesn't indicate this, but
there's another function, member(S,
x)
, that does.
It succeeds if x
is in S
but fails if it is not.
To get the first-occurrence order of listing, this will do:
every word := genword() do {
if not member(words, word) then {
write(word)
insert(words, word)
}
}
Thus, if a word
is not in words
, it is written
and inserted into words
.
If you need to keep a list of words in order of first occurrence, rather
than just list them as they occur, you can build a list, using Icon's capability
to put values on to the end of a list:
wordlist := [] # start empty
every word := genword() do {
if not member(words, word) then {
put(wordlist, word)
insert(words, word)
}
}
When this is complete, wordlist
contains all the distinct words
in order of first occurrence. A listing then can be produced by
every write(!wordlist)
or wordlist
can be used in any other way that is needed.
There are many other things you can do with sets. Some are simple, like
forming the union and intersection of two sets. For example,
all_words := words1 ++ words2
creates a new set, all_words
, that contains all the words in
words1
and words2
. Similarly,
common_words := words1 ** words2
creates a new set, common_words
, that contains only the words
that are in both words1
and words2
.
In our examples, we've used words and hence all the values in the sets are
strings. Sets are more versatile than this. Their values can be anything
-- strings, integers, real numbers, even structures.
And a set can contain values of different types. This allows sets to be
used for purposes far afield from keeping track of distinct words. But using
them in such ways is more complicated.
Icon home page