Programming Corner from Icon Newsletter 29


February 14, 1989; Icon Version 7

Table Keys

If T is a table, !T generates the values in the elements of T, not the keys (entry values). This probably was a design mistake, since it's possible to get the values from the keys, but not the other way around. So ...

Problem: Write a procedure key(T) that generates the keys from table T.

Comment: The only way to do this is to convert T to a list.

Approach 1: Create a list of key/value lists:
procedure key(T)
   L := sort(T)
   every pair := !L do
      suspend pair[1]
end
Whenever a suspend appears in the do clause of an every expression, look to see if both are needed, since suspend also forces its argument to generate all its results. Thus,
procedure key(T)
   L := sort(T)
   suspend (!L)[1]
end
Note the parentheses: !L[1] groups as !(L[1]), not what you want.

One more refinement: get rid of the identifier L and just put the call of sort in the suspend expression:
procedure key(T)
   suspend (!sort(T))[1]
end
Approach 2: Create a list of alternate key/value pairs:
procedure key(T)
   L := sort(T,3)
   while x := get(L) do {      # get key
      suspend x
      get(L)                   # discard value
      }
end
Observation: The while/suspend construction suggests a more compact technique as in every/suspend. Use the "make a generator from a non-generator" paradigm:
procedure key(T)
   L := sort(T,3)
   every x := |get(L) do {	# get key
      suspend x
      get(L)		# discard value
      }
end
Now the every/suspend can be collapsed as usual:
procedure key(T)
   L := sort(T,3)
   suspend |get(L) do
      get(L)
end
Note the use of the optional do clause for suspend.

This could also be written more compactly (but opaquely) as
procedure key(T)
   L := sort(T,3)
   suspend |1(get(L),get(L))
end
However, it's not possible to get rid of the identifier L, since the list must be an argument of two separate function calls.

Which approach is best? At the bottom line, both are obscure and make hard reading. The first approach produces the most compact code and with no auxillary identifier. However, the list of lists produced by the default option for sorting takes up a lot of room, which may be a practical consideration.

Note also that both of these methods do something more than generate the key -- they generate them in sorted order, which is different from !T.

Unique Values in a List

Here's another problem: Write a procedure uniquelem(L) that returns a list containing only the distinct values of L -- that is, filtering out duplicates.

Approach 1: If order doesn't matter, there's a very simple way:
procedure uniqelem(L)
   return sort(set(L))
end
This takes advantage of the fact that distinct values can be members of a set only once.

This procedure, in fact, produces the elements in sorted order (since the only way to convert a set to a list is to sort it).

Although simple, this method requires both constructing a set that is discarded and also sorting it (perhaps unnecessarily).

Approach 2: If you want to preserve the order of the elements of the list, you have to do something like this:
procedure uniqelem(L)
   local L1, x
   L1 := [ ]
   every x := !L do
      if x === !L1 then next else put(L1,x)
   return L1
end
Of course, this can be very slow if L is large and has mostly distinct elements.

Note that it won't do to rephrase the test and addition of an element as
if x ~=== !L1 then put(L1,x)
since the test succeeds if x is not the same as (say) the first element of L1 but is the same as (say) the second. Be careful of inverting logic in expressions that contain generators.

Also, never do something like
while x := get(L) do ...
since this destroys the list L, a pointer to which is passed in as an argument and "belongs" to someone else, who may not appreciate having the list trashed.

Can you think of a better method?

Data Backtracking

We were recently asked the following question: Why are
while move(1) do { ... }
and
every |move(1) { ... }
different?

The difference has to do with data backtracking. move(1) increments &pos, but it suspends (like find(s)). It does this not so that it can produce another result if it's resumed (there is only one way to increment &pos by 1), but instead to restore &pos to its previous position (data backtracking).

On other hand, the control clause of while-do is bounded. In a bounded expression, suspended generators are discarded, once a result is produced. Thus, in
while move(1) do { ... }
the change to &pos by move(1) is irreversible, since move(1) won't be resumed.

In every-do, on the other hand, the control clause is not bounded -- in fact, the control clause must be able to generate a sequence of results for every-do to have the effect it does. Thus, in
every |move(1) do { ... }
move(1) is resumed after the do clause is evaluated. It restores &pos, and then repeated alternation causes move(1) to be evaluated again, with the same value of &pos as before -- an endless loop.

The problem in understanding this probably is not in what while-do does -- most folks find that to be what they expect. It's in every-do that the problem with understanding arises. A simple prescription that will do for most cases is "don't use every-do in string scanning". That may keep you out of trouble, but a deeper understanding of expression evaluation in Icon will show you both why you usually don't want to use every-do in string scanning and also why you sometimes might want to use it.

Quiz: Assuming s := "Hello world!", what does the following expression write?
every write(s ? move(1 to 10))

Icon home page