Programming Corner from Icon Newsletter 15


June 7, 1984; Version 5

Assignment to Subscripted Strings

In the last Newsletter, the semantics of expressions such as
x[i:j] := (x := expr)
were posed, where x is string-valued when the subscripting expression is evaluated, but in which expr changes the value of x before the (left) assignment is made to replace the subscripted string. (Expressions such as x[i] and x[i+:j] are just special cases of x[i:j].)

While expressions like this are uncommon (and generally are considered to be in poor style), they are legal and therefore must be well-defined and handled properly in the implementation. (It should be no surprise that all the possibilities were not considered in the initial design and that there were several bugs related to these matters in the early versions of the implementation.)

This is a case where efficiency and implementation considerations influenced language design. The problem is that the transator cannot, in general, determine whether an expression such as x[i:j] will have a value assigned to it. Even if x[i:j] is the target of an assignment operation, the assignment never may be made because of failure in evaluation elsewhere in the expression. In the case of
return x[i:j]
the translator has even less information, since the use of the returned expression depends on the context in which the function containing this return is called. For these reasons, the translator treats all expressions usch as x[i:j] in the same way. (There is the potential here for an implementation optimization, since there are many situations in which the translator could determine that a subscripting expression is not the target of an assignment.) When an expression like x[i:j] is evaluated, if the value of x[i:j] is a string, a trapped variable is produced. A trapped variable is a special kind of variable that points to a small block of data which contains enough information to assign a new value to x if an assignment is made to x[i:j]. This information consists of the variable x and the location of the substring in x. Every string subscripting expression produces a trapped variable. Although the block of data that is created usually is used only transiently, it causes a certain amount of storage throughput.

Now consider what happens if the value of x is changed before an assignment is made to x[i:j]. Since the value of x can be changed to anything, the assignment cannot be made blindly -- the position of the replaced string might be out of range, even if the new value of x is a string. Consequently, the type of x is checked when assignment is about to be made to x[i:j]. If the value of x is a string, its length is checked to be sure the sub-string specified by i:j is still in range. If it is, the assignment is made, even if the value of x is different from what it was when x[i:j] was evaluated. Thus, in
x := "hello world"
x[3] := (x := "abc")
the value of x becomes "ababc". On the other hand, if the value of x is a string, but it is too short, run-time error 205 (value out of range) occurs, as in
x := "hello world"
x[3] := (x := "ab")
One might well argue that assignment to x[i:j] should be an error if the value of x has changed, even if the substring is still in range. After all, such a situation seems more likely to be an error than an intentional computation. Here, however, there is an efficiency consideration. In order to be able to detect that the value of x has changed, it would be necessary to save the value of x as well as the variable x in the trapped variable.

Furthermore, this would have to be done for every evaluation of a string subscripting expression. The result would be substantially higher storage throughput just to treat a pathological case more elegantly.

From a language design viewpoint, a somewhat more radical alternative would be to bind the value of x to x at the time x[i:j] is evaluated, so that
x := "hello world"
x[3] := (x := "abc")
would change the value of x to "heabclo world". This solution also would require saving the value of x in the trapped variable -- additional overhead that again does not seem justified for such a pathological situation.

Returning to the situation as it actually is handled, given that any string value for x that is long enough is acceptable, the next question is what to do if the value of x is not a string when the assignment is made to x[i:j]? In consonance with Icon's general philosophy of converting types automatically whenever possible, if the value of x can be converted to a string, it is. Thus,
x := "hello world"
x[3] := (x := 397)
changes the value of x to "39397". Weird, maybe, but consistent with the result of concatenating two integers -- which is, after all, what this expression amounts to.

If the value of x cannot be converted to a string, a run-time error (103) occurs, as in
x := "hello world"
x[3] := (x := [1,2,3,4])
Note that these problems are essentially problems of dereferencing -- when and how the value of x is determined when assignment is made to x[i:j]. There are a number of other situations in Icon in which dereferencing is a problem. One is string scanning, which will be discussed in the next Newsletter, along with more material on matching expressions.

Trivia Corner

In the last Newsletter, the following problem was posed:
What is the longest string of distinct prefix operators which, when applied to a value, might compute a meaningful result? (You may assume any value that you wish.) What if the prefix operators need not be distinct?

For distinct prefix operators, one possibility is
|+=-?*~\@^!x
It might go like this: Let x be a list of co-expressions. Generate one, refresh and activate it, being sure the result is non-null. Assuming the result is a cset, use the size of its complement to provide a range for a randomly selected integer. Negate this integer. Match the equivalent string in &subject and convert the result back to an integer. Repeat the whole process (whatever that means). Enough!

Strictly speaking, repeated alternation is a control structure, not an operator, but it is denoted with operator syntax. Note that the prefix operators . and / are not included in the expression above. They can be added, but not in a "meaningful" way.

If the prefix operators do not have to be distinct, there is no specific limit on the number that can occur. Consider, for example,
=== ... ==x
The expression =x matches x in &subject, ==x matches two consecutive occurrences of x, and so on.

What about expressions such as
??? ... ??x

Pitfalls


Steve Wampler contributes the following program, in which the procedure tally echos its argument and tallies it in the table count. In the main procedure, empty input lines are converted into the more prominent marker <empty line> . Or are they? What does this program actually do? What does it take to fix the problem?
global count
procedure main()
   count := table(0)
   while line := read() do
      tally(("" ~== line) | "<empty line>")
     . . .
end
procedure tally(s)
   count[s] +:= 1
   write(s)
end


Icon home page