link codeobj
March 25, 2002; Ralph E. Griswold
Requires: co-expressions
This file is in the public domain.
These procedures provide a way of storing Icon values as strings and
retrieving them.  The procedure encode(x) converts x to a string s that
can be converted back to x by decode(s).  These procedures handle all
kinds of values, including structures of arbitrary complexity and even
loops.  For "scalar" types -- null, integer, real, cset, and string --

        decode(encode(x)) === x

For structure types -- list, set, table, and record types --
decode(encode(x)) is, of course, not identical to x, but it has the same
"shape" and its elements bear the same relation to the original as if
they were encoded and decoded individually.

Not much can be done with files, functions and procedures, and
co-expressions except to preserve type and identification.

The encoding of strings and csets handles all characters in a way that
makes it safe to write the encoding to a file and read it back.

No particular effort was made to use an encoding of a value that
minimizes the length of the resulting string.  Note, however, that as of
Version 7 of Icon, there are no limits on the length of strings that can
be written out or read in.
____________________________________________________________

The encoding of a value consists of four parts:  a tag, a length, a type
code, and a string of the specified length that encodes the value itself.

The tag is omitted for scalar values that are self-defining.  For other
values, the tag serves as a unique identification.  If such a value
appears more than once, only its tag appears after the first encoding.
There is, therefore, a type code that distinguishes a label for a
previously encoded value from other encodings.  Tags are strings of
lowercase letters.  Since the tag is followed by a digit that starts the
length, the two can be distinguished.

The length is simply the length of the encoded value that follows.

The type codes consist of single letters taken from the first character
of the type name, with lower- and uppercase used to avoid ambiguities.
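
The four-part layout described above can be illustrated by splitting an encoding back into its parts. The following is only a sketch in Python, not the Icon code in this file; the name split_encoding and the regular expression are assumptions made for the illustration, and the sample strings are taken from the examples given in this file.

```python
import re

# Assumed header shape: optional lowercase tag, decimal length, one-letter type code.
HEADER = re.compile(r"([a-z]*)(\d+)([A-Za-z])")

def split_encoding(s, pos=0):
    """Split the encoding that starts at pos into (tag, type_code, payload, next_pos)."""
    m = HEADER.match(s, pos)
    if m is None:
        raise ValueError(f"no encoding header at offset {pos}")
    tag, length, code = m.group(1), int(m.group(2)), m.group(3)
    end = m.end() + length
    if end > len(s):
        raise ValueError("payload is shorter than its declared length")
    return tag, code, s[m.end():end], end

print(split_encoding("1i1"))      # ('', 'i', '1', 3)
print(split_encoding("a4pmain"))  # ('a', 'p', 'main', 7)
```

The tag and the length never run together because the tag contains only letters and the length starts with a digit, so the sketch can read the header with a single pattern.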

Where a structure contains several elements, the encodings of the
elements are concatenated.  Note that the form of the encoding contains
the information needed to separate consecutive elements.

Here are some examples of values and their encodings:

        x                     encode(x)
-------------------------------------------------------

        1                     "1i1"
        2.0                   "3r2.0"
        &null                 "0n"
        "\377"                "4s\\377"
        '\376\377'            "8c\\376\\377"
        procedure main        "a4pmain"
        co-expression #1 (0)  "b0C"
        []                    "c0L"
        set()                 "d0S"
        table("a")            "e3T1sa"
        L1 := ["hi","there"]  "f11L2shi5sthere"

A loop is illustrated by

        L2 := []
        put(L2,L2)

for which

        x                     encode(x)
-------------------------------------------------------

        L2                    "g3L1lg"

Of course, you don't have to know all this to use encode and decode.
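
For readers who do want to see how the pieces fit together, the following Python sketch walks the concatenated element encodings of a list and resolves a label back to the structure it names. It is an illustration only, not the decode procedure in this file. It handles only the type codes i, s, L, and l; it leaves escape sequences in strings undecoded; and it assumes, from the L2 example alone, that the l code carries the tag of a previously encoded value. The name decode_subset is hypothetical.

```python
import re

# Assumed header shape: optional lowercase tag, decimal length, one-letter type code.
HEADER = re.compile(r"([a-z]*)(\d+)([A-Za-z])")

def decode_subset(s, seen=None):
    """Decode the concatenated encodings in s into a Python list of values."""
    seen = {} if seen is None else seen   # tag -> structure already started
    values, pos = [], 0
    while pos < len(s):
        m = HEADER.match(s, pos)
        if m is None:
            raise ValueError(f"no encoding header at offset {pos}")
        tag, code = m.group(1), m.group(3)
        start = m.end()
        pos = start + int(m.group(2))     # the length locates the next element
        if pos > len(s):
            raise ValueError("payload is shorter than its declared length")
        body = s[start:pos]
        if code == "i":
            values.append(int(body))
        elif code == "s":
            values.append(body)           # escapes such as \377 are not expanded
        elif code == "L":
            lst = []
            seen[tag] = lst               # register first so a loop can refer back
            lst.extend(decode_subset(body, seen))
            values.append(lst)
        elif code == "l":
            values.append(seen[body])     # label: reuse the tagged structure
        else:
            raise ValueError(f"type code {code!r} is outside this sketch")
    return values

print(decode_subset("f11L2shi5sthere"))   # [['hi', 'there']]
loop = decode_subset("g3L1lg")[0]
print(loop[0] is loop)                    # True
```

Registering the list under its tag before decoding its elements is what allows the loop to close: when the label is reached, the list it names already exists, although it is still being filled.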