codeobj.icn: Procedures to encode and decode Icon data

link codeobj
March 25, 2002; Ralph E. Griswold
Requires: co-expressions
This file is in the public domain.

   These procedures provide a way of storing Icon values as strings and
retrieving them.  The procedure encode(x) converts x to a string s that
can be converted back to x by decode(s). These procedures handle all
kinds of values, including structures of arbitrary complexity and even
loops.  For "scalar" types -- null, integer, real, cset, and string --

     decode(encode(x)) === x

   For structure types -- list, set, table, and record types --
decode(encode(x)) is, of course, not identical to x, but it has the
same "shape" and its elements bear the same relation to the original
as if they were encoded and decoded individually.
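
   As an illustration of the two cases above, here is a minimal sketch
of a round trip. It assumes only that codeobj is available for linking;
the variable names are chosen for the example.

```icon
link codeobj

procedure main()
   local x, L, M

   # Scalars: the decoded value should be identical (===) to the original.
   every x := (1 | 2.0 | &null | "\377" | '\376\377') do
      if decode(encode(x)) === x then
         write(image(x), ": identical after round trip")
      else
         write(image(x), ": not identical after round trip")

   # Structures: decode is expected to build a new list with the same
   # shape, so M is distinct from L but has equal elements.
   L := ["hi", "there"]
   M := decode(encode(L))
   if M === L then write("same list object")
   else write("distinct list object")
   write(*M, " elements: ", M[1], " ", M[2])
end
```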

   Not much can be done with files, functions and procedures, and
co-expressions except to preserve type and identification.

   The encoding of strings and csets handles all characters in a way
that makes it safe to write the encoding to a file and read it back.
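
   The following sketch shows one way to store a value in a file and
recover it. The file name codeobj.tmp is hypothetical, and the sketch
assumes the encoding fits on one line, that is, that control characters
such as newline appear only in escaped form, as the examples of string
encodings in this document suggest.

```icon
link codeobj

procedure main()
   local L, M, f, s

   L := ["line one\nline two", 3, 4.5]   # first element contains a newline

   # Write the encoding as a single line of text.
   # "codeobj.tmp" is a hypothetical file name used only for this sketch.
   f := open("codeobj.tmp", "w") | stop("cannot open codeobj.tmp for writing")
   write(f, encode(L))
   close(f)

   # Read the line back and rebuild the list from it.
   f := open("codeobj.tmp", "r") | stop("cannot open codeobj.tmp for reading")
   s := read(f) | stop("codeobj.tmp is empty")
   close(f)

   M := decode(s)
   write(image(M[1]))    # shows the recovered string in escaped form
end
```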

   No particular effort was made to use an encoding of values that
minimizes the length of the resulting string. Note, however, that
as of Version 7 of Icon, there are no limits on the length of strings
that can be written out or read in.
____________________________________________________________

   The encoding of a value consists of four parts:  a tag, a length,
a type code, and a string of the specified length that encodes the value
itself.

   The tag is omitted for scalar values that are self-defining.
For other values, the tag serves as a unique identification. If such a
value appears more than once, only its tag appears after the first encoding.
There is, therefore, a type code that distinguishes a label for a previously
encoded value from other encodings. Tags are strings of lowercase
letters. Since the tag is followed by a digit that starts the length, the
two can be distinguished.

   The length is simply the length of the encoded value that follows.

   The type codes consist of single letters taken from the first character
of the type name, with lower- and uppercase used to avoid ambiguities.

   Where a structure contains several elements, the encodings of the
elements are concatenated. Note that the form of the encoding contains
the information needed to separate consecutive elements.
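
   To make the layout concrete, here is a minimal sketch that splits a
string of concatenated encodings into its parts using string scanning.
It is not the code used by decode; the record part and the procedure
split_encoding are hypothetical names introduced for this example, and
the sketch only separates the parts without interpreting the type codes.

```icon
record part(tag, code, body)    # hypothetical; not defined by codeobj

# Return a list of part records, one per consecutive encoding in s.
# Fails if s does not follow the tag/length/type/body layout.
procedure split_encoding(s)
   local result, p, n

   result := []
   s ? {
      while not pos(0) do {
         p := part()
         p.tag := tab(many(&lcase)) | ""           # tag may be omitted
         n := integer(tab(many(&digits))) | break  # length of the body
         p.code := move(1) | break                 # one-letter type code
         p.body := move(n) | break                 # exactly n characters
         put(result, p)
         }
      pos(0) | (result := &null)    # leftover text means a malformed string
      }
   return \result
end

procedure main()
   local outer, p

   outer := split_encoding("f11L2shi5sthere")[1] | stop("malformed encoding")
   write(image(outer.tag), " ", outer.code, " ", image(outer.body))
   # writes: "f" L "2shi5sthere"

   every p := !split_encoding(outer.body) do
      write(image(p.tag), " ", p.code, " ", image(p.body))
   # writes: "" s "hi"
   #         "" s "there"
end
```

   Because each length is read before its body, the body can itself
contain digits and lowercase letters without confusing the split.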

   Here are some examples of values and their encodings:

     x                     encode(x)
-------------------------------------------------------

     1                     "1i1"
     2.0                   "3r2.0"
     &null                 "0n"
     "\377"                "4s\\377"
     '\376\377'            "8c\\376\\377"
     procedure main        "a4pmain"
     co-expression #1 (0)  "b0C"
     []                    "c0L"
     set()                 "d0S"
     table("a")            "e3T1sa"
     L1 := ["hi","there"]  "f11L2shi5sthere"

A loop is illustrated by

     L2 := []
     put(L2,L2)

for which

     x                     encode(x)
-------------------------------------------------------

     L2                    "g3L1lg"
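
   A short sketch of how such a loop might be checked after decoding
follows. It assumes, as stated earlier in this document, that decode
rebuilds the loop; the name L3 is chosen for the example.

```icon
link codeobj

procedure main()
   local L2, L3

   L2 := []
   put(L2, L2)                 # L2[1] is L2 itself

   L3 := decode(encode(L2))    # a new list, distinct from L2

   # If the loop survived, the first element of L3 is L3 itself.
   if L3[1] === L3 then write("loop preserved")
   else write("loop not preserved")
end
```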

   Of course, you don't have to know all this to use encode and decode.
