From mmdf Sun May 15 13:57:48 1983 From: whm at arizona Date-Sent: 15 May 83 13:55:59 MST (Sun) Subject: Welcome to Icon-Group To: icon-group@arizona Status: RO The Icon-Group mailing list is now open for business. This list is meant to be a common meeting area for persons interested in the Icon programming language. Topics for discussion include (but are not limited to): Programming Techniques - Style - Efficiency - Idioms - Pros and Cons of certain constructs - Clever ways to accomplish a certain task Theoretical Aspects Icon in relation to other languages Applications of Icon Implementation Issues Porting Icon to other machines History/Origins Idea/Wish Lists Announcement/Exchange of programs and procedures Bugs While it is hoped that this list doesn't become devoted to bug-related items, we will use the list for bug reports and fixes. We will do our best to fix any bugs that are reported. Bug fixes that we announce will be incorporated in the next system release. Items of interest should be mailed to icon-group.arizona@rand-relay. Mail will be immediately redistributed to all persons on the list. Requests for addition/removal from the list should be sent to icon-group-request.arizona@rand-relay. Archives will be maintained locally. This mailing list does not subsume the function of the icon-project mailbox that we currently maintain. Rather, the mailing list supplements it. Icon questions that may not be appropriate for icon-group can be sent to icon-project. If you are in doubt, send to icon-project, note your uncertainty, and we'll see it ends up in the correct place. As mentioned, the list is "on", if you have anything to say, speak up. Bill Mitchell From mmdf Sun May 15 18:46:03 1983 From: whm at arizona Date-Sent: 15 May 83 18:44:49 MST (Sun) Subject: Icon documentation To: icon-group@arizona Status: RO We frequently receive requests concerning the availability of Icon documentation. Here is the current situation. The only up-to-date and complete documentation of Version 5 of Icon is contained in the following book (generally referred to as "the Icon book"): The Icon Programming Language, Ralph E. Griswold and Madge T. Griswold. Prentice-Hall, Inc., Englewood Cliffs, NJ. 1983. 313 pages, paperback. $18.95. This book should be available through any seller of new books. It is not available through the University of Arizona. The Icon reference manual (TR 81-4a) which used to be distributed with the system is both out of date and out of print and is no longer available. There is a brief (9-page) overview of Icon available as University of Arizona technical report TR 83-3a. This report will be sent via the Postal Service, free of charge, to anyone who requests it. Nroff source for this overview (~23K bytes) will be sent via electronic mail on individual request, but it will not be broadcast to icon-group because of its size. There are quite a few other technical reports and journal articles related to Icon. These are summarized in a bibliography of Icon documents, which, like the overview, will be sent free of charge, via the Postal Service. Ask for the "Icon Bibliography". All Icon-related technical reports and journal reprints authored at the University of Arizona are available free of charge, while supplies last. The bibliography described above indicates what is available. Requests for documents should be addressed to icon-project.arizona@rand-relay Be sure to include a postal address for hardcopy requests. From mmdf Fri May 20 07:41:34 1983 From: ralph at arizona To: icon-group@arizona Subject: Survey Status: RO In order to get an idea what directions this group might take, it would help to know how familiar its members are with Icon and where their interests lie. Please, therefore, answer the following questions: A. What is your familiarity with Icon? -- on a scale of 1 to 10, graded roughly as follows: 1 -- have heard of Icon, but that's about all . . . 5 -- generally knowledgeable about Icon and competent to formulate simple programs . . . 10 -- Icon wizard B. What discussion topics most interest you (see the mail that announced the establishment of icon-group or make up your own categories). Mail to icon-project.arizona@rand-relay Replies will be summarized and mailed to icon-group. - Ralph Griswold From mmdf Mon May 23 16:21:54 1983 From: Robert E. Wells Subject: Implementation of Icon tables To: icon-group.arizona@rand-relay Cc: rwells@bbn-unix Status: RO I would like to know more about the implementation of Icon tables. Here are a few questions... * In SNOBOL4, the table create function has an optional second argument that specifies the reallocation increment for the table. Icon doesn't seem to have this level of control. Were there reasons for and against having this control in Icon? How does Icon pick the reallocation increment? In SNOBOL4, the table size returned to the user seems to be the amount allocated, rather than the amount used. Icon seems to return the amount used as the size; does it have a hidden allocation size as well? * I assume tables use hashed searching. Is there a particular hashing algorithm that makes it easier to dynamically increase the size of the table? * A substring operation like x[1:4] fails if the given substring doesn't exist, yet words["way"] succeeds (returning the default table value) even if "way" isn't in the table words. This seems inconsistent to me; is it a good inconsistency? The FASBOL dialect of SNOBOL4 had a MEMBER function that acted just like normal table references, but failed if the given index was not in the table; this was particularly useful if real table elements could have the default value, and one needed to quickly determine table membership independent of value. -Thanks, Robert From mmdf Mon May 23 18:17:24 1983 From: whm at arizona Date-Sent: 23 May 83 18:15:56 MST (Mon) Subject: Icon on a 68000 To: icon-group@arizona Status: RO We are seriously considering porting Icon to a 68000 based UNIX- like system. Before we dive in on this, we'd like to know if anyone has knowledge of someone who is already working on a 68000 port. If you have even so much as a strong rumor, let us know. Reply to icon-project if you would, or if the information might be of general interest, send it to icon-group. Bill Mitchell From mmdf Tue May 24 07:39:07 1983 From: ralph at arizona To: icon-group.arizona@rand-relay Subject: Tables Status: RO Robert E. Wells has raised some interesting questions about tables in SNOBOL4 and Icon. I can respond to some of these ques- tions from my own experience and speculate about others. Tables were added to SNOBOL4 very late in its design and imple- mentation. In fact, the implementation was pasted onto the sys- tem as it was about to go out the door. In this first (SIL) implementation of SNOBOL4, values in tables were not hashed, but a linear lookup was used instead. The two arguments in TABLE(N,M) specified sizes for the initial and subsequent chained blocks, respectively (much like primary and secondary space allo- cations for OS/360 data sets). As far as I know, most subsequent implementations of SNOBOL4 used hashing techniques for tables. E.g., in both SITBOL and MACRO SPITBOL, the form of the function is TABLE(N), where N specifies the number of hash bins. Hashing is used to select a bin and col- lisions are handled by chaining within a bin. In SITBOL, the default value of N is 37. Icon uses a technique similar to SITBOL and MACRO SPITBOL, except that the number of bins is fixed (14). There obviously is a tradeoff between the number of bins (and hence the minimum space used for each table) and efficiency of lookup, depending a lot on how many values there are in the table. Hashing seems to me like the reasonable way to implement tables. Note that there is a problem, since a table can be subscripted with any kind of value, not just a string. Subscripting with structures (like tables (!) and records) raises the issue of how to compute a hash number. In fact, Icon had a bug in this area until just recently. Algorithms for hashing values of arbitrary types might be a topic for more discussion. I personally think that user specification of the block size or the number of hash bins is contrary to the philosophy of languages like SNOBOL4 and Icon. Anyway, how many users know how to pick the parameters intelligently? Most users either let the parame- ters default or fiddle around and see what seems to work best. My view is that the implementation should be clever enough to adapt the structure used for the table as necessary to maintain efficient lookup. Needless to say, this is not easy. The issue of s[i] failing if i is out of range of the string s, while t[x] always succeeding if t is a table, is a conceptual issue. Part of the apparent discrepancy comes from the fact that the operation x[y] is polymorphous, using the same syntax for quite different operations. The identical syntax suggests that the semantics of the operations should be more similar than they are (perhaps). My view on this is that s[i], where s is a string, is an operation on a fixed object and it is convenient for it to fail if i is out of the range of s. Almost the identical thing happens for a[i], where a is a list. In t[x], where t is a table, the underlying concept is that the domain of tables poten- tially includes every possible subscript -- or even that a table (conceptually) contains every subscript in the universe of sub- scripts, but that values have not been associated with all sub- scripts. This leads to the idea of a default value associated with every subscript, and hence to the way tables work in Icon. In practice, the Icon programmer picks a default value that is easy to test and that is not assigned as a value. The null value is typical. This splits the values associated with all possible subscripts into two parts -- those put into the table by the user and the rest. I agree that a test for membership would be more natural in many cases. But then I want sets, too. - Ralph Griswold From mmdf Tue May 24 21:54:43 1983 From: whm at arizona Date-Sent: 24 May 83 21:52:32 MST (Tue) Subject: Re: Tables To: icon-group@arizona Status: RO I believe that Spitbol 360 uses the table(m,n) form, does anyone know what the system actually does with the expansion argument? Concerning specification of table sizes, it's nice to be able to free the user from having to worry about that, but the fact is that is that it's hard to imagine an efficient table implementation that can adapt itself to widely varying table sizes. In Icon, if you have a table with 10,000 elements, the worst case scattering of hash values will necessitate 10,000 element accesses to add an element. In the best case, you get about 70 accesses to add an element. Doesn't seem too good to me. If the same table framework is maintained, adding a size argument to the table function doesn't seem that atrocious. The size argument would specify the expected number of elements in the table. I suppose the real issue is that of practicality. It might be nice to cook up some hot data structure that can easily manage tables of widely varying sizes, but if I was faced with an urgent need to have Icon handle very large tables, I'd add the size argument and let the user take some of the responsibility. On the other hand, it does seem that the internal representation of tables is an area that can stand some work. My first inclination would be to look at the various tree structures used by database systems to see if one of them could be used. Of course, the issue can be further complicated by considering what operations are most prevalent and trying to optimize things on that basis. What other languages (besides Icon and the SNOBOL4 family) have have table-like (i.e. associative) data structures? I'd imagine that SL5 has tables. I know that awk has tables, and I believe that bs (USG UNIX) has tables. Is anyone familiar with the implementation of tables in these languages? Bill Mitchell From mmdf Tue May 24 22:07:37 1983 From: whm at arizona Date-Sent: 24 May 83 22:06:06 MST (Tue) Subject: subscripts To: icon-group@arizona Status: RO An interesting thing to note is that if t[x] fails when x is not in the table t, it creates a bit of an inconsistency: If you say s := "hello" s[10] := "x" the reference to s[10] fails and the assignment is not performed. If you have an empty table t, and you say t[1] := "x" should t[1] fail? If it does, tables aren't going to be a very useful language feature. What's the alternative? Do some special casing of some type? No good, I think you'd just end up in a terrible mess. Bill Mitchell From mmdf Wed May 25 15:48:21 1983 From: Jerry Leichter Subject: Implementing tables efficiently To: icon-group.arizona@RAND-RELAY.ARPA Status: RO There are two general approaches to maintaining table-like structures of widely varying sizes efficiently, both of which have been in wide use in data base applications. The first, and usually easier approach is to update tables on a periodic basis. The ISAM file structures (to use the IBM terminology) do this: As items are added to and deleted from the file, access gets less and less efficient and space gets wasted. Every once in a while (week? day? - it depends on the application and can be quite hard to decide on) a utility program gets run which re-creates the file from scratch in some efficient form. The newly-created file has no gaps left from deleted items and has an index ("hash table") size computed to give good access. Usually, space is left for additional records, though this is an obvious place to provide controls - if I know the file will not grow by much, there is no reason to leave much space. The analogue for Icon (or other languages) is obvious: Either provide an explicitly-callable "cleanup table" function, or, much more in keeping with the philosophy of the language, do this as part of garbage collection. The goal of "cleaning up" a table is to (1) eliminate all unused space (deleted elements); (2) to re-hash, if necessary, into a larger or smaller table that optimizes (some measure of) the time/space tradeoff. Again, one can either have the user provide information like "this table will never grow again", or do something like keeping track of the changes since the last time the table was restructured and use that as a prediction. Periodic updating, exactly like periodic garbage collection, has a problem: Every once in a while, the whole system stops for a while to do it. For a language like Icon, this might be acceptable. For file systems, it has turned out not to be; hence, the evolution of a second alternative: Incremental up- dating. (IBM VSAM files) In incremental updating, the time/space tradeoff is evaluated on each reference to the file, and "small" restructurings are made as necessary. B-trees are a standard technique used here. Up until a few years ago, no incremental hashing techniques were known; then a whole slew of them were discovered in a very short time. (I can provide references if people are interested.) These could be adapted to tables, if desired. However, the incremental techniques, like incremental garbage collection techniques, tend to be somewhat less efficient, averaged over long periods of time, than the periodic algorithms - they trade predictability for efficiency. The original SIL SNOBOL implementation is a very simple incremental technique. A similar, but improved, technique could easily be used. Suppose we use a linked-bucket hash table. In the header of the table, maintain the lengths of each of the chains. (This can be done very cheaply by the insertion and dele- tion code.) Also maintain a pair of goals, LO and HI. When the average chain length - or the maximum, or some other measure - gets outside of the goals, re-hash everything into a new table with more (or fewer) buckets. In choosing the new number of buckets, a multiplicative factor should probably be used so that large tables, as they grow, don't get re-hashed to often. This is a very primitive approach, but will probably work quite well; all sorts of sophistica- tion can be added. As a point of interest, after using tables in SNOBOL for years, I've gotten to depend on them for various things to such an extent that I've implemented ana- logous structures in other languages I've used, C in particular. (The imple- mentation is typically a package of subroutines. I've ended up with a standard package that works out well: add new element, delete old element, replace old element (returning old value), and iterate through table, done as i = (say) -1; while((i = next(table,i)) != -1) { work with element pointed to in some known way}.) VERY handy! -- Jerry decvax!yale-comix!leichter leichter @ yale ------- From mmdf Wed May 25 15:54:52 1983 From: PAIGE@RUTGERS Subject: Re: Tables To: whm.arizona@RAND-RELAY, icon-group.arizona@RAND-RELAY In-Reply-To: Your message of 25 May 83 01:52:34 EDT Status: RO SETL implements dynamic heterogeneous sets and mappings. I believe, but not completely sure anymore, that each data type has its own special formula for computing its hash code. The formula for elementary data items is straightforward; for tuples (dynamic heterogenous vectors) the hash code is just the code for the first component; for sets the code is the exclusive or of the codes of all of the elements in the set. Thus, the hash code for a set can be recomputed cheaply when it is augmented or diminished -- in each case the new hash code is just the old code exor'ed with the element being added or deleted. SETL mappings are similar to tables. Mappings are access paths from a domain set to a range set, and are represented as sets of 2-tuples. However, SETL distinguishes between function and multivalued mapping operations. The notation f(a) indicates function retrieval and is undefined if f is not a set, if f is a set but contains no pair whose first component has the value a (i.e., a is not in the domain of f), or if f is not single valued at a. In all of those cases the value of the function retrieval is omega, the unique undefined atom. In order for the user to define a default value in lieu of omega for f, a macro or a procedure named 'f' would have to be used. The notation f{a} indicates multivalued map retrieval and is undefined only when f is not a set. If a is not in the domain of f, the value is the empty set, {}. I like the idea of having user defined dafault values for tables, but wonder why tables are not called mappings or functions. Also, the notation f[s] is used commonly in mathematics to denote the image of the set s under the mapping f, so I would prefer a different notation for simple table retrieval. Image set is not available in SETL, but it is very useful in specifying algorithms clearly and concisely. I agree that picking block size or number of hash bins runs contrary to the philosophy of high level languages. Has anyone considered an extensible hashing technique (see Fagin, Pippenger, Strong, Nievergelt in TODS # 3, 1979) or Universal Hashing due to Carter and Wegman? Bob Paige ------- From mmdf Wed May 25 15:58:00 1983 From: PAIGE@RUTGERS Subject: Re: subscripts To: whm.arizona@RAND-RELAY, icon-group.arizona@RAND-RELAY In-Reply-To: Your message of 25 May 83 02:06:08 EDT Status: RO I'm not convinced that your example exhibits an inconsistency. The assignment f[x] := y means something different depending on the type for f. When f is a table, one could argue that f is a mapping, and therefore, a set. The indexed assignment could indicate that all pairs whose first component has the value x are removed from f after which the pair [x,y] is added. This semantics is followed by SETL and is consistent with elementary set theory. So in this case, I don't think there should be failure. Whether or not failure occurs when f is a string and x is 10 could certainly depend on the semantics of strings. One could argue that strings are finite tuples with character components, so that when f is of length 5 one cannot extend it by placing a character in the 10th position. So a failure in this case is natural. Bob Paige ------- From mmdf Thu May 26 22:15:56 1983 From: whm at arizona Date-Sent: 26 May 83 22:14:20 MST (Thu) Subject: What's in a name To: icon-group@arizona Status: RO Every now and then I run into somebody that sees the name "Icon" and thinks that it has something to do with Xerox, Smalltalk, etc. This might be better aimed at the Usenet st80 group, but I was wondering if anybody might know when the people at Xerox first started using the term "icon". I believe that Icon (as in the language) dates from about 1978, but I'm not really sure. Perhaps Ralph could give a little background on how the name was selected and when. (<- subtle hint) While I'm here, let me point out that Icon is spelled I-c-o-n. In particular, note that it is not spelled I-C-O-N, or i-c-o-n. Nobody will roll over and die if you use "ICON", but "Icon" is correct. Bill Mitchell From mmdf Fri May 27 23:58:47 1983 From: trh at arizona To: icon-group@arizona Subject: efficiency Status: RO Here's a question for gurus: Is line A or line B more efficient? Why? Is it significantly so? procedure main() t := table() t[1] := "abc" t[2] := "def" t[3] := "ghi" every write(!t[1] || !t[2] || !t[3]) #line A every write(!t[1],!t[2],!t[3]) #line B end From mmdf Sat May 28 13:43:32 1983 From: ralph at arizona To: icon-group@arizona Subject: Icon Newsletter Status: RO The Icon Newsletter is published aperiodically, about three times a year. 11 issues have been published to date, the last in March. The next issue probably will come out in early July. The Newsletter, as its title indicates, contains news related to Icon. Typical topics include new implementations, documents, and projects. A continuing feature of the Newletters is a `Programming Corner' that poses questions and offers solutions to various aspects of programming in Icon. The Newsletter is distributed to interested persons, free of charge, via the Postal Service. There are presently about 500 subscribers. To subscribe, send your postal address to icon-project.arizona@rand-relay At the present time, back issues of Newsletters #2-11 are available on request to new subscribers. Ralph Griswold From mmdf Sun May 29 09:10:29 1983 From: ralph at arizona To: icon-group@arizona Subject: Survey Results Status: RO We have received 24 responses to the survey of group proficiency and interests in Icon. (There currently are about 70 icon-group addresses, 7 of which are addresses for redistribution on other machines.) The responses are summarized below. Question: What is your familiarity with Icon? -- on a scale of 1 to 10, graded roughly as follows: 1 -- have heard of Icon, but that's about all . . . 5 -- generally knowledgeable about Icon and competent to formulate simple programs . . . 10 -- Icon wizard Responses: 1: xxxxxxxx 2: 3: xx 4: xxxx 5: xxx 6: xxx 7: x 8: 9: x 10: xx Note: the 9s and 10s are all "locals" at Arizona. Question: What discussion topics most interest you? Responses: The responses were varied in format and specificity. They have been categorized below as accurately as possible, but some may have been misinterpreted. idioms and programming techniques 16 general applications 8 language design issues 6 implementation issues 6 theoretical aspects 6 program exchange 5 comparison with other languages 3 portability 3 program design 3 AI applications 2 bugs 2 documentation 2 history 2 implementation news 2 string manipulation 2 tools in or for Icon 2 wish lists 2 efficiency 1 Icon as an embedded language 1 programming environments for Icon 1 - Ralph Griswold From mmdf Wed Jun 1 18:38:26 1983 From: David Vinayak Wallace Reply-To: Gumby at MIT-MC To: icon-group.arizona@Rand-Relay Status: RO Does anyone know of an implementation on Icon which runs under TOPS-20? Any pointers would be appreciated. david From mmdf Fri Jun 3 13:55:20 1983 From: whm at arizona Date-Sent: 3 Jun 83 13:53:07 MST (Fri) Subject: Icon for TOPS-20 To: gumby@mit-mc Cc: icon-group@arizona Status: RO There is a Version 2 Icon system available from us that runs under TOPS-10. It is a Ratfor implementation, so in addition to V2/V5 language differences, it is not very fast. This system has been brought up under TOPS-20. There were allegedly some minor problems related to i/o on TOPS-20, but we don't have the details at hand. This system is available from us via tape (BACKUP format) for no charge, or for $15, we'll supply a tape. There has been some interest in porting the C implementation to 10's and 20's, but the problem of sizeof(char *) not being equal to sizeof(int *) may cause severe problems. We also aren't aware of a 10 or 20 C compiler that supports assignment and call-by-value for structures, and Icon uses quite a bit of that. Bill Mitchell From mmdf Fri Jun 3 14:21:26 1983 From: whm at arizona Date-Sent: 3 Jun 83 14:19:01 MST (Fri) Subject: write(a,b) vs. write(a||b) To: icon-group@arizona Status: RO Regarding the question of whether it's more efficient to say write(s1,s2,...,sn) [1] or write(s1||s2||...||sn) [2] The former is more efficient and is the suggested way to accomplish the task. The write function deals with each argument in turn, writing it to the output file. If an argument is a string, it can work with it directly, otherwise it converts it to a string and then writes it. In [2], Icon computes t := s1 || s2 t := t || s3 ... t := t || sn and then does write(t) For each concatenation, a new string is allocated for the result. So, in addition to performing the concatenation operation itself, there is garbage collector overhead. In [1], no operations are performed, and there is no storage allocation unless some of the si need to be converted to strings. So, the general answer is that [1] is a big winner. However, we tried some timings and found that for short strings, in particular, the example given by trh@arizona, the performance of the two is about the same. This seems to indicate that the time consumed by concatenation and allocation of intermediate results is insignificant in relation to other activities taking place. Bill Mitchell p.s. That was a good question, I remembering wondering the same thing at one time. From mmdf Fri Jun 3 23:40:28 1983 From: Guy.Steele@CMU-CS-A Subject: Icon forTOPS-20 To: icon-group.arizona@rand-relay Status: RO Thank you for your message. I might be interested in looking at the problem of porting a C version to the -20. Also, I would be interested in a C version for the VAX. (By the way, the header on the mail I received claimed it was sent to "GUMBY@MC", who is definitely not me. Strange.) --Thanks, Guy Steele From mmdf Sat Jul 16 04:56:19 1983 Date: 16 Jul 1983 04:54:53-PST From: whm at arizona Date-Sent: 16 Jul 83 04:54:52 MST (Sat) Subject: Icon on USG UNIX To: icon-group@arizona We would like to know if anyone has installed Icon on a USG (e.g. System III or System V) UNIX system. We thought that someone had probably already done this and had encountered no problems, but we have just dealt with a stream of problems from someone trying to install Icon on a System V VAX. So, if you've already hacked through an Icon installation on a USG system, we'd like to hear from you. While I'm here, I'll give an unofficial report on the status of our "Icon on a 68000" project. We appear to have located a 68000 system (a Pixel) whose management will let us use to port Icon to. The Pixel is currently waiting on software but as soon as that arrives we hope to begin a port. Bill Mitchell From mmdf Mon Aug 1 21:44:27 1983 From: whm at arizona Date-Sent: 1 Aug 83 21:42:15 MST (Mon) Subject: Version 5.8 of Icon To: icon-group@arizona Status: RO Version 5.8 of Icon is now being distributed. The new version corrects several bugs in 5.7 and also has a link directive that specifies ucode (.u1 and .u2) files to be included when linking. For example, link funcs1,"../lib/io" in an Icon program causes the corresponding ucode files to be included when the program is linked. A search path can be used to cause the linker to check several directories in turn for a named file. Included with Version 5.8 is the Icon Program Library which consists of Icon programs, procedures, and external functions. The library is quite diverse and covers a wide range of application areas. A document, TR 83-6, describing the library in UNIX manual style is available on request. Work has been done on the source code for Version 5.8 to make it easier to port to UNIX environments. A document, "Porting the UNIX Implementation of Icon" (TR 83-10), which endeavors to illuminate the porting process, is available on request. In addition, a small test suite has been prepared to be used in conjunction with TR 83-10. Version 5.8 can be configured for VAX or PDP-11 (split I/D) UNIX systems. It has been tested under 4.1bsd and V7. USG UNIX systems have been taken into consideration and while we know of no Icon systems running under USG UNIX, we believe that Icon could be brought up under one with little effort. To obtain Version 5.8, send a check for $15 or tape at least 600 feet long to: Icon Project Department of Computer Science University Computer Center The University of Arizona Tucson, AZ 87521 Please indicate (where appropriate): Name Address Telephone Electronic mail address Desired tape density (800 or 1600 bpi) If hardcopy and software related to porting Icon are desired, please indicate that as well. The following new documents are available in hardcopy form: Implementations of Icon Differences Between Versions 2 and 5 of Icon (TR 83-5) The Icon Program Library (TR 83-6) Porting the UNIX Implementation of Icon (TR 83-10) Co-Expressions in Icon, reprinted from Computer Journal Programmer-Defined Control Operations, reprinted from Computer Journal To request documents, send a letter via the postal service to the above address, or send a message to icon-project.arizona@rand-relay. Be sure to include a return postal address. Icon Newsletter #12 has been mailed. Persons on the Newsletter mailing list should receive a copy soon. This Newsletter contains forms for requesting Version 5.8 of Icon and the documents listed above. Bill Mitchell From mmdf Tue Aug 2 18:41:22 1983 From: glenn@Uwisc (Glenn Blank) Subject: Version 5.8 of Icon Reply-To: glenn@Uwisc To: icon-group.arizona@Rand-Relay, whm.arizona@Rand-Relay Status: RO Please send me your document on The Icon Program Library, TR 83-6. My address is: Glenn D. Blank 101-I Eagle Heights Madison, WI 53705 Thanx. From mmdf Wed Aug 17 18:44:23 1983 From: Charlotte Mooers Subject: Please add me to the Icon-Group mailing list. To: Icon-Group%Arizona@Rand-Relay Cc: Mooers@BBN-UNIX Status: RO Thanks. Charlotte . From mmdf Wed Aug 17 18:46:49 1983 From: Charlotte Mooers Subject: My previous request to be added to Icon-Group To: Icon-Group%Arizona@Rand-Relay Cc: Mooers@BBNA Status: RO Better send the messages to MOOERS@BBNA. Thanks. ---Charlotte From mmdf Mon Aug 29 18:42:19 1983 From: Robert E. Wells Subject: Equality for reals... To: icon-group.arizona@rand-relay Cc: rwells@bbn-unix Status: RO Equality comparisons on real numbers are notoriously untrustworthy on most systems. Numbers which print out identically, and which were produced by intuitively equivalent operations, may differ in the least significant bits of the mantissa. Has anyone considered incorporating range checking into the equality test itself? I have in mind defining n = m for real n or m to be equivalent to abs(n - m) < &epsilon where &epsilon is a keyword, initially set to some very small positive value. The issue of value comparison is discussed in section 10.1 of the Icon manual; it gives the example that (1 - 1) = (2 - 2) is true due to the uniqueness of integer numbers. I am much less certain that (3.5 - 2.5) = (17.5 - 16.5) is true, even though that is perhaps implied in section 10.1. The &epsilon scheme is a way of making real numbers act more the way you expect. I would expect that n === m would be defined to succeed only if the real values n and m were exactly the same, having the same bit pattern. I am considering making this change to RPL, the command and programming language for RS/1, a decision support system for research scientists. RPL is similar to Icon in having variables that can have values of many different types and providing appropriate conversions between them; it is used more for numeric and statistical applications than for text processing. We have had several calls lately from users who not unreasonably expect the real numbers in their programs to behave like mathematical real numbers; after explaining some of the subtleties of floating point represention several times, I have started wondering if I could avoid some calls by making 'reals' act more like real numbers. Do programmable calculators address this issue? I would appreciate your views, particularly if there is a fatal flaw. -Thanks, Robert From sdcsvax!davidson Tue Aug 30 07:38:12 1983 From: Greg Davidson Subject: Equality for reals To: icon-group.arizona@rand-relay Status: RO When doing floating point calculations by hand, one commonly maintains a precision associated with each value, modifying it appropriately as one performs calculations. If this could be automated appropriately, it would solve several problems, including the equality question, but also the question of how many digits to show when printing real values. If there are no mathematical problems with doing this, the precision could be determined on input, by the number of digits used to express the quantity. An obvious optimization is to use a single precision value for collections of floating point numbers, where possible. Well numerical analysts, is this possible? -Greg P.S. I realize I'm guilty too, but do we have to call them real numbers when they aren't? Lets call them floats! From mmdf Tue Aug 30 17:24:53 1983 From: greep@SU-DSN Subject: Re: Equality for reals... To: Robert E. Wells Cc: icon-group.arizona@RAND-RELAY In-Reply-To: Your message of 29 Aug 1983 14:06:26 EDT (Monday). Status: RO Some people may be surprised at an equality relation which is non-transitive. From mmdf Tue Aug 30 22:34:09 1983 From: whm at arizona Date-Sent: 30 Aug 83 22:30:51 MST (Tue) Subject: Re: Equality for reals... To: icon-group@arizona Status: RO The root of the problem is that computers can't exactly represent floating point numbers, they just represent approximations of them. That's pretty obvious, but less obvious is that floating point operations aren't always associative. Because of the approximate nature of floating point math, having an approximate comparison seems reasonable to me. A problem I see with &epsilon is that of picking the initial value of it. Are there other languages with a similar concept? (APL?) How do they pick their "little" value? Calculators seem to be pretty smart about operations with real numbers; is that because they use lots of bits in their number representations? This is a bit off track, but in Icon, x === y is just like x = y if x and y are real. The equivalence operation is most often used (by me) for things like x === 1 to see if x is the integer 1 without first having to see if x is an integer, e.g. integer(x) = 1 Avoiding real numbers, Bill Mitchell From mmdf Wed Aug 31 01:27:08 1983 From: trh at arizona Date-Sent: 31 Aug 83 01:23:35 MST (Wed) To: icon-group@arizona Status: R In response to rwells@bbn-unix: Your keyword &epsilon seems to be a close cousin of APL's Quad-CT system variable. Assignment to Quad-CT allows the user to specify the "comparison tolerance"; the proportion by which two value may differ and still be considered "equal". A good discussion of this is found in Chapter 14 of the Sharp APL Reference Manual by Paul Berry of I. P. Sharp Associates. You may wish to look into this as you implement your own version. Good luck. -tom hicks (trh@arizona). From mmdf Wed Aug 31 07:40:06 1983 From: Guy.Steele@CMU-CS-A Subject: Equality for reals To: icon-group.arizona@rand-relay Cc: rwells@bbn-unix Status: R Defining n = m for floating-point n and/or m to be equivalent to abs(n - m) < &epsilon is not a very good idea. For very small m and n, the difference may always be less than &epsilon, and for very large m and n, no matter how close together, the difference may always be greater than &epsilon. A somewhat better definition is abs(n - m) < &epsilon * max (abs(m), abs(n)) I recommend that you take a look at what APL does, inasmuch as APL provides such "fuzzy" comparisons, and APL implementors have worried a great deal over the proper definition. See the various APL conference proceedings, in particular part 2 of the APL '79 conference, which is a complete proposed standard APL definition. --Guy Steele From mmdf Wed Aug 31 07:50:41 1983 From: David Vinayak Wallace Subject: Equality for reals... To: whm.arizona@Rand-Relay Cc: icon-group.arizona@Rand-Relay Reply-To: Gumby@mit-mc In-Reply-To: Msg of 31 Aug 1983 02:30-EDT from whm.arizona at Rand-Relay Status: R Date: Wednesday, 31 August 1983 02:30-EDT From: whm.arizona at Rand-Relay ...but less obvious is that floating point operations aren't always associative. Because of the approximate nature of floating point math, having an approximate comparison seems reasonable to me. A problem I see with &epsilon is that of picking the initial value of it. Are there other languages with a similar concept? (APL?) How do they pick their "little" value? Isn't this is architecture- (i.e. word-length-) dependent? Shouldn't it be one half of the roundoff error? Calculators seem to be pretty smart about operations with real numbers; is that because they use lots of bits in their number representations? My HP uses BCD to 14 significant places, and then displays 11. You don't notice problems unless you do a LOT of remultiplies. What about using rationals (i.e. carry out the numerator and denominator as long as possible)? After all, irrationals don't exist on a machine with a finite word length. david From mmdf Wed Aug 31 18:40:57 1983 From: Andy Cromarty Subject: floating-point number comparisons To: icon-group.arizona@rand-relay Status: R I have worked in several LISP dialects wherein "flonum" equality is approximate. A function named "eqn" was defined such that (eqn f1 f2) was true just in case (lessp (abs (difference f1 f2)) *fuzz), i.e. just in case the (absolute value of the) difference between the floating point numbers was less than some standard "fuzz-factor" variable. The variable had a default value more or less consistent with the architecture of the machine but was changeable by the user. (There are attendant pitfalls, of course, such as setting it absurdly large or setting it to different values in different contexts, but that sort of flexibility is arguably a part of the LISP philosophy.) Note that setting *fuzz to 0.0 enforces "strict equality", insofar as the machine allows. The note about the non-transitive nature of equality comparisons under such a scheme bears some scrutiny. As a rule, contemporary architectures do not guarantee transitivity of equality once algebraic transformations are taken into account, and when they do it is pathological -- for example, zero often is "more equal to itself" than other values due to its ubiquity and precise representation in many machines. An explicit treatment of this pathology by the language designer simply allows the programmer to control the interference induced by the architecture to some extent rather than uniformly victimizing him/her by it. asc From mmdf Wed Aug 31 23:45:52 1983 From: hilfingr%ucbkim@Berkeley (Paul Hilfinger) Subject: Approximate equality of reals To: icon-group.arizona@rand-relay Status: R I've consulted The Expert on this subject, with the following results: First, I think I know what Robert Wells meant by saying that ``Equality comparisons of real numbers are notoriously untrustworthy,'' but I'm not positive. Let me just throw out some facts to be sure; pardon me if they're obvious. The floating point equality operator is absolutely trustworthy; x=y iff x and y are exactly the same quantity. The problem is that because the various other floating point operators yield approximations to what we ``really intend''--to the mathematically correct answers--the result of comparing their results with each other don't always yield what we really intend. By the way (re: comment by Greg Davidson), they most certainly ARE real numbers! The problem is that +, -, *, /, etc., don't have their mathematical meanings. The situation was clearer in Bliss, e.g., where the floating point operators had a different appearance. All right, so much for trivial quibbling. Actually, the first quibble isn't so trivial. There are cases in which it is, in fact, essential that = really mean = (that is, strict equality of the computed operands). If you want, I can bug my Expert for an example. In other words, strict equality of real quantities is a computationally legitimate operation, and since one is going to need it from time to time, it seems confusing at best to usurp the standard notation for it, especially when the replacement has such radically different semantics. For clarity, the proposed comparison should be given a distinctive syntax (it's unfortunate that ~= has been pre-empted in Icon; that would have been my choice.) Next, we come to the real substantive difficulty. There simply is no sensible general definition of ``nearly equal.'' Sometimes one might want ``absolute difference less than epsilon'' (and epsilon might vary). For other applications one might need ``relative difference less than epsilon.'' There are still more exotic metrics that make perfect sense for certain applications. All of these have an equal right to the title ``approximate equality'' and all are totally useless for some applications. One can, of course, define the syntax for ``nearly equal'' and then allow the user to supply his own definition (perhaps with a default definition supplied by the system). This bothers me as a language designer. Either one has a language that allows user definition of operators--in which case you needn't make any special provision for defining a ``nearly equal'' operator--or you have a language that does not allow such user definitions--in which case the ``nearly equal'' operator would be the only operator whose definition was set by the user and could not be relied upon to have the same meaning from program to program. The APL route (a fixed definition of ``near equality'' that is parameterized by epsilon) has its own additional problems; setting appropriate values of epsilon at the right points is quite clumsy, not to mention the oddity of a harmless-looking comparison operator whose definition depends on a global variable. In practice, APL's definition of equality is hard to use correctly. The idea of doing ``significance arithmetic'' (mentioned by Greg Davidson) has been tried before. Unfortunately, it can be horribly conservative in some cases (reporting that a certain quantity has only 3 significant figures when it really has 6, for example.) A big part of the problem is that saying that something has n significant digits (even binary digits) is a very course measure of its accuracy, causing one to overstate the errors involved. Another technique is interval arithmetic. Again, this can give overly-conservative results. Of course, both of these techniques cost performance. To summarize: (1) strict equality comparison of real quantities is a legitimate operation; therefore, there seems to be no reason to redefine the standard notation `='; (2) there seems to be no definition for ``nearly equal'' that is widely enough useful to justify its definition as a standard operator. --Paul Hilfinger From mmdf Wed Sep 14 17:38:31 1983 From: whm at arizona Date-Sent: 14 Sep 83 17:32:35 MST (Wed) Subject: Icon for VAX/VMS To: icon-group@arizona Status: R Version 5 of Icon is now available for VAX/VMS. VMS Icon was developed by Bob Goldberg with support from Datalogics, Inc. and is being distributed by the University of Arizona. VMS Icon does not include the Icon compiler; it consists of only the interpretive system. The distribution includes executable files, object modules, sample programs, and the procedures from the Icon book. The executable files were produced on Version 3.2 of VMS. Those with VAX-11 C should be able to rebuild Icon from the object modules should the executable files should happen to not work on a particular version of VMS. We do not have the facilities to rebuild the system and thus, it is being distributed on "as-is" basis, i.e., if problems are encountered, it may not be possible for us to provide a solution to them. To obtain a copy of the system, specify the following where applicable: Your Name Address Organization Telephone Network Address Desired tape density (800 or 1600 bpi) and send a tape at least 600 feet long (or check/money order for $15 payable to The University of Arizona) to: Icon Project Department of Computer Science The University of Arizona Tucson, AZ 85721 Please direct questions to icon-project.arizona@rand-relay Bill Mitchell From mmdf Fri Sep 23 00:39:33 1983 From: whm at arizona Date-Sent: 23 Sep 83 00:33:58 MST (Fri) Subject: Icon on System III or V UNIX? To: icon-group@arizona Status: R If you know of an Icon system running on System III or System V, please drop me a note. Thanks, Bill Mitchell From mmdf Tue Oct 18 23:21:42 1983 From: whm at arizona Date-Sent: 18 Oct 83 23:19:18 MST (Tue) Subject: s/x/y/g in Icon To: icon-group@arizona Status: R Something I've had often occasion to do in Icon is to change all occurrences of one string to another. I'm always fumbling around over how to do it. I was wondering if anybody has any "neat" ways to do this. More specifically, write change(line,pat,rep) that changes all occurrences of the string 'pat' in the string 'line' to the string 'rep' and returns the new string. If you've got a "neat" (you decide on what's "neat") way to do it, post it to the group. Bill Mitchell From mmdf Thu Oct 20 17:39:35 1983 From: jacobson@wisc-rsch (Fred M Jacobson) Subject: change(line, pat, rep) To: icon-group.arizona@rand-relay Cc: jacobson@wisc-rsch Status: R Bill- Here's my "neat" solution. I am taking "neat" to mean "tidy, orderly, clean", "simple, plain, unadorned", and "well-designed, well-worked-out", not "great, wonderful, marvelous" or "clever, ingenious". (See The Synonym Finder by J.I. Rodale, Rodale Press, 1978, a dictionary thesaurus.) procedure change(line, pat, rep) # Change every occurrence of the string `pat' in the string `line' # to the string `rep', and return the changed string. local n, result, p (line := string(line) & pat := string(pat) & rep := string(rep)) | fail n := *pat result := "" while p := find(pat, line) do { result ||:= line[1+:p-1] || rep line := line[p+n:0] } return result || line end The "interesting" part of the problem is the "find" function. A section of Alfred V. Aho's paper, Pattern Matching in Strings, discusses efficient algorithms. I have a copy of the paper, but no citation. Aho presented the paper in December 1979. I assume it appeared in 1980 or 1981, perhaps in the JACM. -Fred From mmdf Fri Oct 21 19:35:04 1983 From: Mark Feldman Return-Path: Subject: change (line, pat, rep) To: icon-group.arizona@rand-relay Status: R Fred's solution to the change problem is neat, but it could be neater. The one problem I have with it is that it changes the original line. A function such as change should return a new string, leaving the original as it was. Here's my version: procedure change (line, pat, rep) # returns a string consisting of with all occurrences of # changed to local patlen, place, start, result (line := string (line) & pat := string (pat) & rep := string (rep)) | fail start := 1 patlen := *pat result := "" while place := find (pat, line, start, 0) do { result ||:= line [start:place] || rep start := place + patlen } result ||:= line [start:0] return result end This version should also be a bit more efficient since it keeps track of where it is in line by using a counter and not by taking substrings. Someone once said (or should have) "The best algorithm is only as efficient as its implementation." So people, let's be careful out there... (eh?) Mark Feldman UUCP : {seismo,allegra,brl-bmd}!umcp-cs!feldman CSNet: feldman@umcp-cs Arpa : feldman.umcp-cs@CSNet-relay From mmdf Fri Oct 21 20:11:58 1983 From: whm at arizona Date-Sent: 21 Oct 83 20:07:03 MST (Fri) Subject: change(...) To: icon-group@arizona Status: R Here's what I came up with: procedure change(line,pat,rep) rx := 0 every p := find(pat,line) do { line[(p+rx)+:*pat] := rep rx +:= *rep - *pat } return line end Note that the size calculations can be moved out of the loop. I guess that the "trick" here (if any) is the in-place substitution. Also note that although line is being modified in the procedure, the value used as the parameter is not_ changed. You may have noticed that the loop can be replaced by: every line[(find(pat,line)+rx)+:*pat] := rep do rx +:= *rep - *pat I don't know which of the three versions submitted would be the fastest; if I get a chance I'll try some timings. Any more solutions? How about something recursive? Bill Mitchell From mmdf Sun Oct 23 07:41:11 1983 From: Mark Feldman Return-Path: Subject: Bug in my change To: icon-group.arizona@rand-relay Status: R Hey guys, there's a bug in my change procedure. It seems that if is the null string my procedure loops (infinitely). Adding the line pat ~== "" | return line before the main loop remedies the problem. Here's the correct version: procedure change (line, pat, rep) # returns a string consisting of with all occurrences of # changed to local patlen, place, start, result (line := string (line) & pat := string (pat) & rep := string (rep)) | fail pat ~== "" | return line start := 1 patlen := *pat result := "" while place := find (pat, line, start, 0) do { result ||:= line [start:place] || rep start := place + patlen } result ||:= line [start:0] return result end Sorry for sending something that wasn't completely tested. It will never happen again :-). Mark Feldman UUCP : {seismo,allegra,brl-bmd}!umcp-cs!feldman CSNet: feldman@umcp-cs Arpa : feldman.umcp-cs@CSNet-relay From mmdf Sun Oct 23 18:40:32 1983 From: Mark Feldman Return-Path: Subject: change (line, pat, rep) To: icon-group.arizona@rand-relay Status: R I have been having a problem with mail so I am resending my original solution to the change problem as well as a newer recursive version. My original solution is based on Fred's "neat" solution. I modified change so that it keeps track of its location in line with a counter instead of taking substrings. This keeps the original line the same and increases speed since integer math is usually faster then string manipulation. So here it is: procedure change (line, pat, rep) # returns a string consisting of with all occurrences of # changed to local patlen, place, start, result (line := string (line) & pat := string (pat) & rep := string (rep)) | fail pat ~== "" | return line start := 1 patlen := *pat result := "" while place := find (pat, line, start, 0) do { result ||:= line [start:place] || rep start := place + patlen } result ||:= line [start:0] return result end Here's a recursive version of change. And while it may be "neater", it's also slower: procedure change (line, pat, rep) # a recursive procedure to return with all occurrences of # changed to local place (line := string (line) & pat := string (pat) & rep := string (rep)) | fail line ~== "" | return line pat ~== "" | return line if place := find (pat,line) then return line [1+:place-1] || rep || change (line [place+*pat:0],pat,rep) else return line end -- Mark Feldman UUCP : {seismo,allegra,brl-bmd}!umcp-cs!feldman CSNet: feldman@umcp-cs Arpa : feldman.umcp-cs@CSNet-relay From mmdf Mon Oct 24 02:34:21 1983 From: whm at arizona Date-Sent: 24 Oct 83 02:27:17 MST (Mon) Subject: tables To: icon-group@arizona Status: R I'm in need of some background information for a report concerning Icon tables and table programming techniques. An Icon table allows the association of two arbitrary data values. It appears that the first language to support such a capability in any form was Lisp with "association lists". It also appears that SNOBOL4 was the first language to support "tables" as a separate datatype. Can anyone offer evidence to the contrary? (I'm primarily concerned about COMIT and IPL V since I know nothing about them.) Also, I know that the following languages support some datatype similar to Icon tables: AWK, SETL, and SNOBOL4. Are there others? Bill Mitchell From mmdf Mon Oct 24 18:42:56 1983 From: Mark Feldman Return-Path: Subject: change (one more time) To: icon-group.arizona@rand-relay Status: R Here I go again. This better get through. Sorry for all the trashed mail, but I think I've found my mail problem. If my letter doesn't get through this time, then the problem isn't with me (either that or I have chronic problems :-) ) I have been having a problem with mail so I am resending my original solution to the change problem as well as a newer recursive version. My original solution is based on Fred's "neat" solution. I modified change so that it keeps track of its location in line with a counter instead of taking substrings. This keeps the original line the same and increases speed since integer math is usually faster then string manipulation. So here it is: procedure change (line, pat, rep) # returns a string consisting of with all occurrences of # changed to local patlen, place, start, result (line := string (line) & pat := string (pat) & rep := string (rep)) | fail pat ~== "" | return line start := 1 patlen := *pat result := "" while place := find (pat, line, start, 0) do { result ||:= line [start:place] || rep start := place + patlen } result ||:= line [start:0] return result end Here's a recursive version of change. And while it may be "neater", it's also slower: procedure change (line, pat, rep) # a recursive procedure to return with all occurrences of # changed to local place (line := string (line) & pat := string (pat) & rep := string (rep)) | fail line ~== "" | return line pat ~== "" | return line if place := find (pat,line) then return line [1+:place-1] || rep || change (line [place+*pat:0],pat,rep) else return line end -- Mark Feldman UUCP : {seismo,allegra,brl-bmd}!umcp-cs!feldman CSNet: feldman@umcp-cs Arpa : feldman.umcp-cs@CSNet-relay From mmdf Sun Oct 30 23:41:56 1983 From: suz@LBL-CSAM (Suzanne O'Dell[Tourist]) Return-Path: Subject: Moved To: whm.arizona@Rand-Relay Cc: icon-group.arizona@Rand-Relay Status: R I have moved to the Washington DC area and I would like to keep receiving the Icon newsletter. I still have an account at LBL for electronic mail (for now). Will you be at the Wash. DC UNIX conference in January? I am still in the job hunting process but I will have to send for an Icon tape when I have a computer to put it on. (On job applications with multiple choice boxes for "what languages do you know" I write in Icon under "others" and that gets them every time.) New address for newsletter: Suzanne O'Dell 13221 Parson Lane Fairfax, VA 22033 (703) 378 8574 Thanks, Suz