From isidev!nowlin@uunet.uu.net Sat Oct 26 07:17:16 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 26 Oct 91 07:17:16 MST
Received: from relay1.UU.NET by optima.cs.arizona.edu (4.1/15) id AA09309; Sat, 26 Oct 91 07:17:14 MST
Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay1.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA17544; Sat, 26 Oct 91 10:17:11 -0400
Date: Sat, 26 Oct 91 10:17:11 -0400
From: isidev!nowlin@uunet.uu.net
Message-Id: <9110261417.AA17544@relay1.UU.NET>
Received: from isidev.UUCP by uunet.uu.net with UUCP/RMAIL (queueing-rmail) id 101647.2800; Sat, 26 Oct 1991 10:16:47 EDT
To: uunet!cs.arizona.edu!icon-group@uunet.uu.net
Subject: Re: makefiles for Icon

I didn't wade through every bit of the code Anthony had for generating
makefiles.  I liked the part that searches for link statements and adds
the files that another file links as dependencies for it.

I have this simple high-level makefile that I pull into other makefiles
with the make include statement:

    ICONT = icont
    TFLAGS = -c
    LFLAGS =
    IBIN = .

    .SUFFIXES:
    .SUFFIXES: .u1 .icn

    .icn:
            $(ICONT) $(LFLAGS) -o $(IBIN)/$@ $<

    .u1:
            $(ICONT) $(LFLAGS) -o $(IBIN)/$@ $<

    .icn.u1:
            $(ICONT) $(TFLAGS) $<

If I just name this file "makefile", the suffix rules it contains allow
me to execute:

    make foo

and if there's a file called foo.icn in the current directory, make
knows enough to handle everything.

Anthony's stuff would be nice for programs that require several ucode
files to make an icode file.  I just use explicit dependency lines for
these:

    menu2: menu2.u1 welcome.u1 io.u1
            $(ICONT) menu2

Due to the suffix rules in the earlier makefile, this dependency line
causes the correct .u1 files to be built automatically from the
corresponding source files.  The welcome and io ucode files are "link"ed
into menu2, and Anthony's tool would be useful for detecting this.  The
right suffix rules make makefiles much simpler to write and maintain.
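
The link-detection step mentioned above could be sketched roughly as
follows.  This is a minimal Python illustration, not Anthony's tool
(which is not shown in this thread).  The function names linked_modules
and dependency_line are hypothetical, and the parsing is deliberately
naive: it assumes link declarations start a line and that "#" never
appears inside a string literal on such a line.

```python
import re

# Matches an Icon "link" declaration at the start of a line,
# e.g. "link welcome, io".  (Assumption: one declaration per line.)
LINK_RE = re.compile(r'^\s*link\s+(.+)$')

def linked_modules(source):
    """Return the module names found in link declarations, in order."""
    names = []
    for line in source.splitlines():
        line = line.split('#', 1)[0]        # drop trailing comment (naive)
        m = LINK_RE.match(line)
        if not m:
            continue
        for item in m.group(1).split(','):
            name = item.strip().strip('"')  # allow: link "io"
            if name and name not in names:
                names.append(name)
    return names

def dependency_line(target, source):
    """Build a make dependency line for `target` from its Icon source."""
    deps = [target] + [n for n in linked_modules(source) if n != target]
    return target + ": " + " ".join(d + ".u1" for d in deps)

example = 'link welcome, io\nprocedure main()\n   write("hi")  # link fake\nend\n'
print(dependency_line("menu2", example))
```

For the sample source this prints a line of the same shape as the
hand-written menu2 dependency line above.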
Of course I usually use a modified version of this makefile that invokes
isicont :-)

--- ---
 | S |   Iconic Software, Inc.  -  Jerry Nowlin  -  uunet!isidev!nowlin
--- ---

From icon-group-request@arizona.edu Fri Nov 1 05:49:17 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 1 Nov 91 05:49:17 MST
Resent-From: icon-group-request@arizona.edu
Received: from Arizona.edu (Hopey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA24489; Fri, 1 Nov 91 05:49:15 MST
Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Fri, 1 Nov 1991 05:48 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA23886; Fri, 1 Nov 91 03:38:56 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Resent-Date: Fri, 1 Nov 1991 05:49 MST
Date: 28 Oct 91 23:32:07 GMT
From: csus.edu!wupost!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!midway!ellis!goer@ucdavis.ucdavis.edu (Richard L. Goerwitz)
Subject: regular expression in Icon
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id: <97291E97A880533B@Arizona.edu>
Message-Id: <1991Oct28.233207.1119@midway.uchicago.edu>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
Organization: University of Chicago

I've been mulling over the problem of regular expressions and Icon for
some time now.  I can't really decide on how they should be implemented.
Has anyone else thought on this question?  (Obviously the answer is
"yes," so I'm basically just asking for input.)

A while back I wrote a regular expression handler for Icon that used an
NFA to accomplish something halfway between what Icon's find() and
UNIX's egrep do.  This program is now in the IPL.  Trouble is that it is
slow.
As I mentioned when I posted test versions of it, I tried to use a DFA,
but I just couldn't figure a way to do this in Icon and still keep the
automaton small and fast.  I even tried egrep-like optimizations, which
create only that portion of the automaton that is needed (i.e. it
compiles it incrementally as needed).  This still didn't work.

I wasn't happy with that effort, at least on the implementation level.
The idea of integrating regular expressions into Icon without using a
new data type still seems good, though.  Does anyone disagree?  I'd like
to hear variant opinions.

If anyone running a SYSV system would like to try some extensions to the
Icon run-time system, I have an implementation of find and match on hand
that take regular expressions for arg1.  They seem to work just fine,
and I can't find any memory leaks in the buffering system I hacked
together.  Installing them would require slight modifications to three
of the Icon source files and a recompilation.  I'll be happy to send a
copy to anyone who feels adventurous.

-- 

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer

From reid@vtopus.cs.vt.edu Fri Nov 1 06:07:56 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 1 Nov 91 06:07:56 MST
Resent-From: reid@vtopus.cs.vt.edu
Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA25065; Fri, 1 Nov 91 06:07:54 MST
Received: from vtopus.cs.vt.edu by Arizona.edu with PMDF#10282; Fri, 1 Nov 1991 06:07 MST
Received: by vtopus.cs.vt.edu (5.57/Ultrix3.0-C) id AA00166; Fri, 1 Nov 91 08:07:06 -0500
Resent-Date: Fri, 1 Nov 1991 06:07 MST
Date: Fri, 1 Nov 91 08:07:06 -0500
From: reid@vtopus.cs.vt.edu (Thomas F.
    Reid)
Subject: Ralph Griswold Tutorial on Icon
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Cc: reid@vtopus.cs.vt.edu
Resent-Message-Id: <99C2D3A8A0403442@Arizona.edu>
Message-Id: <9111011307.AA00166@vtopus.cs.vt.edu>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
X-Vms-Cc: reid@vtopus.cs.vt.edu

Ralph will give a one day tutorial on Icon in the Washington, DC area at
the University of Maryland Adult Education Center on Friday, November
22nd.  An announcement is attached.  Please feel free to call/write me
if there is anything I can do to help you attend.

---------------------- Attachment ------------------------

The Washington, DC Chapter of ACM Professional Development Committee
presents over 20 one-day tutorials a year taught by many of the world's
top computer scientists.  During the week of November 18-22, 1991, the
PDC will offer 10 tutorials.  Of special interest is the tutorial on
November 22 on the Icon programming language given by Dr. Ralph
Griswold, Regents' Professor of Computer Science at The University of
Arizona.

Dr. Griswold is Regents' Professor of Computer Science at The University
of Arizona.  He has nearly 30 years of experience in the design,
implementation, and use of high-level programming languages.  He started
his work at Bell Laboratories, where he co-authored the SNOBOL series of
programming languages for string manipulation.  Since 1971, he has
continued his programming-language work at Arizona, concentrating on
facilities for non-numerical computation.  He is author or co-author of
six books on programming languages and their implementation.

Other tutorials given the same week as Dr. Griswold's are:
Object-Oriented Program Design Using C++ by David Bern and Peer Reviews
- Theory and Practice by Richard Cohen on Monday, November 18; Code
Metrics and Design Metrics by Dr. Wayne M. Zage and Software Reverse
Engineering by Dr.
Hasan Sayani on Tuesday, November 19; Applying Statistical Process
Control to Software Development by Barba Affourtit and Software
Repository and Bridge Technology by Dr. Robert Arnold on Wednesday,
November 20; Information Modeling by Dr. Cy Svoboda and Unit Testing
During Maintenance by Thomas Bogart on Thursday, November 21; and
Software Maintenance Technology by Nicholas Zvegintzov on Friday,
November 22.

The tutorials will be given at the University of Maryland Center of
Adult Education in College Park, Maryland.

Registration can be by mail or telephone.  For a brochure describing all
of the courses, please call the PDC's answering machine at 202-462-1215.
Phone registration by credit card can be made by calling Ms. Eliane Van
Ty Smith at 301-299-4286.

Prices for check or credit card are $170 for Washington DC ACM chapter
members and $175 for non-members by November 4; $205 after November 4.
Purchase orders are $230.  Full-time students and senior citizens are
$60.

If you have questions, please call the PDC answering machine at
202-462-1215, or reach Tom Reid by Internet mail at
reid@vtopus.cs.vt.edu or by phone at (703)698-4712.

From icon-group-request@arizona.edu Fri Nov 1 08:36:41 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 1 Nov 91 08:36:41 MST
Resent-From: icon-group-request@arizona.edu
Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA28901; Fri, 1 Nov 91 08:36:39 MST
Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Fri, 1 Nov 1991 08:36 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA01253; Fri, 1 Nov 91 07:19:33 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Resent-Date: Fri, 1 Nov 1991 08:36 MST
Date: 29 Oct 91 07:37:44 GMT
From: ntrlink!mips!zaphod.mps.ohio-state.edu!caen!uvaarpa!murdoch!usenet@decwrl.dec.com (Steven D.
    Majewski)
Subject: Approximate string matching
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id:
Message-Id: <1991Oct29.073744.13155@murdoch.acc.Virginia.EDU>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
Organization: University of Virginia

I am implementing the Levenshtein string-edit distance in Icon (the
number of insertions, deletions & replacements).  Note: there is nothing
about THIS algorithm especially suited to Icon, but there is going to be
plenty of other string/symbol processing in the rest of the program that
this routine will go in.

Has anyone done any *weighted* versions (that weight "natural"
misspelling replacements like (S|Z) less, or even better, that handle
some pairs as one 'symbol')?

Anyone have any experience with different (more 'symbolic') algorithms?

I know U. Arizona has an 'approximate' grep program.  I ftp-ed a copy,
but I can't (at the moment) read the PostScript document.  Can anyone
give me a quick description of the algorithm they use?

Note: I have Hall & Dowling: Approximate String Matching, and Sankoff &
Kruskal: Time Warps, String Edits and Macromolecules: the theory and
practice of sequence comparison.  (I haven't read it all yet, but I've
sure got it!)  So I'm not looking for theory as much as practice.  War
stories invited!

-Steve Majewski

-- 
========= "If you've got a hammer, find a nail!" - George Bush =========
Steven D. Majewski        University of Virginia Physiology Dept.
sdm7g@Virginia.EDU        Box 449 Health Sciences Center
(804)-982-0831            Charlottesville, VA 22908

From icon-group-request@arizona.edu Fri Nov 1 08:37:24 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 1 Nov 91 08:37:24 MST
Resent-From: icon-group-request@arizona.edu
Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA28916; Fri, 1 Nov 91 08:37:23 MST
Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Fri, 1 Nov 1991 08:36 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA01162; Fri, 1 Nov 91 07:17:10 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Resent-Date: Fri, 1 Nov 1991 08:37 MST
Date: 29 Oct 91 07:06:10 GMT
From: usenet.coe.montana.edu!caen!uvaarpa!murdoch!usenet@decwrl.dec.com (Steven D. Majewski)
Subject: Icon futures ? (coroutines,patterns,etc.)
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id:
Message-Id: <1991Oct29.070610.12954@murdoch.acc.Virginia.EDU>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
Organization: University of Virginia

I have been reading Ken Walker's paper in Computer Language (1989),
First Class Patterns for Icon.  While poking around the archives, I also
saw a statement that there was no plan for incorporating this feature
(I think they were referred to as "C-expressions"?).  Is this still an
accurate statement?  Are there any other similar developments?
[ This paper described a change to the semantics of Icon co-expressions
to allow patterns to be defined as unevaluated co-expressions.
]

I am just beginning to learn Icon and I've only READ a few dozen lines
of SNOBOL code, but I must agree that although I find the Icon way of
doing things flexible and powerful, some of the pattern matching code is
not as clear or obvious to the eye as, say, a BISON|YACC grammar.
[ Anyone written the Icon equivalent?  A table-driven Icon code
generator? ]

Also, I saw the idea of grep-type regexp routines discussed.  Sorry if
all this has been hashed over and settled, but although Icon matching
expressions are very powerful, some things like regular expressions seem
to be a special case in terms of the possible efficiency.  [ There is a
paper by Ken-Chih Liu that proposes (for modified PASCAL) a pattern
declaration and a "REGULAR" pattern declaration, so that string matching
functions can make use of regular expression matching when possible. ]

Rather than the typical Unix/grep type regexp patterns, I would suggest
a list of character strings, csets, or lists of character strings.
(I'm not sure about the best way of expressing multiple occurrences.)

Icon is a typed language without type declarations and with some limited
automatic coercion of types.  (I hope that is an accurate statement.)
What would be the ramifications of changing the language to a hierarchy
of types (a la Russell, or like OO class inheritance perhaps)?  Thus
csets would become, not another type, but a special case of character
strings.  Regular expressions would be a sequence of strings, csets and
sets of strings...  I guess that this is no longer Icon, but anyone have
a good idea of how to put it together consistently?

- Steve Majewski

-- 
========= "If you've got a hammer, find a nail!" - George Bush =========
Steven D. Majewski        University of Virginia Physiology Dept.
sdm7g@Virginia.EDU        Box 449 Health Sciences Center
(804)-982-0831            Charlottesville, VA 22908

From cjeffery Fri Nov 1 10:50:02 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 1 Nov 91 10:50:02 MST
Resent-From: "Clinton Jeffery"
Received: from Arizona.edu (Hopey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA04827; Fri, 1 Nov 91 10:49:59 MST
Received: from optima.cs.arizona.edu by Arizona.edu with PMDF#10282; Fri, 1 Nov 1991 10:49 MST
Received: from cheltenham.cs.arizona.edu by optima.cs.arizona.edu (4.1/15) id AA04803; Fri, 1 Nov 91 10:48:58 MST
Received: by cheltenham.cs.arizona.edu; Fri, 1 Nov 91 10:48:57 MST
Resent-Date: Fri, 1 Nov 1991 10:49 MST
Date: Fri, 1 Nov 91 10:48:57 MST
From: "Clinton L. Jeffery"
Subject: RE: Icon futures ? (coroutines,patterns,etc.)
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id:
Message-Id: <9111011748.AA29800@cheltenham.cs.arizona.edu>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu

I have enjoyed Richard Goerwitz and Steve Majewski's recent posts to the
group, and I'd like to add my 2 cents worth.

First, all these proposals seem to be about adding something to the
language, rather than just adding Icon program library procedure(s), and
the compelling reason (for those without the Icon compiler) is speed.

Second, the "right" thing to add to the language is not clear at all.
"findre" and "matchre" aren't general enough for many situations where I
wish I had regular expressions.  Richard was asking for examples.  I
want to be able to match a regular expression while selecting some
pieces of it that I need, and discarding others.
In order to process an Icon declaration, for instance, I only want the
second, fifth, and sixth components of the following regular expression:

    [ \t]* (procedure|record) [ \t]* "(" {[a-zA-Z0-9] (,[a-zA-Z0-9]*)*} ")"

I won't belabor this point, but proposals for official additions to Icon
undergo a lot more scrutiny than proposals for Icon library procedures.
(If my regular expression notation looks weird, that just shows that
regular expression notations are not all equal).

Third, anyone who wants to add RE's as Icon builtins might look for a
suitable set of public domain C functions on which to build.  I believe
several free versions exist, such as one by Henry Spencer of the U. of
Toronto, but I don't know which are public domain and which are
"copylefted" and such.  But the point is that the best way to build
grass-roots support for an extension is to produce an implementation
that works on *lots* of (and potentially all) versions of Icon, not just
certain UNIX'es.

Now I'd like to continue my ranting by moving on to Steve Majewski's
post.  There are no plans to add patterns or regular expressions to Icon
that I know of within the Icon Project.  I agree with you though that
string scanning is an incredibly verbose (and therefore cumbersome) way
of describing lots of string operations that are done easily in other
languages.  You compare them to YACC -- which is more than a little
unfair since Icon is a "General Purpose" language and YACC is not.  If
we add patterns or regular expressions to Icon, will they fit into Icon,
or will they be a separate sublanguage?  On the other hand, I think
someone has in fact written a YACC-type program for Icon, one that
generates Icon code.

SM> Icon is a typed language without type declarations and with some limited
SM> automatic coercion of types. ( I hope that is an accurate statement.
)
SM> What would be the ramifications of changing the language to a hierarchy
SM> of types ( a la Russell, or like OO class inheritance perhaps )
SM> Thus csets would become, not another type, but a special case of
SM> character strings. Regular expressions would be a sequence of strings,
SM> csets and sets of strings... I guess that this is no longer Icon,
SM> but anyone have a good idea of how to put it together consistently ?

This sounds like an excellent area of future research.  I may well
choose to pursue this within the context of Idol, an object-oriented
version of Icon.  Csets, though, should not be a special case of
strings!  And where regular expressions fit is less clear than you'd
like them to be.  You are right that this is no longer Icon you
envision.  This is a successor to Icon.

Clint Jeffery

From icon-group-request@arizona.edu Fri Nov 1 13:18:50 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 1 Nov 91 13:18:50 MST
Resent-From: icon-group-request@arizona.edu
Received: from Arizona.edu (Osprey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA11441; Fri, 1 Nov 91 13:18:48 MST
Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Fri, 1 Nov 1991 13:17 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA07527; Fri, 1 Nov 91 10:23:56 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Resent-Date: Fri, 1 Nov 1991 13:18 MST
Date: 29 Oct 91 04:32:23 GMT
From: cis.ohio-state.edu!pacific.mps.ohio-state.edu!linac!midway!quads!goer@ucbvax.berkeley.edu (Richard L. Goerwitz)
Subject: What is Icon?
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id:
Message-Id: <1991Oct29.043223.9307@midway.uchicago.edu>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
Organization: University of Chicago

I receive fairly frequent letters from people who want to know what Icon
is.  Often I have some blurb around, but typically it's been written by
someone else, and reflects a theorist or language-designer's vantage
point.  Usually, I advise the person to read comp.lang.icon a while, but
I guess I don't really see how that would help, given the nature of most
postings here.

Just for the sake of people who I've sent here, and for the sake of
people who are reading along trying to find out what Icon is, here's a
longer blurb that reflects a humanist's perspective on the language:

----

Icon (1976) represents a combination of Prolog-like evaluation
mechanisms with an Algol-based syntax and SNOBOL-derived string
processing facilities.  Icon offers automatic storage allocation and
garbage collection, as well as built in associative arrays, lists,
"real" strings (i.e. not just char arrays), and a data type resembling
mathematical sets.  Icon is a strongly, though not statically, typed
language offering transparent automatic type conversions (i.e. 10,
depending on its context, may be converted to real, string, etc.) and an
elegant string processing mechanism known as "scanning."  Central to
Icon is the concept of the generator, i.e. the inherent capacity on the
part of expressions to produce multiple results.  Central also is the
notion of goal-directed evaluation - a form of backtracking in which the
components of an expression are resumed until some result is achieved,
or else the expression as a whole fails.

Icon was originally designed by Ralph Griswold, Dave Hanson, and Tim
Korb.  It was first implemented in C by Steve Wampler.  Definitive
references: Ralph E. and Madge T.
Griswold, _The Icon Programming Language_ (2nd ed.; Prentice Hall,
1989); _The Implementation of the Icon Programming Language_ (Princeton
Univ. Pr., 1986).

Icon is at its best when used as a prototyping tool, for processing
text, for performing various mappings and conversions, and as a general
tool for solving problems that tend to require heuristic mechanisms,
rather than purely algorithmic ones.  In general, Icon's design assigns
a higher priority to consistency and lucidity than to functionality
within one or another operating environment.  For this reason, it is not
a good UNIX system administration tool.  Nor is it particularly fast.
It is a clean, portable system implemented under VMS, MVS, SYSV, Mach,
BSD, Ultrix, HP/UX, AEGIS, OS/2, and many other operating systems, as
well as for various micros, such as the Atari, Amiga, and PC.  Icon is a
good language choice for theorists exploring language design, for
scholars in the humanities, and generally for people interested in
nonnumeric computing.

Interesting ongoing work in Icon includes an object-oriented extension
to the language, known as IDOL.  A compiler is also being developed (a
preliminary version of which is on cs.arizona.edu).  Interfaces are
being created for X, and also (commercially) for curses.  There is a
commercial version for the Mac available from Catspaw (known primarily
for its PC SNOBOL).

Examples of Icon code can be obtained from many sources.  The primary
source is certainly the Icon Program Library, which can be obtained from
cs.arizona.edu (icon/library).  There is also a publication put out by
the Icon Project (icon-project@cs.arizona.edu) called the _Analyst_.
Naturally the two definitive works cited above contain a fair amount of
code.  There are also several programs available on the internet that
are written in Icon (e.g. "mtf", "jargon," "quranref," all in various
issues of comp.sources.misc).
At times, large chunks of code get posted to comp.lang.icon, but these
tend to come in fits and starts.  A final place to look for Icon code is
in the icon/contrib directory on cs.arizona.edu.

-----

I really hope that this posting answers in a systematic way many of the
questions newcomers to Icon have on their minds.

-- 

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer

From alex@laguna.metaphor.com Fri Nov 1 18:03:34 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 1 Nov 91 18:03:34 MST
Received: from relay.metaphor.com by optima.cs.arizona.edu (4.1/15) id AA26844; Fri, 1 Nov 91 18:03:26 MST
Received: from laguna.Metaphor.COM by relay.metaphor.com (4.1/SMI-4.1) id AA09270; Fri, 1 Nov 91 17:02:20 PST
Received: by laguna.Metaphor.COM (4.1/SMI-4.0) id AA17910; Fri, 1 Nov 91 17:03:21 PST
Date: Fri, 1 Nov 91 17:03:21 PST
From: alex@laguna.metaphor.com (Bob Alexander)
Message-Id: <9111020103.AA17910@laguna.Metaphor.COM>
To: icon-group@cs.arizona.edu
Subject: Re: regular expression in Icon

Well, since there seems to be so much interest in regular expressions
recently in icon-group, I have an offering that addresses the two
predominant objections I've been hearing:

1)  Richard G's concern about speed -- the routines that I've attached
    to this message run at least 10 times faster than the findre()
    suite.  Smaller too.  There is an interesting technique used to
    exploit Icon's built-in backtracking so that it does not have to be
    done explicitly in the Icon code.

2)  Clint's concern about full-featuredness -- these routines support
    the perl RE style, which is all of egrep's features plus a few more,
    including access to sub-expressions matched.

There is *lots* of documentation in the code.

I haven't gotten around to submitting these to the library yet, but plan
to eventually -- hopefully before the next update.  In the meantime, I'd
be interested in feedback from any of you Icon enthusiasts interested
enough to give them a try.

I also attached an "igrep" program so you can use it without having to
write your own test program.
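
The idea in point 1, letting the language's own generators do the
backtracking, can be illustrated outside Icon.  The following is a
minimal Python sketch, not the regexp.icn code: the names lit, any_char,
star and re_match are hypothetical, only three kinds of pattern element
are covered, and positions are 0-based Python indices rather than Icon's
1-based positions.  Each element is a generator of possible end
positions, and the sequence is matched by recursing inside a loop over
the first element's results, so that exhausting the rest of the pattern
simply resumes the previous element.

```python
def lit(text):
    """Element matching the literal string `text`."""
    def m(s, pos):
        if s.startswith(text, pos):
            yield pos + len(text)
    return m

def any_char():
    """Element matching any single character (the RE '.')."""
    def m(s, pos):
        if pos < len(s):
            yield pos + 1
    return m

def star(elem):
    """Zero or more repetitions of `elem`, fewest repetitions first."""
    def m(s, pos):
        yield pos                          # zero repetitions
        for mid in elem(s, pos):
            if mid > pos:                  # skip empty matches to avoid endless recursion
                yield from m(s, mid)       # one more repetition, then recurse
    return m

def re_match(elems, s, pos=0):
    """Yield each end position where the element sequence matches s from pos."""
    if not elems:
        yield pos
        return
    first, rest = elems[0], elems[1:]
    for mid in first(s, pos):              # every way the first element can match...
        yield from re_match(rest, s, mid)  # ...is tried against the remaining elements

# "a.*b" against "aXbYb": the shorter match is produced before the longer one.
pattern = [lit("a"), star(any_char()), lit("b")]
print(list(re_match(pattern, "aXbYb")))    # [3, 5]
```

No explicit backtracking stack appears anywhere: when a later element
fails, control falls back into the enclosing for loop, which asks the
earlier element for its next alternative.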
-- Bob Alexander

Metaphor Computer Systems   (415) 961-3600 x751   alex@metaphor.com
====^=== Mountain View, CA  ...{uunet}!{decwrl,apple}!metaphor!alex

-------------------------- regexp.icn ----------------------------
############################################################################
#
#  Name:    regexp.icn
#
#  Title:   UNIX-like Regular Expression Pattern Matching Procedures
#
#  Author:  Robert J. Alexander
#
#  Date:    April 24, 1991
#
############################################################################
#
#  This is a kit of procedures to deal with UNIX-like regular expression
#  patterns.
#
#  These procedures are interesting partly because of the "recursive
#  suspension" (or "suspensive recursion" :-) technique used to simulate
#  conjunction of an arbitrary number of computed expressions (see
#  notes, below).
#
#
#  The public procedures are:
#
#  ReMatch(pattern,s,i1,i2) : i3,i4,...,iN
#  ReFind(pattern,s,i1,i2) : i3,i4,...,iN
#  RePat(s) : pattern list
#
#
#  ReMatch() produces the sequence of positions in "s" past a substring
#  starting at "i1" that matches "pattern", but fails if there is no
#  such position.  Similar to match(), but is capable of generating
#  multiple positions.
#
#  ReFind() produces the sequence of positions in "s" where substrings
#  begin that match "pattern", but fails if there is no such position.
#  Similar to find().  Each position is produced only once, even if
#  several possible matches are possible at that position.
#
#  "pattern" can be either a string or a pattern list -- see RePat(),
#  below.
#
#  Default values of s, i1, and i2 are handled as for Icon's built-in
#  string scanning procedures such as match().
#
#
#  RePat(s) : L
#
#  Creates a pattern element list from pattern string "s", but fails if
#  the pattern string is not syntactically correct.
#  ReMatch() and ReFind() will automatically convert a pattern string to
#  a pattern list, but it is faster to do the conversion explicitly if
#  multiple operations are done using the same pattern.  An additional
#  advantage to compiling the pattern separately is avoiding ambiguity
#  between failure caused by an incorrect pattern and failure to match a
#  correct pattern.
#
#
#  Accessible Global Variables
#
#  After a match, the strings matched by parenthesized regular
#  expressions are left in list "Re_ParenGroups", and can be accessed by
#  subscripting it using the same number as the \N construct.
#
#  If it is desired that regular expression format be similar to UNIX
#  filename generation patterns but still retain the power of full
#  regular expressions, make the following assignments prior to
#  compiling the pattern string:
#
#       Re_ArbString := "*"     # Defaults to ".*"
#       Re_AnyString := "?"     # Defaults to "."
#
#  The sets of characters (csets) that define a word, digits, and white
#  space can be modified.  The following assignments can be made before
#  compiling the pattern string.  The character sets are captured when
#  the pattern is compiled, so changing them after pattern compilation
#  will not alter the behavior of matches unless the pattern string is
#  recompiled.
#
#       Re_WordChars := 'whatever you like'
#                       # Defaults to &letters ++ &digits ++ "_"
#       Re_Digits := &digits ++ 'ABCDEFabcdef'
#                       # Defaults to &digits
#       Re_Space := 'whatever you like'
#                       # Defaults to ' \t\v\n\r\f'
#
#  These globals are normally not initialized until the first call to
#  RePat(), and then only if they are null.  They can be explicitly
#  initialized to their defaults (if they are null) by calling
#  Re_Default().
#
#  Characters compiled into patterns can be passed through a
#  user-supplied filter procedure, provided in global variable
#  Re_Filter.  The filtering is done before the characters are bound
#  into the pattern.
#  The filter proc is passed one argument, the string
#  to filter, and it must return the filtered string as its result.  If
#  the filter proc fails, the string will be used unfiltered.  The
#  filter proc is called with an argument of either type string (for
#  characters in the pattern) or cset (for character classes [...]).  A
#  typical use for this facility is to implement case-independent
#  matching.  All pattern characters can be downshifted by assigning
#
#       Re_Filter := map
#
#  Filtering is done only as the pattern is compiled.  Filtering of
#  strings to be matched must be explicitly done.  Therefore,
#  case-independent matching will occur only if map() is applied to all
#  strings to be matched.
#
#  In the case of patterns containing alternation, ReFind() will
#  generally not produce positions in increasing order, but will produce
#  all positions from the first term of the alternation (in increasing
#  order) followed by all positions from the second (in increasing
#  order).  If it is necessary that the positions be generated in
#  strictly increasing order, with no duplicates, assign any non-null
#  value to Re_Ordered:
#
#       Re_Ordered := 1
#
#  If the Re_Ordered option is chosen, there is a *small* penalty in
#  efficiency in some cases, and the co-expression facility is required
#  in your Icon implementation.  Example:
#
#
#  Regular Expression Characters and Features Supported
#
#  The regular expression format supported by procedures in this file
#  models very closely that supported by the UNIX "egrep" program, with
#  modifications as in the Perl programming language definition.
#  Following is a brief description of the special characters used in
#  regular expressions.  In the description, the abbreviation RE means
#  regular expression.
#
#  c         An ordinary character (not one of the special characters
#            discussed below) is a one-character RE that matches that
#            character.
#
#  \c           A backslash followed by any special character is a
#               one-character RE that matches the special character itself.
#
#               Note that backslash escape sequences representing
#               non-graphic characters are not supported directly
#               by these procedures.  Of course, strings coded in an
#               Icon program will have such escapes handled by the
#               Icon translator.  If such escapes must be supported
#               in strings read from the run-time environment (e.g.
#               files), they will have to be converted by other means,
#               such as the Icon Program Library procedure "escape()".
#
#  .            A period is a one-character RE that matches any
#               character.
#
#  [string]     A non-empty string enclosed in square brackets is a
#               one-character RE that matches any *one* character of
#               that string.  If the first character is "^" (circumflex),
#               the RE matches any character not in the remaining
#               characters of the string.  The "-" (minus), when between
#               two other characters, may be used to indicate a range of
#               consecutive ASCII characters (e.g. [0-9] is equivalent to
#               [0123456789]).  Other special characters stand for
#               themselves in a bracketed string.
#
#  *            Matches zero or more occurrences of the RE to its left.
#
#  +            Matches one or more occurrences of the RE to its left.
#
#  ?            Matches zero or one occurrences of the RE to its left.
#
#  {N}          Matches exactly N occurrences of the RE to its left.
#
#  {N,}         Matches at least N occurrences of the RE to its left.
#
#  {N,M}        Matches at least N occurrences but at most M occurrences
#               of the RE to its left.
#
#  ^            A caret at the beginning of an entire RE constrains
#               that RE to match an initial substring of the subject
#               string.
#
#  $            A currency symbol at the end of an entire RE constrains
#               that RE to match a final substring of the subject string.
#
#  |            Alternation: two REs separated by "|" match either a
#               match for the first or a match for the second.
# # () A RE enclosed in parentheses matches a match for the # regular expression (parenthesized groups are used # for grouping, and for accessing the matched string # subsequently in the match using the \N expression). # # \N Where N is a digit in the range 1-9, matches the same # string of characters as was matched by a parenthesized # RE to the left in the same RE. The sub-expression # specified is that beginning with the Nth occurrence # of "(" counting from the left. E.g., ^(.*)\1$ matches # a string consisting of two consecutive occurrences of # the same string. # # Perl Extensions # # The following extensions to UNIX REs, as specified in the Perl # programming language, are supported. # # \w Matches any alphanumeric (including "_"). # \W Matches any non-alphanumeric. # # \b Matches only at a word-boundary (word defined as a string # of alphanumerics as in \w). # \B Matches only non-word-boundaries. # # \s Matches any white-space character. # \S Matches any non-white-space character. # # \d Matches any digit [0-9]. # \D Matches any non-digit. # # \w, \W, \s, \S, \d, \D can be used within [string] REs. # # # Note on Details of Matching # # The method of matching differs a bit from UNIX-style regular # expressions -- particularly where closures are concerned ("*", "+", # "{}", "?"). UNIX regular expressions are documented to match the # "longest, leftmost" strings in cases where a choice is needed. The # procedures in this file are capable of generating all possible # matches of the pattern, and generate the possibilities by matching # the fewest first ("shortest, leftmost"). Matching of the various # pattern elements is performed exactly as though it were an Icon # conjunction of the elements. 
#
#
#  Notes on computed conjunction expressions by "suspensive recursion"
#
#  A conjunction expression of an arbitrary number of terms can be
#  computed in a looping fashion by the following recursive technique:
#
#       procedure Conjunct(v)
#          if <there is another term to be evaluated> then
#             suspend Conjunct(<the next term expression>)
#          else
#             suspend v
#       end
#
#  The argument "v" is needed for producing the value of the last term
#  as the value of the conjunction expression, accurately modeling Icon
#  conjunction.  If the value of the conjunction is not needed, the
#  technique can be slightly simplified by eliminating "v":
#
#       procedure ConjunctAndProduceNull()
#          if <there is another term to be evaluated> then
#             suspend ConjunctAndProduceNull(<the next term expression>)
#          else
#             suspend
#       end
#
#  Note that <the next term expression> must still remain in the suspend
#  expression to test for failure of the term, although its value is not
#  passed to the recursive invocation.  This could have been coded as
#
#             suspend <the next term expression> & ConjunctAndProduceNull()
#
#  but wouldn't have been as provocative.
#
#  Since the computed conjunctions in this program are evaluated only for
#  their side effects, the second technique is used in two situations:
#
#       (1)     To compute the conjunction of all of the elements in the
#               regular expression pattern list (Re_match1()).
#
#       (2)     To evaluate the "exactly N times" and "N to M times"
#               control operations (Re_NTimes()).
#

record Re_Tok(proc,args)

global Re_ParenGroups,Re_Filter,Re_Ordered
global Re_WordChars,Re_NonWordChars
global Re_Space,Re_NonSpace
global Re_Digits,Re_NonDigits
global Re_ArbString,Re_AnyString
global Re_TabMatch


###################  Pattern Translation Procedures  ###################


procedure RePat(s) # L
#
#  Produce pattern list representing pattern string s.
#
   #
   #  Create a list of pattern elements.  Pattern strings are parsed
   #  and converted into list elements as shown in the following table.
   #  Since some list elements reference other pattern lists, the
   #  structure is really a tree.
   #
   #  Token    Generates                            Matches...
   #  -----    ---------                            ----------
   #   ^       Re_Tok(pos,[1])                      Start of string or line
   #   $       Re_Tok(pos,[0])                      End of string or line
   #   .       Re_Tok(move,[1])                     Any single character
   #   +       Re_Tok(Re_OneOrMore,[tok])           At least one occurrence of
   #                                                previous token
   #   *       Re_Tok(Re_ArbNo,[tok])               Zero or more occurrences of
   #                                                previous token
   #   |       Re_Tok(Re_Alt,[pattern,pattern])     Either of prior expression
   #                                                or next expression
   #   [...]   Re_Tok(Re_TabAny,[cset])             Any single character in
   #                                                specified set (see below)
   #   (...)   Re_Tok(Re_MatchReg,[pattern])        Parenthesized pattern as
   #                                                single token
   #   string  Re_Tok(Re_TabMatch,[string])         The string of no-special
   #                                                characters
   #   \b      Re_Tok(Re_WordBoundary,[Re_WordChars,Re_NonWordChars])
   #                                                A word-boundary
   #                                                (word default: [A-Za-z0-9_]+)
   #   \B      Re_Tok(Re_NonWordBoundary,[Re_WordChars,Re_NonWordChars])
   #                                                A non-word-boundary
   #   \w      Re_Tok(Re_TabAny,[Re_WordChars])     A word-character
   #   \W      Re_Tok(Re_TabAny,[Re_NonWordChars])  A non-word-character
   #   \s      Re_Tok(Re_TabAny,[Re_Space])         A space-character
   #   \S      Re_Tok(Re_TabAny,[Re_NonSpace])      A non-space-character
   #   \d      Re_Tok(Re_TabAny,[Re_Digits])        A digit
   #   \D      Re_Tok(Re_TabAny,[Re_NonDigits])     A non-digit
   #   {n,m}   Re_Tok(Re_NToMTimes,[tok,n,m])       n to m occurrences of
   #                                                previous token
   #   {n,}    Re_Tok(Re_NOrMoreTimes,[tok,n])      n or more occurrences of
   #                                                previous token
   #   {n}     Re_Tok(Re_NTimes,[tok,n])            exactly n occurrences of
   #                                                previous token
   #   ?       Re_Tok(Re_ZeroOrOneTimes,[tok])      one or zero occurrences of
   #                                                previous token
   #   \N      Re_Tok(Re_MatchParenGroup,[n])       The string matched by
   #                                                parenthesis group
   #
   local plist,starter,ender
   #
   #  Initialize.
   #
   initial Re_Default()
   Re_WordChars := cset(Re_WordChars)
   Re_NonWordChars := ~Re_WordChars
   Re_Space := cset(Re_Space)
   Re_NonSpace := ~Re_Space
   Re_Digits := cset(Re_Digits)
   Re_NonDigits := ~Re_Digits
   #
   #  Deal with ^ and $, which can only appear at the beginning and end,
   #  respectively, of a whole RE.
   #
   if s[1] == "^" then {
      starter := Re_Tok(pos,[1])
      s[1] := ""
      }
   if s[-1] == "$" then {
      ender := Re_Tok(pos,[0])
      s[-1] := ""
      }
   s ?
(plist := Re_pat1(0)) | fail push(plist,\starter) put(plist,\ender) return plist end procedure Re_pat1(level) # L # # Recursive portion of RePat() # local plist,n,m,x,comma static none,parenNbr initial { Re_TabMatch := proc("=",1) none := [] } if level = 0 then parenNbr := 0 plist := [] # # Loop to put pattern elements on list. # until pos(0) do { (="|",plist := [Re_Tok(Re_Alt,[plist,Re_pat1(level + 1) | fail])]) | put(plist, (match(")"),level > 0,break) | (=Re_ArbString,Re_Tok(Re_Arb,none)) | (=Re_AnyString,Re_Tok(move,[1])) | (="+",Re_Tok(Re_OneOrMore,[pull(plist) | fail])) | (="*",Re_Tok(Re_ArbNo,[pull(plist) | fail])) | Re_Tok(Re_TabAny,[Re_cset()]) | 3(="(",n := parenNbr +:= 1, Re_Tok(Re_MatchReg,[Re_pat1(level + 1) | fail,n]), move(1) | fail) | (="\\b",Re_Tok(Re_WordBoundary,[Re_WordChars,Re_NonWordChars])) | (="\\B",Re_Tok(Re_NonWordBoundary,[Re_WordChars,Re_NonWordChars])) | (="\\w",Re_Tok(Re_TabAny,[Re_WordChars])) | (="\\W",Re_Tok(Re_TabAny,[Re_NonWordChars])) | (="\\s",Re_Tok(Re_TabAny,[Re_Space])) | (="\\S",Re_Tok(Re_TabAny,[Re_NonSpace])) | (="\\d",Re_Tok(Re_TabAny,[Re_Digits])) | (="\\D",Re_Tok(Re_TabAny,[Re_NonDigits])) | (="{",(n := tab(many(&digits)),comma := =(",") | &null, m := tab(many(&digits)) | &null,="}") | fail, if \m then Re_Tok(Re_NToMTimes, [pull(plist),integer(n),integer(m)]) else if \comma then Re_Tok(Re_NOrMoreTimes, [pull(plist),integer(n)]) else Re_Tok(Re_NTimes,[pull(plist),integer(n)])) | (="?",Re_Tok(Re_ZeroOrOneTimes,[pull(plist) | fail])) | Re_Tok(Re_TabMatch,[Re_string(level)]) | (="\\",n := tab(any(&digits)),Re_Tok(Re_MatchParenGroup,[integer(n)])) ) | fail } return plist end procedure Re_Default() # # Assign default values to regular expression translation globals, but # only to variables whose values are null. # /Re_WordChars := &letters ++ &digits ++ "_" /Re_Space := ' \t\v\n\r\f' /Re_Digits := &digits /Re_ArbString := ".*" /Re_AnyString := "." return end procedure Re_cset() # # Matches a [...] 
construct and returns a cset. # local complement,c,e,ch,chars (="[",complement := ="^" | &null, c := (ch := (="-" | "")) || move(1) || tab(find("]")),move(1)) | fail c ? { e := ch while chars := tab(upto('-\\')) do { e ++:= case move(1) of { "-": chars[1:-1] ++ &cset[ord(chars[-1]) + 1:ord(move(1)) + 2] | "-" "\\": case ch := move(1) of { "w": Re_WordChars "W": Re_NonWordChars "s": Re_Space "S": Re_NonSpace "d": Re_Digits "D": Re_NonDigits default: ch } } } e ++:= tab(0) if \complement then e := ~e } e := (\Re_Filter)(e) return cset(e) end procedure Re_string(level) # # Matches a string of non-special characters, returning a string. # local special,s,p,c static nondigits initial nondigits := ~&digits special := if level = 0 then '\\.+*|[({?' else '\\.+*|[({?)' s := tab(upto(special) | 0) while ="\\" do { p := &pos if tab(any('wWbBsSdD')) | (tab(any('123456789')) & (pos(0) | any(nondigits))) then { tab(p - 1) break } s ||:= c || tab(upto(special)) } s := string((\Re_Filter)(s)) return "" ~== s end ##################### Matching Engine Procedures ######################## procedure ReMatch(plist,s,i1,i2) # i3,i4,...,iN # # Produce the sequence of positions in s past a string starting at i1 # that matches the pattern plist, but fails if there is no such # position. Similar to match(), but is capable of generating multiple # positions. # if type(plist) ~== "list" then plist := RePat(plist) | fail /i1:= if /s := &subject then &pos else 1 ; /i2 := 0 Re_ParenGroups := [] suspend s[i1:i2] ? (Re_match1(plist,1),i1 + &pos - 1) end procedure Re_match1(plist,i) # s1,s2,...,sN # # Used privately by ReMatch() to simulate a computed conjunction # expression via recursive generation. # local tok suspend if tok := plist[i] then Re_tok_match(tok,plist,i) & Re_match1(plist,i + 1) else &null end procedure ReFind(plist,s,i1,i2) # i3,i4,...,iN # # Produce the sequence of positions in s where strings begin that match # the pattern plist, but fails if there is no such position. 
#  Similar to find().
#
local p
   if type(plist) ~== "list" then plist := RePat(plist) | fail
   /i1 := if /s := &subject then &pos else 1 ; /i2 := 0
   s[i1:i2] ? suspend (
         tab(Re_skip(plist,1)) &
         p := &pos &
         Re_match1(plist,1)\1 &
         i1 + p - 1)
end


procedure Re_tok_match(tok,plist,i)
#
#  Match a single token.  Can be recursively called by the token
#  procedure.
#
local prc
   prc := tok.proc
   suspend (
         if prc === Re_Arb then Re_Arb(plist,i)
         else suspend prc!tok.args
         )
end


##########  Heuristic Code for Matching Arbitrary Characters  ##########


procedure Re_skip(plist,i) # s1,s2,...,sN
#
#  Used privately -- match a sequence of strings in s past which a match
#  of the first pattern element in plist is likely to succeed.  This
#  procedure is used for heuristic performance improvement by ReMatch()
#  for the ".*" pattern element, and by ReFind().
#
local x,s,p,prc
   x := plist[i]
   suspend case prc := (\x).proc | &null of {
      Re_TabMatch: find!x.args
      Re_TabAny: upto!x.args
      pos: x.args[1]
##    Re_WordBoundary: Re_WordBoundaries!x.args
      Re_WordBoundary |
      Re_NonWordBoundary:
         p := &pos & tab(Re_skip(plist,i + 1)) & prc!x.args & untab(p)
      Re_OneOrMore |
      Re_MatchParenGroup: if s := (\Re_ParenGroups)[x.args[1]] then
            find(s) else &pos to *&subject + 1
      Re_NToMTimes |
      Re_NOrMoreTimes |
      Re_NTimes:
         if x.args[2] > 0 then Re_skip(x.args[1],1)
         else &pos to *&subject + 1
      Re_MatchReg: Re_skip(x.args[1],1)
      Re_Alt:
         if \Re_Ordered then
            Re_result_merge{Re_skip(x.args[1],1),Re_skip(x.args[2],1)}
         else
            Re_skip(x.args[1 | 2],1)
      default: &pos to *&subject + 1
      }
end


procedure Re_result_merge(L)
#
#  Programmer-defined control operation to merge the result sequences of
#  two integer-producing generators.  Both generators must produce their
#  result sequences in numerically increasing order with no duplicates,
#  and the output sequence will be in increasing order with no
#  duplicates.
# local e1,e2,r1,r2 e1 := L[1] ; e2 := L[2] r1 := @e1 ; r2 := @e2 while \(r1 | r2) do if /r2 | \r1 < r2 then suspend r1 do r1 := @e1 | &null else if /r1 | r1 > r2 then suspend r2 do r2 := @e2 | &null else r2 := @e2 | &null end procedure untab(origPos) # # Converts a string scanning expression that moves the cursor to one # that produces a cursor position and doesn't move the cursor (converts # something like tab(find(x)) to find(x). The template for using this # procedure is # # origPos := &pos ; tab(x) & ... & untab(origPos) # local newPos newPos := &pos tab(origPos) suspend newPos tab(newPos) end ## procedure Re_WordBoundaries(wd) ## # ## # Produce positions that are word boundaries. ## # ## local p,q ## p1 := p := &pos ## while q := upto(wd,,p) | break do { ## suspend q ## p := many(wd,,q) ## suspend p ## } ## tab(p1) ## end ####################### Matching Procedures ####################### procedure Re_Arb(plist,i) # # Match arbitrary characters (.*) # suspend tab(if \plist then Re_skip(plist,i + 1) else 1 to *&subject + 1) end procedure Re_TabAny(C) # # Match a character of a character set ([...],\w,\W,\s,\S,\d,\D #) suspend tab(any(C)) end procedure Re_MatchReg(tokList,groupNbr) # # Match parenthesized group and assign matched string to list Re_ParenGroup # local p,s p := &pos every Re_match1(tokList,1) do { /Re_ParenGroups := [] while *Re_ParenGroups < groupNbr do put(Re_ParenGroups) s := &subject[p:&pos] Re_ParenGroups[groupNbr] := s suspend s } Re_ParenGroups[groupNbr] := &null end procedure Re_WordBoundary(wd,nonwd) # # Match word-boundary (\b) # suspend ((pos(1),any(wd)) | (pos(0),move(-1),tab(any(wd))) | (move(-1), (tab(any(wd)),any(nonwd)) | (tab(any(nonwd)),any(wd))),"") end procedure Re_NonWordBoundary(wd,nonwd) # # Match non-word-boundary (\B) # suspend ((pos(1),any(nonwd)) | (pos(0),move(-1),tab(any(nonwd))) | (move(-1), (tab(any(wd)),any(wd)) | (tab(any(nonwd)),any(nonwd)),"")) end procedure Re_MatchParenGroup(n) # # Match same string matched by 
previous parenthesized group (\N) # suspend if s := \Re_ParenGroups[n] then =s else "" end ################### Control Operation Procedures ################### procedure Re_ArbNo(tok) # # Match any number of times (*) # suspend "" | (Re_tok_match(tok) & Re_ArbNo(tok)) end procedure Re_OneOrMore(tok) # # Match one or more times (+) # suspend Re_tok_match(tok) & Re_ArbNo(tok) end procedure Re_NToMTimes(tok,n,m) # # Match n to m times ({n,m} # suspend Re_NTimes(tok,n) & Re_ArbNo(tok)\(m - n + 1) end procedure Re_NOrMoreTimes(tok,n) # # Match n or more times ({n,}) # suspend Re_NTimes(tok,n) & Re_ArbNo(tok) end procedure Re_NTimes(tok,n) # # Match exactly n times ({n}) # if n > 0 then suspend Re_tok_match(tok) & Re_NTimes(tok,n - 1) else suspend end procedure Re_ZeroOrOneTimes(tok) # # Match zero or one times (?) # suspend "" | Re_tok_match(tok) end procedure Re_Alt(tokList1,tokList2) # # Alternation (|) # suspend Re_match1(tokList1 | tokList2,1) end -------------------------- igrep.icn ---------------------------- # # Program to emulate UNIX egrep, but using the enhanced regular # expressions supported by regexp.icn. Options supported are nearly # identical to those supported by egrep (no -b: print disk block # number). There is one additional option, -E, to allow Icon-type # (hence C-type) string escape sequences in the pattern string. # BEWARE: when -E is used, backslashes that are meant to be processed # in the regular expression context must be doubled. 
The following # patterns are equivalent: # # without -E: '\bFred\b' # with -E: '\\bFred\\b' # procedure Usage(n) write(&errout, "igrep -- emulates UNIX egrep\n_ Usage: igrep -Options [expression] filename..._ \n Options:_ \n c print count of matching lines rather than actual lines_ \n h don't display file names_ \n i ignore case of letters_ \n l list only the names of files containing matching lines_ \n n precede lines with line numbers_ \n s work silently -- display nothing_ \n v invert search to display only lines that don't match_ \n e expr useful if expressions starts with -_ \n E expr expresson containing Icon escape sequences_ \n f file take list of alternated expressions from \"file\"") exit(n) end link options,regexp procedure main(arg) if *arg = 0 then Usage() Options(arg) compiledPattern := GetPattern(arg) | {write(&errout,"Bad pattern ",image(Pattern)) ; exit(2)} exit(ScanFiles(arg,compiledPattern)) end global CountOnly,NoNames,NamesOnly,NumberLines,Out,Invert,Escapes, Pattern,PatternFile procedure Options(arg) opt := options(arg,"chilnsve:E:f:") CountOnly := opt["c"] NoNames := opt["h"] if \opt["i"] then Re_Filter := map NamesOnly := opt["l"] NumberLines := opt["n"] Out := if \opt["s"] then &null else &output Invert := opt["v"] Pattern := \opt["e" | "E"] Escapes := opt["E"] PatternFile := opt["f"] return opt end procedure GetPattern(arg) if \PatternFile then { f := open(PatternFile) | stop("Can't open pattern file \"",PatternFile,"\"") (/Pattern := "" & sep := "") | (sep := "|") while Pattern ||:= sep || read(f) do sep := "|" close(f) } /Pattern := get(arg) if /Pattern then Usage(2) return RePat(if \Escapes then istring(Pattern) else Pattern) end procedure ScanFiles(arg,pattern) local errors totalCount := 0 if *arg = 0 then arg := ["-"] every fn := !arg do { f := if fn == "-" then &input else open(fn) | {write(&errout,"Can't open \"",fn,"\" -- skipped") ; errors := 2 ; next} header := if \NoNames | *arg = 1 then &null else fn || ":" lineNbr := count := 
0 while line := read(f) do { lineNbr +:= 1 line := (\Re_Filter)(line) status := ReFind(pattern,line) | &null status := if \Invert then (\status,&null) | 1 if \status then { count +:= 1 if count = 1 & \NamesOnly then {write(\Out,fn) ; next} lineNbrTag := if \NumberLines then lineNbr || ":" else &null if not \(CountOnly | NamesOnly) then write(\Out,header,lineNbrTag,line) } } close(f) if \CountOnly then write(header,count) totalCount +:= count } ## if \CountOnly & *arg > 1 then write(\Out,"** Total ** ",totalCount) return \errors | if totalCount = 0 then 1 else 0 end # # istring() -- Procedure to convert a string containing special escape # constructs, of the same format as Icon source language character # strings, to their true string representation. Value returned is the # string with special constructs converted to their respective # characters. # procedure istring(s) local r,c r := "" s ? { while r ||:= tab(upto('\\')) do { move(1) r ||:= case c := map(move(1)) of { "b": "\b" # backspace "d": "\d" # delete (rubout) "e": "\e" # escape (altmode) "f": "\f" # formfeed "l": "\l" # linefeed (newline) "n": "\n" # newline (linefeed) "r": "\r" # carriage return "t": "\t" # horizontal tab "v": "\v" # vertical tab "x": istring_radix(16,2)# hexadecimal code "^": char(ord(move(1)) % 32) | break # control code default: { # either octal code or non-escaped character if any('01234567',c) then { # if octal digit move(-1) istring_radix(8,3) } else c # else non-escaped character } | break } } r ||:= tab(0) } return r end procedure istring_radix(r,max) local n,d,i,c d := "0123456789abcdef"[1:r + 1] n := 0 every 1 to max do { c := move(1) | break if not (i := find(map(c),d) - 1) then { move(-1) break } n := n * r + i } return char(n) end From L.Lu@computer-science.birmingham.ac.uk Sat Nov 2 08:34:07 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 2 Nov 91 08:34:07 MST Resent-From: L.Lu@computer-science.birmingham.ac.uk Received: from Arizona.edu 
(Hopey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA23384; Sat, 2 Nov 91 06:29:51 MST Received: from sun2.nsfnet-relay.ac.uk by Arizona.edu with PMDF#10282; Sat, 2 Nov 1991 06:29 MST Received: from computer-science.birmingham.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id <3189-0@sun2.nsfnet-relay.ac.uk>; Sat, 2 Nov 1991 12:29:57 +0000 Received: from christopher by new-percy.cs.bham.ac.uk with SMTP (PP) id <2742-0@new-percy.cs.bham.ac.uk>; Sat, 2 Nov 1991 12:29:58 +0000 Received: from eeyore by christopher-robin.cs.bham.ac.uk (4.1/fileserver/1.2) id AA06638; Sat, 2 Nov 91 12:31:16 GMT Resent-Date: Sat, 2 Nov 1991 06:29 MST Date: Sat, 02 Nov 91 12:31:24 GMT From: L.Lu@computer-science.birmingham.ac.uk Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu, L.Lu@computer-science.birmingham.ac.uk Resent-Message-Id: <65FC4F7D08206D36@Arizona.edu> Message-Id: <6638.9111021231@christopher-robin.cs.bham.ac.uk> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu, L.Lu@computer-science.birmingham.ac.uk signoff Lunjin Lu |---------------------------------------------------------------| | Lunjin Lu |Email: L.Lu@cs.bham.ac.uk | | School of Computer Science |Voice: +44 21-414-3736 | | University of Birmingham |Fax : +44 21-414-4281 | | Edgbaston, Birmingham B15 2TT |Telex: 333762 UOBHAM G | | The United Kingdom | | |---------------------------------------------------------------| From L.Lu@computer-science.birmingham.ac.uk Sat Nov 2 08:34:09 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 2 Nov 91 08:34:09 MST Resent-From: L.Lu@computer-science.birmingham.ac.uk Received: from Arizona.edu (Osprey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA23581; Sat, 2 Nov 91 06:40:50 MST Received: from sun2.nsfnet-relay.ac.uk by Arizona.edu with PMDF#10282; Sat, 2 Nov 1991 06:29 MST Received: from computer-science.birmingham.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id 
<3191-0@sun2.nsfnet-relay.ac.uk>; Sat, 2 Nov 1991 12:30:42 +0000 Received: from christopher by new-percy.cs.bham.ac.uk with SMTP (PP) id <2753-0@new-percy.cs.bham.ac.uk>; Sat, 2 Nov 1991 12:30:44 +0000 Received: from eeyore by christopher-robin.cs.bham.ac.uk (4.1/fileserver/1.2) id AA06641; Sat, 2 Nov 91 12:32:02 GMT Resent-Date: Sat, 2 Nov 1991 06:29 MST Date: Sat, 02 Nov 91 12:32:11 GMT From: L.Lu@computer-science.birmingham.ac.uk Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu, L.Lu@computer-science.birmingham.ac.uk Resent-Message-Id: <6600ED4818805E14@Arizona.edu> Message-Id: <6641.9111021232@christopher-robin.cs.bham.ac.uk> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu, L.Lu@computer-science.birmingham.ac.uk unsubscribe Lunjin Lu |---------------------------------------------------------------| | Lunjin Lu |Email: L.Lu@cs.bham.ac.uk | | School of Computer Science |Voice: +44 21-414-3736 | | University of Birmingham |Fax : +44 21-414-4281 | | Edgbaston, Birmingham B15 2TT |Telex: 333762 UOBHAM G | | The United Kingdom | | |---------------------------------------------------------------| From icon-group-request@arizona.edu Sat Nov 2 10:53:30 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 2 Nov 91 10:53:30 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Hopey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA05893; Sat, 2 Nov 91 10:53:29 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sat, 2 Nov 1991 10:52 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA18134; Sat, 2 Nov 91 09:44:38 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Sat, 2 Nov 1991 10:53 MST Date: 29 Oct 91 17:55:05 GMT From: 
csusac!csus.edu!wupost!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!pacific.mps.ohio-state.edu!linac!midway!ellis!goer@ucdavis.ucdavis.edu (Richard L. Goerwitz)
Subject: RE: Icon futures ? (coroutines,patterns,etc.)
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id: <8AD24E29A8206802@Arizona.edu>
Message-Id: <1991Oct29.175505.16924@midway.uchicago.edu>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
Organization: University of Chicago
References: <1991Oct29.070610.12954@murdoch.acc.Virginia.EDU>

sdm7g@galen.med.Virginia.EDU (Steven D. Majewski) writes:

>I have been reading Ken Walker's paper in Computer Language (1989)
>First Class Patterns for Icon.

An interesting article, but probably a bad place to start if you're just
beginning Icon :-).

>I am just beginning to learn Icon and I've only READ a few dozen lines
>of SNOBOL code, but I must agree that although I find the Icon way of
>doing things flexible and powerful, some of the pattern matching code
>is not as clear or obvious to the eye as, say, a BISON|YACC grammar.
>
>[ Anyone written the ICON equivalent? A table driven Icon code generator?]

This is on my long-term project list.  It's a subject that keeps coming
up over and over.  Basically, what people seem to want is a parser
generator that outputs Icon code.

In a way, it's not fair to compare YACC with Icon.  A more accurate
comparison would be Icon vs. C on the level of string processing
facilities.  To compare YACC with Icon mixes levels.  There *is* no
compiler compiler for Icon.  Although Icon makes it unnecessary to use
YACC-type tools in many instances, it would still be very, very useful to
have a YACC-like tool for it.  That's an opinion I fully agree with.

>Sorry if all this has been hashed over and settled, but although
>Icon matching expressions are very powerful, some things like regular
>expressions seem to be a special case in terms of the possible efficiency...
>
>Rather than the typical Unix/grep type regexp patterns, I would suggest
>a list of character-strings, csets or list of character strings.
>(I'm not sure about the best way of expressing multiple occurrences.)

I'm in favor of splicing in a portable form of regular expressions.
Whatever we do, it's going to represent bloat.  Right linear grammars are
quite trivial to recognize, using Icon's intrinsic facilities, so we'd
just be adding regexps as (in your words) "a special case in terms of
possible efficiency."  Given that they'd be tacked on for efficiency, I'd
advocate that they not be altered in any way.  Just use the format
everyone already knows and loves.  To extend Icon to house a new regular
expression string or other data type would (I think) be to overfeed the
fat man.

One problem I have run into in adding regex handlers to the run-time
system is that there does not seem to be any way to keep a C structure
around as long as a generator needs it to be around, and yet make it
visible to the garbage collector.  With the SYSV regex routines there's
no problem, since they return a char pointer, and this can be stuck into
an Icon string (in the underlying implementation, the string descriptor
is called a qualifier).  The GNU Emacs and grep distributions have some
nice regexp handlers, but these utilize structures that would need to be
housed in a new block data structure in Icon in order for them to
interact properly with the system.  They also use alloca, which not
everyone has, and malloc/realloc/free, which aren't the normal way of
handling Icon storage for anything but coexpressions.

Put briefly:  For SYSV people, there's a simple solution to the regexp
problem.  A bit of a kludge.  But it works.  For others, though, it's not
as easy as it looks.  With all that the people in the Icon Project have
to do already, we'll need to figure out some solution for ourselves.
The most we could perhaps do is ask them to include some mechanism for
making C objects have the same lifetime and visibility as Icon arguments,
so that we can use them in creating new builtin generators.  If this
isn't trivial (which I'd guess it isn't), then some mechanism for keeping
C strings and ints around for the same purpose would be sufficient.  Ken
Walker (thanks!) mentioned to me that you can just define a function to
take extra arguments, and then use the descriptors passed in the extra
arguments to house the required objects.  This seems to work fine,
although it screws up the tracing and error termination displays.

>Icon is a typed language without type declarations and with some limited
>automatic coercion of types. ( I hope that is an accurate statement. )

Close.  Icon variables are capable of being assigned any data type.  They
are not statically typed.  One might even call them untyped.  Icon
values, though, are typed.  In practice, this means you can assign a
variable any value of any type any time you want.  But the run-time
system will always know precisely what type of value a variable holds at
any given moment.  Many people identify non-static typing with weak
typing.  Clearly, though, this doesn't describe Icon.  (Not to say you
said this, but I just want to point it out.)

Note that there are no type coercions in Icon, in the sense that I think
you mean it.  There are value conversions.  If, say, I want to write the
integer 10 to the standard output, I can say

    i := 10
    write(i)

The i is not affected by the write() function, but when it's
de-referenced the resulting descriptor is checked to see if it's a
string.  If it's not, the descriptor is actually converted to a string.
It's actually far more flexible than a type cast for some purposes
(though more constraining for others, e.g. you have to use ord() and
char() to switch from internal ints to 1-char strings).
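
A minimal sketch of the distinction drawn above (untyped variables, typed
values, conversion on demand).  It assumes nothing beyond a standard Icon
translator; the variable names are only illustrative.

```icon
procedure main()
   local x, i

   # A variable has no fixed type; the value it holds does.
   x := 10
   write(type(x))          # integer
   x := "ten"
   write(type(x))          # string
   x := 'aeiou'
   write(type(x))          # cset

   # write() needs a string, so the integer value is converted for
   # output.  The variable i still holds an integer afterwards.
   i := 10
   write(i)                # 10
   write(type(i))          # integer

   # Conversions follow the operation, not the variable.
   write("3" + 4)          # 7   (string converted to integer)
   write(3 || 4)           # 34  (integers converted to strings)

   # Character codes are not converted implicitly; ord() and char()
   # make the switch explicit.
   write(ord("A"))         # 65
   write(char(65))         # A
end
```

Run with icont -x (or the local equivalent); each comment shows the line
that statement should write, assuming an ASCII character set.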
>What would be the ramifications of changing the language to a hierarchy
>of types ( a la russel, or like OO class inheritance perhaps )
>Thus csets would become, not another type, but a special case of
>character strings. Regular expressions would be a sequence of strings,
>csets and sets of strings... I guess that this is no longer Icon,
>but anyone have a good idea of how to put it together consistently ?

It's funny, but someone I was talking to the other day said *precisely*
the same thing.  He said if Icon supported such a hierarchy, we could
just define a regular expression sub-type, and everyone would think it
was a great idea.

Icon is pre-OO (1976), and I can't imagine anyone at the Icon Project
would relish the thought of redesigning the entire system.  Clinton
Jeffery has designed an Icon OO preprocessor.  But I think that, in order
to do what you want, you'd need to design another language like Icon that
wasn't Icon.  (Or is this just semantics?)

Let me add here the usual disclaimers.  I'm not a member of the Icon
Project, and I am not in the CS field (if that isn't obvious).  The
statements I make I hope are accurate.  But I wouldn't give them any
great weight without hearing from people more heavily involved in Icon's
innards.  At best, I hope I've added to what could be an interesting
ongoing discussion.

-- 

   -Richard L.
Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer From icon-group-request@arizona.edu Sat Nov 2 12:09:51 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 2 Nov 91 12:09:51 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Hopey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA08030; Sat, 2 Nov 91 12:09:52 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sat, 2 Nov 1991 12:09 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA20415; Sat, 2 Nov 91 10:56:56 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Sat, 2 Nov 1991 12:09 MST Date: 1 Nov 91 19:50:34 GMT From: micro-heart-of-gold.mit.edu!wupost!uwm.edu!linac!midway!ellis!goer@bloom-beacon.mit.edu (Richard L. Goerwitz) Subject: RE: Icon futures ? (coroutines,patterns,etc.) Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <957DD7FF3880532A@Arizona.edu> Message-Id: <1991Nov1.195034.18246@midway.uchicago.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: University of Chicago References: <9111011748.AA29800@cheltenham.cs.arizona.edu> cjeffery@CS.ARIZONA.EDU ("Clinton L. Jeffery") writes: >"Findre" and "matchre" aren't general enough for many situations I wish >I had regular expressions. Richard was asking for examples. I want >to be able to match a regular expression while selecting some pieces of >it that I need, and discarding others. 
In order to process an Icon
>declaration, for instance, I only want the second, fifth, and sixth
>components of the following regular expression:
>
>   [ \t]* (procedure|record) [ \t]* "(" {[a-zA-Z0-9] (,[a-zA-Z0-9]*)*} ")"

In Icon right now, you can't select series of characters via some sort of macro language. You have to break down what you want into upto() and many() or any() functions. This is the way Icon is set up, and in some cases it may be verbose. If done correctly, it's also very clear and readable, not to mention powerful.

My ideas of findre and matchre are pretty much analogous to upto and many. When you tab(upto(cs)) you don't know how far tab(many(cs)) will take you afterwards. So you have to call both functions. Same with findre. You don't know how much farther the pattern will extend after the position returned by findre(), so you call matchre to find out, a la:

    tab(upto(cs)) & tab(many(cs))
    tab(findre(regexp)) & tab(matchre(regexp))

It's quite consistent and logical. If you like a terser style, then in this case you really *are* talking about something useful but not Iconish - something that ought to go into the IPL. It wouldn't be that difficult to do, really. A split() function would be simple, and it could set global variables that would correspond to AWK or PERL-like variables.

My only question is whether matchre() should have a companion function that is a generator, or if it should be a generator itself, suspending all positions from the end of the string (or j) to i which match the regexp given as its first argument. I'm sure if you think about it you'll see how it is important to be able to backtrack this way. Egrep does it deterministically when you say "a*ab". If you feed it "aaaaab" it won't fail, because it knows when it finds b that it could be in one of two states. To get this intuitive functionality in Icon, you'd need to make matchre a generator.
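
A minimal sketch of the backtracking point above, using plain string scanning rather than findre()/matchre(). The helper tabmany0() is hypothetical, written only for illustration, and is not part of the IPL or of any routine discussed in this thread. It plays the role of "a*": it suspends the longest run first and gives characters back on resumption, which is the behavior being asked of a generator version of matchre().

```icon
# Hypothetical helper (an assumption for illustration, not an IPL routine):
# a resumable version of tab(many(c)) that also accepts zero characters.
procedure tabmany0(c)
    local lo, hi
    lo := &pos
    hi := many(c) | lo              # end of the longest run, or the empty match
    suspend tab(hi to lo by -1)     # longest first, shorter ones on resumption
end

procedure main()
    # many() yields only the longest run, so the "a" that ="ab" needs is gone.
    if "aaaaab" ? (tab(many('a')) & ="ab")
    then write("many: matched")
    else write("many: failed")

    # The generator gives one "a" back when ="ab" fails the first time.
    if "aaaaab" ? (tabmany0('a') & ="ab" & pos(0))
    then write("generator: matched")
    else write("generator: failed")
end
```

Translated with icont, this should print "many: failed" followed by "generator: matched"; the only difference between the two scans is whether the repetition step can be resumed.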
>Third, anyone who wants to add RE's as Icon builtins might look for a
>suitable set of public domain C functions on which to build. I believe
>several free versions exist, such as one by Henry Spencer of the U. of
>Toronto, but I don't know which are public domain and which are "copylefted"
>and such. But the point is that the best way to build grass-roots support
>for an extension is to produce an implementation that works on *lots* of
>(and potentially all) versions of Icon, not just certain UNIX'es.

Spencer's regexp library uses a (possibly terribly slow) NFA. I'm now rewriting the FSF's regex functions for use in Icon, and adding an experimental block type that holds the regexp pattern buffer. I did much of the work last night adding the new block and descriptors, and letting the memory management system know all that it needs to know about them. I now need to write a regfind and regmatch function (seem like good names), and probably also an alc-- routine for the new block, and do housekeeping like altering fdefs.h, the function prototypes, and the makefiles. Looks like outimage is the only internal function I'll need to fool with, since the new block type is not visible to the user.

Please let me know if I've forgotten anything. This will take me a week or so, because I'm working very hard on a dissertation in Near Eastern Languages, and I don't work on this sort of thing until the evenings for the most part.

Just for a ruse I did try writing a builtin called refind() and one called rematch(). These are already in my version of Icon. They work under SYSV, and exploit the fact that the SYSV regex routines return a character pointer that can be stuffed into a regular Icon qualifier. Cute, but not very general. Ken Walker has a copy (which has a bug in it, Ken - I add 1 to the length in qtosij(), which isn't right). I don't plan on using these routines, because they aren't portable.
I definitely think this whole thing is workable, and whether or not anyone else goes along with me, I'm at least taking advantage of the personal interpreter facilities Icon offers :-).

One more note:

>On the other hand, I
>think someone has in fact written a YACC-type program for Icon, one
>that generates Icon code.

This YACC is a very limited exercise, and the action fields you can specify are heavily circumscribed in what they contain. As I recall, this YACC does not generate an LR(k) parser, but rather a recursive descent backtracking parser, and so loops on left-recursive grammars. Don't quote me on this. It's just what I recall.

For Icon to become a production tool, we need to give it a flexible parser generator. But let's not gripe to the Icon Project :-). Icon is PD, and they are already overworked. If anyone has any ideas, let's hear them!

-- 

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer

From cjeffery Sat Nov 2 12:52:47 1991
Date: Sat, 2 Nov 91 12:52:47 MST
From: "Clinton L. Jeffery"
Message-Id: <9111021952.AA09016@cheltenham.cs.arizona.edu>
Received: by cheltenham.cs.arizona.edu; Sat, 2 Nov 91 12:52:47 MST
To: icon-group
In-Reply-To: Richard L. Goerwitz's message of 1 Nov 91 19:50:34 GMT <1991Nov1.195034.18246@midway.uchicago.edu>
Subject: Icon futures ? (coroutines,patterns,etc.)

Richard Goerwitz:
> Spencer's regexp library uses a (possibly terribly slow) NFA. I'm now
> rewriting the FSF's regex functions for use in Icon, and adding an
> experimental block type that holds the regexp pattern buffer.

Well, I love the FSF, but if you are hacking GNU code, we can never put it in Icon. GNU code is copylefted; Icon is really Public Domain. I guess you can come out with Gnu Icon, but I wish there were a public domain regexp library you could use instead.

It sounds like you're really hacking the implementation on this one!
I hope you're looking forward to doing your changes for the Icon compiler as well...heh, heh, heh. Wouldn't it be nice if the compiler and the interpreter used the same runtime system? From cjeffery Sat Nov 2 14:46:18 1991 Date: Sat, 2 Nov 91 14:46:18 MST From: "Clinton L. Jeffery" Message-Id: <9111022146.AA11560@cheltenham.cs.arizona.edu> Received: by cheltenham.cs.arizona.edu; Sat, 2 Nov 91 14:46:18 MST To: icon-group In-Reply-To: Richard L. Goerwitz's message of 29 Oct 91 17:55:05 GMT <1991Oct29.175505.16924@midway.uchicago.edu> Subject: Icon futures ? (coroutines,patterns,etc.) From: Richard L. Goerwitz sdm7g@galen.med.Virginia.EDU (Steven D. Majewski) writes: >What would be the ramifications of changing the language to a hierarchy It's funny, but someone I was talking to the other day said *precisely* the same thing. He said if Icon supported such a hierarchy, we could just define a regular expression sub-type, and everyone would think it was a great idea. Do you guys mean a subtype of strings (because a regular expression is a string that denotes something else) or a subtype of sets of strings, because regular expressions are a concise notation for regular languages? Or both? Hey--another application of multiple inheritance... Icon is pre-OO (1976), and I can't imagine anyone at the Icon Project would relish the though of redesigning the entire system. Clinton Jef- fery has designed an Icon OO preprocessor. But I think that, in order to do what you want, you'd need to design another language like Icon that wasn't Icon. (Or is this just semantics?) What you and Steve are describing certainly *isn't* Icon. But this new language you're talking about isn't far off, and redesigning the entire system isn't necessary to achieve it. While I admit that my own object oriented language Idol is far from perfect, it is also far from finished. 
The fact that it is a preprocessor and not a native translator is an *implementation detail* ; preprocessing does not add significantly to translation time or to execution time. Oh, it adds a field access onto some procedure invocations, but field access is one of Icon's faster operations. But, some people will never believe me, and that's OK! :-) Clint From icon-group-request@arizona.edu Sun Nov 3 03:50:10 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 3 Nov 91 03:50:10 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Hopey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA07401; Sun, 3 Nov 91 03:50:06 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sun, 3 Nov 1991 03:49 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA21711; Sun, 3 Nov 91 02:11:51 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Sun, 3 Nov 1991 03:49 MST Date: 31 Oct 91 07:17:21 GMT From: elroy.jpl.nasa.gov!sdd.hp.com!uakari.primate.wisc.edu!caen!uvaarpa!murdoch!galen.med.Virginia.EDU!sdm7g@ames.arc.nasa.gov (Steven D. Majewski) Subject: RE: Icon futures ? (coroutines,patterns,etc.) Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <18D7E0C4788061AC@Arizona.edu> Message-Id: <1991Oct31.071721.25952@murdoch.acc.Virginia.EDU> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: University of Virginia References: <1991Oct29.070610.12954@murdoch.acc.Virginia.EDU>, <1991Oct29.175505.16924@midway.uchicago.edu> In article <1991Oct29.175505.16924@midway.uchicago.edu> goer@ellis.uchicago.edu (Richard L. Goerwitz) writes: >sdm7g@galen.med.Virginia.EDU (Steven D. 
Majewski) writes:
>
>>I have been reading Ken Walker's paper in Computer Language (1989)
>>First Class Patterns for Icon.
>
>An interesting article, but probably a bad place to start if you're
>just beginning Icon :-).
>

I like to try to combine theory & practice ( fun & work ) ! :-)

>
>In a way, it's not fair to compare YACC with Icon. A more accurate
>comparison would be Icon vs. C on the level of string processing
>facilities. To compare YACC mixes levels. There *is* no compiler compiler
>for Icon. Although Icon makes it unnecessary to use YACC-type tools in
>many instances, it would still be very, very useful to have a YACC-like
>tool for it. That's an opinion I fully agree with.
>

I'm arguing for the facility, not proposing any particular implementation. The point Ken Walker makes in his paper is that Icon string-matching expressions, being procedurally embedded, are somewhat obscure. It may take time and thought to write an unambiguous YACC grammar, but it is pretty clean looking and not too bad to modify. A YACC in Icon that outputs Icon code would be an acceptable solution. Ken Walker's extension to coexpressions ( patterns as unevaluated expressions ) is another solution that does not seem too far from the Spirit of Icon.

I *Would* argue, though, that it is not as important to clone a replica of YACC|regexp. Icon is not C ! Perhaps existing practice can be improved upon. For example: Icon's dynamic typing would easily allow more variability in the semantic value returned - terminals could return a single value while non-terminals yield a parse tree of the expression. ( Yes, you can certainly do it in C with Unions of Structures, but it's probably easier in Icon. )

I don't think there is anything magical about the Unix-style regular expression notation. If it *WAS* used consistently by all of the tools, one could argue for consistency with mathematical notation (Kleene), but it is NOT used consistently. ( + is Kleene +, but * is wildcard )

Steven Kearns(*) uses (in TLex) a very lisp-ish notation:

( define MP ( or empty ( seq '{' MP '}' )))
( define alphanumeric ( or ( from "0" to "9" )
                           ( from A to Z )
                           ( from a to z ) )
( define nonAN ( notclass alphanumeric ))

I prefer Lisp-like to Grep-like, but is there a particularly Icon-like representation ? ( Other than Ken Walker's ? )

BTW: Kearns' paper both describes TLex ( Tyrannosaurus Lex ) and compares performance of: TLex, FLex, GAWK, YACC & Icon - and Icon does not do badly, so perhaps the argument for the efficiency of DFA pattern matching is not as strong as I thought. ( But probably not, there were only 2 tests and Icon did best in the simplest one. )

(*) Kearns, Steven M. TLEX. Software-Practice and Experience, Vol. 21(8), pp. 805-821 (August 1991).
    Kearns, Steven M. Extending Regular Expressions with Context Operators and Parse Extraction. Software-Practice and Experience, Vol. 21(8), pp. 787-804 (August 1991).
    Kearns, Steven M. TLEX v.68 User's Manual. Columbia University Technical Report CUCS-037-90.

Thanks for your other comments and corrections.

>
>Icon is pre-OO (1976), and I can't imagine anyone at the Icon Project
>would relish the thought of redesigning the entire system. Clinton
>Jeffery has designed an Icon OO preprocessor. But I think that, in order
>to do what you want, you'd need to design another language like Icon
>that wasn't Icon. (Or is this just semantics?)
>

I started out looking for a comparison of various string and symbol processing oriented tools and languages available. ( The old perl .vs. Icon .vs. ... thread. ) Which is better ? I've decided that they are incommensurable: Icon, Perl & Yacc, for example, each have a rather different domain. ( It's not even Apples&Oranges ; it's more like Applesauce and OrangeJuice! ) But I'm still trying to sort out the "basic" paradigms of string & string-oriented-symbol processing.
( To back up and digress, there seems to be two constellations: the world of regexps&grammars/AWK/ Perl/Yacc/etc. and ( make that two-and-a-half! ) SNOBOL type pattern matching which has split off into SNOBOL-type functions graafted onto other languages ( PROLOG, SETL, VAX/VMS LIB$C_{SPANC,ANY,etc.}, and Icon. ) If an ideal solution leads to Yet-another-language, I won't complain. But at this point I don't know what an ideal solution would LOOK like! ( Maybe it actually looks a lot like Icon ;-) - Steve Majewski -- ========= "If you've got a hammer, find a nail!" - George Bush ========= Steven D. Majewski University of Virginia Physiology Dept. sdm7g@Virginia.EDU Box 449 Health Sciences Center (804)-982-0831 Charlottesville, VA 22908 From icon-group-request@arizona.edu Sun Nov 3 21:59:13 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 3 Nov 91 21:59:13 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA07586; Sun, 3 Nov 91 21:59:13 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sun, 3 Nov 1991 21:58 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA27919; Sun, 3 Nov 91 20:47:53 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Sun, 3 Nov 1991 21:58 MST Date: 1 Nov 91 04:21:31 GMT From: midway!quads!goer@handies.ucar.edu (Richard L. Goerwitz) Subject: RE: Icon futures ? (coroutines,patterns,etc.) Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: Message-Id: <1991Nov1.042131.6075@midway.uchicago.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: University of Chicago References: <1991Oct29.070610.12954@murdoch.acc.Virginia.EDU> sdm7g@galen.med.Virginia.EDU (Steven D. 
Majewski) writes:

>I *Would* argue, though, that it is not as important to clone a replica
>of YACC|regexp. Icon is not C ! Perhaps existing practice can be improved
>upon. For example: Icon's dynamic typing would easily allow more
>variability in the semantic value returned - terminals could return a single
>value while non-terminals yield a parse tree of the expression. ( Yes, you
>can certainly do it in C with Unions of Structures, but it's probably easier
>in Icon. )

Couldn't agree more. Icon code could be arranged as a big case statement and the states could be squirted out by a coexpression, as in:

    case @get_state of {
        1: code for state 1
        2: code for state 2
        etc.

The parse tree could be built ad hoc by user code or by the parser itself. Dunno. Anyway, there are lots of neat things one could do. But certainly you're right that we would not have to use a yacc clone. I hope that one of us actually gets to doing this project.

>Steven Kearns(*) uses (in TLex) a very lisp-ish notation:
>
>( define MP ( or empty ( seq '{' MP '}' )))
>( define alphanumeric ( or ( from "0" to "9" )
>                           ( from A to Z )
>                           ( from a to z ) )
>( define nonAN ( notclass alphanumeric ))

Since the control structures and contexts are so different, it's hard to compare. In Icon [A-Z] is &ucase; [a-z] is &lcase; [0-9] is &digits. You can construct character sets manually,

    vowels := 'aeiouAEIOUyY'

Note that this is not a string. A string is "aeiouAEIOUyY" (with double quotation marks). Here's how you create the set of alphanumeric characters that are not vowels:

    NonVowels := (&letters -- vowels) ++ &digits

>I prefer Lisp-like to Grep-like, but is there a particularly Icon-like
>representation ? ( Other than Ken Walker's ? )

Depends on what you want, really.

>BTW: Kearns' paper both describes TLex ( Tyrannosaurus Lex ) and compares
>performance of: TLex, FLex, GAWK, YACC & Icon - and Icon does not do badly,
>so perhaps the argument for the efficiency of DFA pattern matching is
>not as strong as I thought. ( But probably not, there were only 2 tests
>and Icon did best in the simplest one. )

>...I'm still trying to sort out the "basic" paradigms of string &
>string-oriented-symbol processing. ( To back up and digress, there
>seem to be two constellations: the world of regexps&grammars/AWK/
>Perl/Yacc/etc. and ( make that two-and-a-half! ) SNOBOL type
>pattern matching which has split off into SNOBOL-type functions
>grafted onto other languages ( PROLOG, SETL, VAX/VMS LIB$C_{SPANC,ANY,etc.},
>and Icon. ) If an ideal solution leads to Yet-another-language, I won't
>complain. But at this point I don't know what an ideal solution would
>LOOK like! ( Maybe it actually looks a lot like Icon ;-)

Icon's really not a bad solution, I don't think. But we do need some sort of flexible parser generation system for it. Luckily, Icon lets us get away in more cases than other languages without such a system. This is, I would guess, why the language has gotten along so well without one.

It's certainly on my long-term list. I still have a lot to learn about parser generation systems. I'm kinda killing time until Rekers' work at Amsterdam is published, since the parallel-parser system he outlines in his forthcoming dissertation is fairly simple, and handles a much wider subset of context free languages than YACC (without too great a sacrifice of performance [using LR(0) tables to boot!]). I also need to get through the "dragon book," which for a non-CS person is no easy task.

-- -Richard L.
Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer From R.J.Hare@edinburgh.ac.uk Mon Nov 4 15:54:32 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 4 Nov 91 15:54:32 MST Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA17986; Mon, 4 Nov 91 15:54:28 MST Received: from UKACRL.BITNET (MAILER@UKACRL) by Arizona.edu with PMDF#10282; Mon, 4 Nov 1991 15:54 MST Received: from RL.IB by UKACRL.BITNET (Mailer R2.07) with BSMTP id 0845; Mon, 04 Nov 91 09:11:33 GMT Received: from RL.IB by UK.AC.RL.IB (Mailer R2.07) with BSMTP id 3964; Mon, 04 Nov 91 09:11:28 GMT Date: 04 Nov 91 09:10:58 gmt From: R.J.Hare@edinburgh.ac.uk Subject: Ken Walkers paper To: icon-group@cs.arizona.edu Message-Id: <04 Nov 91 09:10:58 gmt 320785@EMAS-A> X-Envelope-To: icon-group@cs.arizona.edu Via: UK.AC.ED.EMAS-A; 4 NOV 91 9:11:26 GMT Could someone please post a full citation for Ken Walkers paper referred to on ths board a few messages back? We don't get this journal and our ILL people require as full a citation as possible before they can institute a search. Thanks. Roger Hare. From kwalker Mon Nov 4 16:08:34 1991 Received: from ocotillo.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 4 Nov 91 16:08:34 MST Date: Mon, 4 Nov 91 16:08:33 MST From: "Kenneth Walker" Message-Id: <9111042308.AA19543@ocotillo.cs.arizona.edu> Received: by ocotillo.cs.arizona.edu; Mon, 4 Nov 91 16:08:33 MST To: icon-group Subject: Re: Ken Walkers paper > Date: 04 Nov 91 09:10:58 gmt > From: R.J.Hare@edinburgh.ac.uk > > Could someone please post a full citation for Ken Walkers paper referred to > on ths board a few messages back? We don't get this journal and our ILL people > require as full a citation as possible before they can institute a search. %A Kenneth W. 
Walker
%T ``First-Class Patterns for Icon''
%J Journal of Computer Languages
%V 14
%N 3
%D 1989
%P 153-163

  Ken Walker / Computer Science Dept / Univ of Arizona / Tucson, AZ 85721
  +1 602 621-4252  kwalker@cs.arizona.edu  {uunet|allegra|noao}!arizona!kwalker

From isidev!nowlin@uunet.uu.net Tue Nov 5 10:54:37 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Tue, 5 Nov 91 10:54:37 MST
Received: from relay2.UU.NET by optima.cs.arizona.edu (4.1/15) id AA28578; Tue, 5 Nov 91 10:54:34 MST
Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay2.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA23808; Tue, 5 Nov 91 12:54:35 -0500
Date: Tue, 5 Nov 91 12:54:35 -0500
From: isidev!nowlin@uunet.uu.net
Message-Id: <9111051754.AA23808@relay2.UU.NET>
Received: from isidev.UUCP by uunet.uu.net with UUCP/RMAIL (queueing-rmail) id 125333.3483; Tue, 5 Nov 1991 12:53:33 EST
To: uunet!cs.arizona.edu!icon-group@uunet.uu.net
Subject: ISI phone troubles

I apologize to those who have tried to call Iconic Software using the phone number included in the latest newsletter. That number is correct but we switched to a new office last week (plenty of time you say!) and the phone lines are still in limbo. They should be straightened out by this afternoon (11/05/91). If you have problems with the voice line please use email for inquiries. The email address is below. Thanks for your patience.

--- --- | S | Iconic Software, Inc.
- Jerry Nowlin - uunet!isidev!isi --- --- From icon-group-request@arizona.edu Tue Nov 5 17:21:11 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Tue, 5 Nov 91 17:21:11 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Hopey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA00302; Tue, 5 Nov 91 17:21:08 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Tue, 5 Nov 1991 17:20 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA03722; Tue, 5 Nov 91 04:56:38 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Tue, 5 Nov 1991 17:20 MST Date: 3 Nov 91 02:51:12 GMT From: dog.ee.lbl.gov!hellgate.utah.edu!caen!zaphod.mps.ohio-state.edu!uwm.edu!linac!midway!ellis!goer@ucbvax.berkeley.edu (Richard L. Goerwitz) Subject: RE: regular expression in Icon (addendum) Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <1C78F8E36820431D@Arizona.edu> Message-Id: <1991Nov3.025112.5191@midway.uchicago.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: University of Chicago References: <9111020103.AA17910@laguna.Metaphor.COM> alex@LAGUNA.METAPHOR.COM (Bob Alexander) writes: >Well, since there seems to be so much interest in regular expressions >recently in icon-group, I have an offering that addresses the two >predominant objections I've been hearing: > >1) Richard G's concern about speed -- the routines that I've > attached to this message run at least 10 times faster than > the findre() suite. Smaller too. There is an interesting > technique used to exploit Icon's built-in backtracking so > that it does not have to be done explicitly in the Icon code. One addendum: I claimed your code was faster than mine in my last posting. 
I tried a few tests, and this seemed to be the case. I just tried them out in some actual working programs, though, and things look different. Here's why.

My egrep routine compiles patterns into an automaton, which is then saved and used if the string is encountered again. Under version 8 the automata are not too big, and the savings in execution speed are pretty good. Note that the automata also don't disable Icon's backtracking mechanisms. Several people have mentioned that they do, but this is not correct. The end result is that if you are calling findre() with just a couple of strings over and over (a very typical case), it runs much better than yours. Try a test program like this:

    procedure main()
        lst1 := ["a*b[cdef](gh)+", "abc", "aaaaab(dghgh)?$"]
        lst2 := ["abc", "abfgh", "aaaaabdghgh"]
        every 1 to 500 do
            ReFind(?lst1, ?lst2)
            # findre(?lst1, ?lst2)
    end

Anyway, I think this is all academic. Compare our regexp routines to UNIX-based ones some time for a big laugh.

-- 

   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer

From icon-group-request@arizona.edu Thu Nov 7 14:40:34 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Thu, 7 Nov 91 14:40:34 MST
Resent-From: icon-group-request@arizona.edu
Received: from Arizona.edu (Osprey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA05783; Thu, 7 Nov 91 14:40:33 MST
Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Thu, 7 Nov 1991 14:39 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA22123; Thu, 7 Nov 91 13:22:54 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Resent-Date: Thu, 7 Nov 1991 14:40 MST
Date: 7 Nov 91 00:51:12 GMT
From: fernwood!cronos!laguna!alex@uunet.uu.net (Bob Alexander)
Subject: Corrections to regular expression procedures
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <985D4D9D98808D03@Arizona.edu> Message-Id: <1554@cronos.metaphor.com> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: Metaphor Computer Systems, Mountain View, CA I'm a bit embarrassed about the quality of the regular expression routines I posted a few days ago. I'd like to thank Richard G. for spotting and reporting several bugs -- all of the bugs mentioned in his posting were valid. I coded these routines last April, and I think I forgot that I hadn't tested them very well -- some of the bugs were all too obvious. All the reported bugs have been corrected; I hope I haven't introduced any new ones. But it's likely that some bugs remain -- please don't hesitate to send in those cards and letters! The updated regexp.icn source code follows the message text. There is no change to igrep.icn. Here are Richard's points one-by-one: > There's a bug in the code that prevents the "+" operator from working > correctly. For instance, if I compile your igrep program, and type > echo 'hello' | igrep 'el+o' > It doesn't work. The most embarrassing bug was that the precedence of adjacent characters was real high: this example was grouping as (el)+o. Of course, it *should* be e(l)+o. It's fixed now. > Also, a frequent bug with egrep programs is that they can't do this: > echo 'hello' | igrep 'el*llo' This, too, was grouping as (el)*(llo). > Your gets this one right. Nice job! Thanks for the compliment, however undeserved :-). > Unfortunately, it doesn't get this one right: > echo 'hello' | igrep 'hel?o' Grouping problem again -- (hel)?o -- fixed. > One really neat improvement over egrep is that your program handles > [a-z] right, but takes the dash in [a-] as a dash. In egrep, this is a > syntax error. I guess it depends on how compatible you want to be. I > like your way better. 
If you're into fixing misfeatures, perhaps you > should also fix the problem with "*a" being a syntax error in most > egreps (and yours). The GNU egrep program recognizes this pattern > correctly. I.e. it knows the "*" can't possibly be a metacharacter. > I'd either make [a-] and *a errors or make them both work. Not half > and half. It's now consistent, but on the negative side -- now [a-] is an error, too. I could have gone the other way, but it seemed unimportant and would have cost some extra code. (Actually, although [a-] was accepted, the "a" was being discarded!). > Probably the only maddening feature of your igrep is that it barfs on > back- slashed parentheses, and other escaped metacharacters. Barfing fixed. > It also seems to have a precedence bug somewhere in the > vertical-slash-handling routines (e.g. echo 'hello' | igrep > '^helg|lo$' doesn't accept the string, as it should for all > egrep-compatible regexp routines). I had misunderstood the manual on this one. It was handling the ^ and $ only at the front of a *whole* RE, but not bracketing terms of alternation. It now works for alternation -- hope I got it right this time. > I get an unpleasant abort also when specifying a pattern like > '^hel(m|g)o$'. Fixed. ############################################################################ # # Name: regexp.icn # # Title: UNIX-like Regular Expression Pattern Matching Procedures # # Author: Robert J. Alexander # # Date: November 6, 1991 # ############################################################################ # # This is a kit of procedures to deal with UNIX-like regular expression # patterns. # # These procedures are interesting partly because of the "recursive # suspension" (or "suspensive recursion" :-) technique used to simulate # conjunction of an arbitrary number of computed expressions (see # notes, below). 
# # # The public procedures are: # # ReMatch(pattern,s,i1,i2) : i3,i4,...,iN # ReFind(pattern,s,i1,i2) : i3,i4,...,iN # RePat(s) : pattern list # # # ReMatch() produces the sequence of positions in "s" past a substring # starting at "i1" that matches "pattern", but fails if there is no # such position. Similar to match(), but is capable of generating # multiple positions. # # ReFind() produces the sequence of positions in "s" where substrings # begin that match "pattern", but fails if there is no such position. # Similar to find(). Each position is produced only once, even if # several possible matches are possible at that position. # # "pattern" can be either a string or a pattern list -- see RePat(), # below. # # Default values of s, i1, and i2 are handled as for Icon's built-in # string scanning procedures such as match(). # # # RePat(s) : L # # Creates a pattern element list from pattern string "s", but fails if # the pattern string is not syntactically correct. ReMatch() and # ReFind() will automatically convert a pattern string to a pattern # list, but it is faster to do the conversion explicitly if multiple # operations are done using the same pattern. An additional advantage # to compiling the pattern separately is avoiding ambiguity of failure # caused by an incorrect pattern and failure to match a correct pattern. # # # Accessible Global Variables # # After a match, the strings matched by parenthesized regular # expressions are left in list "Re_ParenGroups", and can be accessed by # subscripting in using the same number as the \N construct. # # If it is desired that regular expression format be similar to UNIX # filename generation patterns but still retain the power of full # regular expressions, make the following assignments prior to # compiling the pattern string: # # Re_ArbString := "*" # Defaults to ".*" # Re_AnyString := "?" # Defaults to "." # # The sets of characters (csets) that define a word, digits, and white # space can be modified. 
The following assignments can be made before # compiling the pattern string. The character sets are captured when # the pattern is compiled, so changing them after pattern compilation # will not alter the behavior of matches unless the pattern string is # recompiled. # # Re_WordChars := 'whatever you like' # # Defaults to &letters ++ &digits ++ "_" # Re_Digits := &digits ++ 'ABCDEFabcdef' # # Defaults to &digits # Re_Space := 'whatever you like' # # Defaults to ' \t\v\n\r\f' # # These globals are normally not initialized until the first call to # RePat(), and then only if they are null. They can be explicitly # initialized to their defaults (if they are null) by calling # Re_Default(). # # Characters compiled into patterns can be passed through a # user-supplied filter procedure, provided in global variable # Re_Filter. The filtering is done before the characters are bound # into the pattern. The filter proc is passed one argument, the string # to filter, and it must return the filtered string as its result. If # the filter proc fails, the string will be used unfiltered. The # filter proc is called with an argument of either type string (for # characters in the pattern) or cset (for character classes [...]). A # typical use for this facility is to implement case-independent # matching. All pattern characters can downshifted by assigning # # Re_Filter := map # # Filtering is done only as the pattern is compiled. Filtering of # strings to be matched must be explicitly done. Therefore, # case-independent matching will occur only if map() is applied to all # strings to be matched. # # In the case of patterns containing alternation, ReFind() will # generally not produce positions in increasing order, but will produce # all positions from the first term of the alternation (in increasing # order) followed by all positions from the second (in increasing # order). 
#  If it is necessary that the positions be generated in
#  strictly increasing order, with no duplicates, assign any non-null
#  value to Re_Ordered:
#
#       Re_Ordered := 1
#
#  If the Re_Ordered option is chosen, there is a *small* penalty in
#  efficiency in some cases, and the co-expression facility is required
#  in your Icon implementation.  Example:
#
#
#  Regular Expression Characters and Features Supported
#
#  The regular expression format supported by procedures in this file
#  models very closely that supported by the UNIX "egrep" program, with
#  modifications as in the Perl programming language definition.
#  Following is a brief description of the special characters used in
#  regular expressions.  In the description, the abbreviation RE means
#  regular expression.
#
#     c         An ordinary character (not one of the special characters
#               discussed below) is a one-character RE that matches that
#               character.
#
#     \c        A backslash followed by any special character is a one-
#               character RE that matches the special character itself.
#
#               Note that backslash escape sequences representing
#               non-graphic characters are not supported directly
#               by these procedures.  Of course, strings coded in an
#               Icon program will have such escapes handled by the
#               Icon translator.  If such escapes must be supported
#               in strings read from the run-time environment (e.g.
#               files), they will have to be converted by other means,
#               such as the Icon Program Library procedure "escape()".
#
#     .         A period is a one-character RE that matches any
#               character.
#
#     [string]  A non-empty string enclosed in square brackets is a one-
#               character RE that matches any *one* character of that
#               string.  If the first character is "^" (circumflex),
#               the RE matches any character not in the remaining
#               characters of the string.  The "-" (minus), when between
#               two other characters, may be used to indicate a range of
#               consecutive ASCII characters (e.g. [0-9] is equivalent to
#               [0123456789]).
Other special characters stand for # themselves in a bracketed string. # # * Matches zero or more occurrences of the RE to its left. # # + Matches one or more occurrences of the RE to its left. # # ? Matches zero or one occurrences of the RE to its left. # # {N} Matches exactly N occurrences of the RE to its left. # # {N,} Matches at least N occurrences of the RE to its left. # # {N,M} Matches at least N occurrences but at most M occurrences # of the RE to its left. # # ^ A caret at the beginning of an entire RE constrains # that RE to match an initial substring of the subject # string. # # $ A currency symbol at the end of an entire RE constrains # that RE to match a final substring of the subject string. # # | Alternation: two REs separated by "|" match either a # match for the first or a match for the second. # # () A RE enclosed in parentheses matches a match for the # regular expression (parenthesized groups are used # for grouping, and for accessing the matched string # subsequently in the match using the \N expression). # # \N Where N is a digit in the range 1-9, matches the same # string of characters as was matched by a parenthesized # RE to the left in the same RE. The sub-expression # specified is that beginning with the Nth occurrence # of "(" counting from the left. E.g., ^(.*)\1$ matches # a string consisting of two consecutive occurrences of # the same string. # # Perl Extensions # # The following extensions to UNIX REs, as specified in the Perl # programming language, are supported. # # \w Matches any alphanumeric (including "_"). # \W Matches any non-alphanumeric. # # \b Matches only at a word-boundary (word defined as a string # of alphanumerics as in \w). # \B Matches only non-word-boundaries. # # \s Matches any white-space character. # \S Matches any non-white-space character. # # \d Matches any digit [0-9]. # \D Matches any non-digit. # # \w, \W, \s, \S, \d, \D can be used within [string] REs. 
#
#
#  Note on Details of Matching
#
#  The method of matching differs a bit from UNIX-style regular
#  expressions -- particularly where closures are concerned ("*", "+",
#  "{}", "?").  UNIX regular expressions are documented to match the
#  "longest, leftmost" strings in cases where a choice is needed.  The
#  procedures in this file are capable of generating all possible
#  matches of the pattern, and generate the possibilities by matching
#  the fewest first ("shortest, leftmost").  Matching of the various
#  pattern elements is performed exactly as though it were an Icon
#  conjunction of the elements.
#
#
#  Notes on computed conjunction expressions by "suspensive recursion"
#
#  A conjunction expression of an arbitrary number of terms can be
#  computed in a looping fashion by the following recursive technique:
#
#       procedure Conjunct(v)
#          if <another term remains> then
#             suspend Conjunct(<next term>)
#          else
#             suspend v
#       end
#
#  The argument "v" is needed for producing the value of the last term
#  as the value of the conjunction expression, accurately modeling Icon
#  conjunction.  If the value of the conjunction is not needed, the
#  technique can be slightly simplified by eliminating "v":
#
#       procedure ConjunctAndProduceNull()
#          if <another term remains> then
#             suspend ConjunctAndProduceNull(<next term>)
#          else
#             suspend
#       end
#
#  Note that <next term> must still remain in the suspend
#  expression to test for failure of the term, although its value is not
#  passed to the recursive invocation.  This could have been coded as
#
#             suspend <next term> & ConjunctAndProduceNull()
#
#  but wouldn't have been as provocative.
#
#  Since the computed conjunctions in this program are evaluated only for
#  their side effects, the second technique is used in two situations:
#
#       (1)  To compute the conjunction of all of the elements in the
#            regular expression pattern list (Re_match1()).
#
#       (2)  To evaluate the "exactly N times" and "N to M times"
#            control operations (Re_NTimes()).
#

record Re_Tok(proc,args)

global Re_ParenGroups,Re_Filter,Re_Ordered
global Re_WordChars,Re_NonWordChars
global Re_Space,Re_NonSpace
global Re_Digits,Re_NonDigits
global Re_ArbString,Re_AnyString
global Re_TabMatch


###################  Pattern Translation Procedures  ###################


procedure RePat(s) # L
   #
   #  Produce pattern list representing pattern string s.
   #
   #
   #  Create a list of pattern elements.  Pattern strings are parsed
   #  and converted into list elements as shown in the following table.
   #  Since some list elements reference other pattern lists, the
   #  structure is really a tree.
   #
   # Token   Generates                          Matches...
   # -----   ---------                          ----------
   #  ^      Re_Tok(pos,[1])                    Start of string or line
   #  $      Re_Tok(pos,[0])                    End of string or line
   #  .      Re_Tok(move,[1])                   Any single character
   #  +      Re_Tok(Re_OneOrMore,[tok])         At least one occurrence of
   #                                            previous token
   #  *      Re_Tok(Re_ArbNo,[tok])             Zero or more occurrences of
   #                                            previous token
   #  |      Re_Tok(Re_Alt,[pattern,pattern])   Either of prior expression
   #                                            or next expression
   #  [...]  Re_Tok(Re_TabAny,[cset])           Any single character in
   #                                            specified set (see below)
   #  (...)  Re_Tok(Re_MatchReg,[pattern])      Parenthesized pattern as
   #                                            single token
   #  string Re_Tok(Re_TabMatch,[string])       The string of non-special
   #                                            characters
   #  \b     Re_Tok(Re_WordBoundary,[Re_WordChars,Re_NonWordChars])
   #                                            A word-boundary
   #                                            (word default: [A-Za-z0-9_]+)
   #  \B     Re_Tok(Re_NonWordBoundary,[Re_WordChars,Re_NonWordChars])
   #                                            A non-word-boundary
   #  \w     Re_Tok(Re_TabAny,[Re_WordChars])   A word-character
   #  \W     Re_Tok(Re_TabAny,[Re_NonWordChars]) A non-word-character
   #  \s     Re_Tok(Re_TabAny,[Re_Space])       A space-character
   #  \S     Re_Tok(Re_TabAny,[Re_NonSpace])    A non-space-character
   #  \d     Re_Tok(Re_TabAny,[Re_Digits])      A digit
   #  \D     Re_Tok(Re_TabAny,[Re_NonDigits])   A non-digit
   #  {n,m}  Re_Tok(Re_NToMTimes,[tok,n,m])     n to m occurrences of
   #                                            previous token
   #  {n,}   Re_Tok(Re_NOrMoreTimes,[tok,n])    n or more occurrences of
   #                                            previous token
   #  {n}    Re_Tok(Re_NTimes,[tok,n])          exactly n occurrences of
   #                                            previous token
   #  ?
Re_Tok(Re_ZeroOrOneTimes,[tok]) one or zero occurrences of # previous token # \ Re_Tok(Re_MatchParenGroup,[n]) The string matched by # parenthesis group # local plist # # Initialize. # initial Re_Default() Re_WordChars := cset(Re_WordChars) Re_NonWordChars := ~Re_WordChars Re_Space := cset(Re_Space) Re_NonSpace := ~Re_Space Re_Digits := cset(Re_Digits) Re_NonDigits := ~Re_Digits s ? (plist := Re_pat1(0)) | fail return plist end procedure Re_pat1(level) # L # # Recursive portion of RePat() # local plist,n,m,c,comma static none,parenNbr initial { Re_TabMatch := proc("=",1) none := [] } if level = 0 then parenNbr := 0 plist := [] # # Loop to put pattern elements on list. # until pos(0) do { (="|",plist := [Re_Tok(Re_Alt,[plist,Re_pat1(level + 1) | fail])]) | put(plist, ## (="^",*plist = 0 | plist[-1].proc === Re_Alt,Re_Tok(pos,[1])) | (="^",pos(2) | &subject[-2] == "|",Re_Tok(pos,[1])) | (="$",pos(0) | match("|"),Re_Tok(pos,[0])) | (match(")"),level > 0,break) | (=Re_ArbString,Re_Tok(Re_Arb,none)) | (=Re_AnyString,Re_Tok(move,[1])) | (="+",Re_Tok(Re_OneOrMore,[Re_prevTok(plist) | fail])) | (="*",Re_Tok(Re_ArbNo,[Re_prevTok(plist) | fail])) | 1(Re_Tok(Re_TabAny,[c := Re_cset()]),\c | fail) | 3(="(",n := parenNbr +:= 1, Re_Tok(Re_MatchReg,[Re_pat1(level + 1) | fail,n]), move(1) | fail) | (="\\b",Re_Tok(Re_WordBoundary,[Re_WordChars,Re_NonWordChars])) | (="\\B",Re_Tok(Re_NonWordBoundary,[Re_WordChars,Re_NonWordChars])) | (="\\w",Re_Tok(Re_TabAny,[Re_WordChars])) | (="\\W",Re_Tok(Re_TabAny,[Re_NonWordChars])) | (="\\s",Re_Tok(Re_TabAny,[Re_Space])) | (="\\S",Re_Tok(Re_TabAny,[Re_NonSpace])) | (="\\d",Re_Tok(Re_TabAny,[Re_Digits])) | (="\\D",Re_Tok(Re_TabAny,[Re_NonDigits])) | (="{",(n := tab(many(&digits)),comma := =(",") | &null, m := tab(many(&digits)) | &null,="}") | fail, if \m then Re_Tok(Re_NToMTimes, [Re_prevTok(plist),integer(n),integer(m)]) else if \comma then Re_Tok(Re_NOrMoreTimes, [Re_prevTok(plist),integer(n)]) else 
Re_Tok(Re_NTimes,[Re_prevTok(plist),integer(n)])) | (="?",Re_Tok(Re_ZeroOrOneTimes,[Re_prevTok(plist) | fail])) | Re_Tok(Re_TabMatch,[Re_string(level)]) | (="\\",n := tab(any(&digits)),Re_Tok(Re_MatchParenGroup,[integer(n)])) ) | fail } return plist end procedure Re_prevTok(plist) # # Pull previous token from the pattern list. This procedure must take # into account the fact that successive character tokens have been # optimized into a single string token. # local lastTok,s,r lastTok := pull(plist) | fail if lastTok.proc === Re_TabMatch then { s := lastTok.args[1] r := Re_Tok(Re_TabMatch,[s[-1]]) s[-1] := "" if *s > 0 then { put(plist,lastTok) lastTok.args[1] := s } return r } return lastTok end procedure Re_Default() # # Assign default values to regular expression translation globals, but # only to variables whose values are null. # /Re_WordChars := &letters ++ &digits ++ "_" /Re_Space := ' \t\v\n\r\f' /Re_Digits := &digits /Re_ArbString := ".*" /Re_AnyString := "." return end procedure Re_cset() # # Matches a [...] construct and returns a cset. # local complement,c,e,ch,chars (="[",complement := ="^" | &null, c := (ch := (="-" | "")) || move(1) || tab(find("]")),move(1)) | fail c ? { e := ch while chars := tab(upto('-\\')) do { e ++:= case move(1) of { "-": chars[1:-1] ++ &cset[ord(chars[-1]) + 1:ord(move(1)) + 2] | return &null "\\": case ch := move(1) of { "w": Re_WordChars "W": Re_NonWordChars "s": Re_Space "S": Re_NonSpace "d": Re_Digits "D": Re_NonDigits default: ch } } } e ++:= tab(0) if \complement then e := ~e } e := (\Re_Filter)(e) return cset(e) end procedure Re_string(level) # # Matches a string of non-special characters, returning a string. # local special,s,p static nondigits initial nondigits := ~&digits special := if level = 0 then '\\.+*|[({?' 
else '\\.+*|[({?)' s := tab(upto(special) | 0) while ="\\" do { p := &pos if tab(any('wWbBsSdD')) | (tab(any('123456789')) & (pos(0) | any(nondigits))) then { tab(p - 1) break } s ||:= move(1) || tab(upto(special) | 0) } if pos(0) & s[-1] == "$" then { move(-1) s[-1] := "" } s := string((\Re_Filter)(s)) return "" ~== s end ##################### Matching Engine Procedures ######################## procedure ReMatch(plist,s,i1,i2) # i3,i4,...,iN # # Produce the sequence of positions in s past a string starting at i1 # that matches the pattern plist, but fails if there is no such # position. Similar to match(), but is capable of generating multiple # positions. # if type(plist) ~== "list" then plist := RePat(plist) | fail /i1:= if /s := &subject then &pos else 1 ; /i2 := 0 Re_ParenGroups := [] suspend s[i1:i2] ? (Re_match1(plist,1),i1 + &pos - 1) end procedure Re_match1(plist,i) # s1,s2,...,sN # # Used privately by ReMatch() to simulate a computed conjunction # expression via recursive generation. # local tok suspend if tok := plist[i] then Re_tok_match(tok,plist,i) & Re_match1(plist,i + 1) else &null end procedure ReFind(plist,s,i1,i2) # i3,i4,...,iN # # Produce the sequence of positions in s where strings begin that match # the pattern plist, but fails if there is no such position. Similar # to find(). # local p if type(plist) ~== "list" then plist := RePat(plist) | fail /i1 := if /s := &subject then &pos else 1 ; /i2 := 0 s[i1:i2] ? suspend ( tab(Re_skip(plist,1)) & p := &pos & Re_match1(plist,1)\1 & i1 + p - 1) end procedure Re_tok_match(tok,plist,i) # # Match a single token. Can be recursively called by the token # procedure. 
   #
   local prc
   prc := tok.proc
   suspend (
      if prc === Re_Arb then Re_Arb(plist,i)
      else suspend prc!tok.args
   )
end


##########  Heuristic Code for Matching Arbitrary Characters  ##########


procedure Re_skip(plist,i) # s1,s2,...,sN
   #
   #  Used privately -- match a sequence of strings in s past which a match
   #  of the first pattern element in plist is likely to succeed.  This
   #  procedure is used for heuristic performance improvement by ReMatch()
   #  for the ".*" pattern element, and by ReFind().
   #
   local x,s,p,prc
   x := plist[i]
   suspend case prc := (\x).proc | &null of {
      Re_TabMatch: find!x.args
      Re_TabAny: upto!x.args
      pos: x.args[1]
##    Re_WordBoundary: Re_WordBoundaries!x.args
      Re_WordBoundary |
      Re_NonWordBoundary:
         p := &pos & tab(Re_skip(plist,i + 1)) & prc!x.args & untab(p)
      Re_OneOrMore |
      Re_MatchParenGroup: if s := (\Re_ParenGroups)[x.args[1]] then
            find(s) else &pos to *&subject + 1
      Re_NToMTimes |
      Re_NOrMoreTimes |
      Re_NTimes:
         if x.args[2] > 0 then Re_skip(x.args[1],1)
         else &pos to *&subject + 1     # "*" was missing in the posted code
      Re_MatchReg: Re_skip(x.args[1],1)
      Re_Alt:
         if \Re_Ordered then
            Re_result_merge{Re_skip(x.args[1],1),Re_skip(x.args[2],1)}
         else
            Re_skip(x.args[1 | 2],1)
      default: &pos to *&subject + 1
      }
end


procedure Re_result_merge(L)
   #
   #  Programmer-defined control operation to merge the result sequences of
   #  two integer-producing generators.  Both generators must produce their
   #  result sequences in numerically increasing order with no duplicates,
   #  and the output sequence will be in increasing order with no
   #  duplicates.
   #
   local e1,e2,r1,r2
   e1 := L[1] ; e2 := L[2]
   r1 := @e1 ; r2 := @e2
   while \(r1 | r2) do
      if /r2 | \r1 < r2 then
         suspend r1 do r1 := @e1 | &null
      else if /r1 | r1 > r2 then
         suspend r2 do r2 := @e2 | &null
      else
         r2 := @e2 | &null
end


procedure untab(origPos)
   #
   #  Converts a string scanning expression that moves the cursor to one
   #  that produces a cursor position and doesn't move the cursor (converts
   #  something like tab(find(x)) to find(x)).
The template for using this # procedure is # # origPos := &pos ; tab(x) & ... & untab(origPos) # local newPos newPos := &pos tab(origPos) suspend newPos tab(newPos) end ## procedure Re_WordBoundaries(wd) ## # ## # Produce positions that are word boundaries. ## # ## local p,q ## p1 := p := &pos ## while q := upto(wd,,p) | break do { ## suspend q ## p := many(wd,,q) ## suspend p ## } ## tab(p1) ## end ####################### Matching Procedures ####################### procedure Re_Arb(plist,i) # # Match arbitrary characters (.*) # suspend tab(if \plist then Re_skip(plist,i + 1) else 1 to *&subject + 1) end procedure Re_TabAny(C) # # Match a character of a character set ([...],\w,\W,\s,\S,\d,\D #) suspend tab(any(C)) end procedure Re_MatchReg(tokList,groupNbr) # # Match parenthesized group and assign matched string to list Re_ParenGroup # local p,s p := &pos /Re_ParenGroups := [] every Re_match1(tokList,1) do { while *Re_ParenGroups < groupNbr do put(Re_ParenGroups) s := &subject[p:&pos] Re_ParenGroups[groupNbr] := s suspend s } Re_ParenGroups[groupNbr] := &null end procedure Re_WordBoundary(wd,nonwd) # # Match word-boundary (\b) # suspend ((pos(1),any(wd)) | (pos(0),move(-1),tab(any(wd))) | (move(-1), (tab(any(wd)),any(nonwd)) | (tab(any(nonwd)),any(wd))),"") end procedure Re_NonWordBoundary(wd,nonwd) # # Match non-word-boundary (\B) # suspend ((pos(1),any(nonwd)) | (pos(0),move(-1),tab(any(nonwd))) | (move(-1), (tab(any(wd)),any(wd)) | (tab(any(nonwd)),any(nonwd)),"")) end procedure Re_MatchParenGroup(n) # # Match same string matched by previous parenthesized group (\N) # suspend if s := \Re_ParenGroups[n] then =s else "" end ################### Control Operation Procedures ################### procedure Re_ArbNo(tok) # # Match any number of times (*) # suspend "" | (Re_tok_match(tok) & Re_ArbNo(tok)) end procedure Re_OneOrMore(tok) # # Match one or more times (+) # suspend Re_tok_match(tok) & Re_ArbNo(tok) end procedure Re_NToMTimes(tok,n,m) # # Match n to m times 
   #  ({n,m})
   #
   suspend Re_NTimes(tok,n) & Re_ArbNo(tok)\(m - n + 1)
end


procedure Re_NOrMoreTimes(tok,n)
   #
   #  Match n or more times ({n,})
   #
   suspend Re_NTimes(tok,n) & Re_ArbNo(tok)
end


procedure Re_NTimes(tok,n)
   #
   #  Match exactly n times ({n})
   #
   if n > 0 then
      suspend Re_tok_match(tok) & Re_NTimes(tok,n - 1)
   else suspend
end


procedure Re_ZeroOrOneTimes(tok)
   #
   #  Match zero or one times (?)
   #
   suspend "" | Re_tok_match(tok)
end


procedure Re_Alt(tokList1,tokList2)
   #
   #  Alternation (|)
   #
   suspend Re_match1(tokList1 | tokList2,1)
end

From icon-group-request@arizona.edu Thu Nov 7 23:04:51 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Thu, 7 Nov 91 23:04:51 MST
Resent-From: icon-group-request@arizona.edu
Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA25028; Thu, 7 Nov 91 23:04:58 MST
Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Thu, 7 Nov 1991 23:04 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA12836; Thu, 7 Nov 91 21:55:40 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Resent-Date: Thu, 7 Nov 1991 23:04 MST
Date: 7 Nov 91 17:44:39 GMT
From: milton!nntp.uoregon.edu!euclid!haertel@beaver.cs.washington.edu
Subject: RE: Corrections to regular expression procedures
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id:
Message-Id: <1991Nov7.174439.5613@nntp.uoregon.edu>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
Organization: Department of Mathematics, University of Oregon
References: <1554@cronos.metaphor.com>, <1991Nov7.161613.2974@midway.uchicago.edu>

In article <1991Nov7.161613.2974@midway.uchicago.edu> goer@ellis.uchicago.edu (Richard L.
Goerwitz) writes:
>The final big syntax difference (and this one's bitten me time after time)

Oops, it looks like it bit you again!  Ouch! :-)

>is the question of when ^, *, and $ are metas.  In awk and egrep, the ^ only
>has special meaning at SOL after | and (, and also after [ (but note [^-z]).
>Conversely, $ only indicates EOL after ) and |.  *, +, and ? are only
>special in the opposite environments (i.e. NOT after |, ( and ), and also
>not at SOL, or inside brackets, [*$?], or ^/$ at the end/beginning of a
>line).  In grep, the $ and ^ always have their special meaning.  Note that
>your egrep command flags things like '(*a)', which I think is good, but
>isn't the normal egrep way of handling things.

A good description of how ^, $, and * can lose their special meaning
depending on surrounding context, but... it is precisely backwards:

In awk and egrep, the ^, $, and * symbols have their special meaning
all the time.

In grep, the ^, $ and * symbols may have their special meaning
turned off, depending where they appear in the regexp, as described
above.

The bad news is that the context-dependent meanings of $, ^, and *
made it into the Posix standard for regular expressions, so there
is no relief for future grep implementors.

From @um.cc.umich.edu:Paul_Abrahams@MTS.cc.Wayne.edu Fri Nov 8 06:51:48 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 8 Nov 91 06:51:48 MST
Received: from mailrus.cc.umich.edu by optima.cs.arizona.edu (4.1/15) id AA12845; Fri, 8 Nov 91 06:51:45 MST
Received: from um.cc.umich.edu by mailrus.cc.umich.edu (5.61/1123-1.0) id AA26171; Fri, 8 Nov 91 08:47:48 -0500
Received: from MTS.cc.Wayne.edu by um.cc.umich.edu via MTS-Net; Fri, 8 Nov 91 08:46:48 EST
Date: Fri, 8 Nov 91 08:46:32 EST
From: Paul_Abrahams@mts.cc.wayne.edu
To: icon-group@cs.arizona.edu
Message-Id: <384006@MTS.cc.Wayne.edu>
Subject: Obtaining Icon code

Lots of code has been posted to this forum.
Is there any way to obtain it---particularly for people like me who don't have ftp access? Paul Abrahams From icon-group-request@arizona.edu Sat Nov 9 12:32:04 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 9 Nov 91 12:32:04 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA04010; Sat, 9 Nov 91 12:32:02 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sat, 9 Nov 1991 12:31 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA29311; Sat, 9 Nov 91 11:23:51 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Sat, 9 Nov 1991 12:31 MST Date: 9 Nov 91 06:47:40 GMT From: fernwood!cronos!laguna!alex@uunet.uu.net (Bob Alexander) Subject: Regular expression timings Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <18C15EA7A6600343@Arizona.edu> Message-Id: <1580@cronos.metaphor.com> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: Metaphor Computer Systems, Mountain View, CA From: alex@laguna.metaphor.com (Bob Alexander) Newsgroups: comp.lang.icon Subject: Regular expression timings References: Sender: Followup-To: Distribution: world Organization: Metaphor Computer Systems, Mountain View, CA Keywords: Here are some timings done comparing regular expression routines in Icon and otherwise. The timings are the result of the C shell "time" command applied to the GNU regression test set -- two trials each. What surprises me most about these results is that the Icon versions are not really all that much slower than standard UNIX egrep (I never built GNU egrep, but it's probably faster). I had expected the Icon versions to be several times slower than egrep, but they're not. 
Using the Icon *compiler*, the times might be downright competitive!

2.9u  7.2s 0:13 76% 0+108k 0+10io 0pf+0w   standard UNIX egrep (sun4)
2.8u  7.2s 0:13 76% 0+112k 0+10io 0pf+0w

3.4u 10.5s 0:18 75% 0+176k 0+11io 0pf+0w   igrep
3.4u 10.5s 0:18 74% 0+172k 0+ 9io 0pf+0w

4.5u 10.7s 0:20 76% 0+188k 0+ 9io 0pf+0w   igrep using findre() instead of
4.7u 10.4s 0:20 74% 0+184k 0+10io 0pf+0w   ReFind()

From icon-group-request@arizona.edu Sat Nov 9 12:32:58 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 9 Nov 91 12:32:58 MST
Resent-From: icon-group-request@arizona.edu
Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA04029; Sat, 9 Nov 91 12:32:56 MST
Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sat, 9 Nov 1991 12:32 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA29032; Sat, 9 Nov 91 11:16:09 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Resent-Date: Sat, 9 Nov 1991 12:32 MST
Date: 9 Nov 91 06:16:04 GMT
From: midway!ellis!goer@uunet.uu.net (Richard L. Goerwitz)
Subject: RE: Latest regexp.icn
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id: <18E26E8D76600344@Arizona.edu>
Message-Id: <1991Nov9.061604.3716@midway.uchicago.edu>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
Organization: University of Chicago
References: <1576@cronos.metaphor.com>

In <1576@cronos.metaphor.com> alex@laguna.metaphor.com (Bob Alexander) writes:

>I'm not sure if anyone really cares about such minor differences
>concerning only incorrect regular expressions.  My inclination is to
>leave the code as is.  It could be changed to pass the GNU tests, but
>that would result in (probably) more code.
>My objective was to create
>useful, correct RE procedures, not really to create an *exact*
>duplicate of any existing product.

I can't speak for anyone but myself, but after long and intense use of
regular expressions for natural language processing tasks of various
kinds, I've found that some of the small differences we have been
discussing are actually very important to experienced hands.  If you want
me to prioritize, then I'd say that things like "\" and "[^]" are not
really all that vital.  But the question of where metacharacters have
their special meaning, where they are literals, and where they are errors,
is really very important.

This is just my personal opinion.  It's been formed over several years of
almost daily work in natural language parsing and processing.  It's not so
much a religious thing as a practical one.  Let me give you a typical
example that every sysadmin can relate to.  Very often, in conjunction
with the find command, I find myself checking quickly through lots of
compressed files, trying to find a given pattern (and the name of the file
it occurs in):

    find . -name '*.Z' -exec zcat {} \; -print | egrep '(^\./|pattern)'

There are other examples I could list, but this one will suffice to
illustrate the point.

Your earlier incarnation of igrep handles this expression like UNIX grep,
not egrep.  It therefore fails in precisely those situations where I
expect it to accept a line.  Since your procedures are geared mainly for
egrep-style patterns, this inconsistency could be maddening unless well
documented.  And if it came to this, I'd wonder whether it might not be
better simply to re-code in accordance with the standard.

So in general, I'd reiterate that some of those little things count for a
lot.  I can't speak for the PERL extensions, incidentally, since I'm only
a casual PERL user.  I've never really gotten out of the habit of just
using sed, awk, and what not.

--
-Richard L.
Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer From icon-group-request@arizona.edu Sat Nov 9 13:02:44 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 9 Nov 91 13:02:44 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA04863; Sat, 9 Nov 91 13:02:42 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sat, 9 Nov 1991 13:02 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA00687; Sat, 9 Nov 91 11:57:56 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Sat, 9 Nov 1991 13:02 MST Date: 9 Nov 91 07:29:47 GMT From: uakari.primate.wisc.edu!sdd.hp.com!cs.utexas.edu!uwm.edu!linac!midway!ellis!goer@ames.arc.nasa.gov (Richard L. Goerwitz) Subject: RE: Latest regexp.icn (mods to the preceding) Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <1D06576006600395@Arizona.edu> Message-Id: <1991Nov9.072947.6283@midway.uchicago.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: University of Chicago References: <1576@cronos.metaphor.com>, <1991Nov9.061604.3716@midway.uchicago.edu> > find . -name '*.Z' -exec zcat {} \; -print | egrep '(^\./|pattern)' > >Your earlier incarnation of igrep handles this expression like UNIX grep, >not egrep. Hmmm. Just checked your latest version, and presto, it works like egrep, so that's one thing I'd call a possibly important incompatibility solved. But while checking I noticed something else. 
Try this:

    procedure main()
        if ReFind("lo", "hello", -3, 0) > 3
    #   if find("lo", "hello", -3, 0) > 3
        then write("okay")
        else write("nogo")
    end

Getting the defaults right for &subject and &pos, accounting for negative
values, and checking for out-of-bounds values is a bit of a headache.  It's
also technically legal to say this:

    "hello" ? find("ll", &null, 3, 5)

So I'd expect this to be okay, too:

    "hello" ? ReFind("ll", &null, 3, 5)

This is kind of hard (*I* think) to get right.  Maybe others haven't had the
same problems I've had, and maybe in your case, Bob, you didn't want to
bother with full emulation of the builtins' behavior.  When I'm trying to go
the whole nine yards, I usually do something like this:

    if /s := &subject
    then default_val := &pos
    else default_val := 1

    if \i then {
        if i < 1 then
            i := *s + (i+1)
    }
    else i := default_val

    if \j then {
        if j < 1 then
            j := *s + (j+1)
    }
    else j := *s+1

Oh, and one thing I often forget is to switch i and j around if i is bigger
than j.  You need to do that after setting all the defaults.  That's how the
builtins do it.  Right now, Re{match,find} don't do this, and it yields some
backwards results.

I see you don't need to check if i and j are in range the way you're doing
things.  The subscript expression will just fail, and you'll achieve the
desired result automatically if i and/or j is out of range.

This is very subtle stuff, in my opinion, but over time I've killed myself
enough times straying from standard i,j builtin behavior that I've just
thrown in the towel, and followed the patterns I see in the source code for
the builtins.

Maybe this is nit-picking.  Thought it might be interesting to post, though,
and see if anyone else cares about such things.

--
-Richard L.
Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer From icon-group-request@arizona.edu Sat Nov 9 14:22:03 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 9 Nov 91 14:22:03 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Maggie.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA06672; Sat, 9 Nov 91 14:21:59 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sat, 9 Nov 1991 14:20 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA02988; Sat, 9 Nov 91 13:04:21 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Sat, 9 Nov 1991 14:21 MST Date: 9 Nov 91 08:49:12 GMT From: cis.ohio-state.edu!pacific.mps.ohio-state.edu!linac!midway!ellis!goer@ucbvax.berkeley.edu (Richard L. Goerwitz) Subject: RE: Regular expression timings Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <280C1A4926600442@Arizona.edu> Message-Id: <1991Nov9.084912.9703@midway.uchicago.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: University of Chicago References: <1580@cronos.metaphor.com> In <1580@cronos.metaphor.com> alex@laguna.metaphor.com (Bob Alexander) writes: > >Here are some timings done comparing regular expression routines in >Icon and otherwise... > >What surprises me most about these results is that the Icon versions >are not really all that much slower than standard UNIX egrep (I never >built GNU egrep, but it's probably faster). I had expected the Icon >versions to be several times slower than egrep, but they're not. I'd guess that what you're actually testing in the cases you cited is the speed of compilation. 
Load and startup time is absorbed largely by the sys time in the "time"
command's output, but it doesn't take into account how long the executable
takes to initialize and compile the regular expression.

I'll wager that if you take a standard-size file, say 50k or so, and run
each of the greps on it, you'll find that the Icon-based ones are
significantly slower.

I just tried doing an "egrep find regexp.icn" using your igrep, my system
egrep command, and GNU egrep.  Igrep was 1.3u, while both GNU and my system
egrep took up so little time that the system registered it as 0.0u.

I've run a lot of indexing and text retrieval programs, and I've always
found that it was well worth going through all the trouble of writing data
to a temp file, then turning the system egrep command loose on it, rather
than trying to use my findre.

I'd like to see whether your program will run fast enough to make the
difference less pronounced, or at least to give me a secondary option on
systems that don't have pipes, and don't have a regexp-based pattern finder
in the system inventory.  Sounds like it might (which would make my day!).

--
-Richard L.
Goerwitz              goer%sophist@uchicago.bitnet
goer@sophist.uchicago.edu       rutgers!oddjob!gide!sophist!goer

From icon-group-request@arizona.edu  Sat Nov  9 19:02:44 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 9 Nov 91 19:02:44 MST
Resent-From: icon-group-request@arizona.edu
Received: from Arizona.edu (Maggie.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA13244; Sat, 9 Nov 91 19:02:42 MST
Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sat, 9 Nov 1991 19:02 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA11401; Sat, 9 Nov 91 17:45:49 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Resent-Date: Sat, 9 Nov 1991 19:02 MST
Date: 9 Nov 91 23:31:47 GMT
From: fernwood!cronos!laguna!alex@uunet.uu.net (Bob Alexander)
Subject: More fixes to regexp.icn
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id: <4F519F2706600643@Arizona.edu>
Message-Id: <1581@cronos.metaphor.com>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
Organization: Metaphor Computer Systems, Mountain View, CA

Next round of bug fixes -- minor but nonetheless bugs.

From Richard G.:

> Getting the defaults right for &subject and &pos, accounting for negative
> values, and checking for out-of-bounds values is a bit of a headache.
> ...
> This is kind of hard (*I* think) to get right.  Maybe others haven't had the
> same problems I've had ...

Well, obviously *this* programmer had some difficulty, too, since there
were errors.  One can consider oneself as a slick coder until someone
*actually uses* the code :-).  But seriously, I really appreciate the
enthusiastic help and am having a lot of fun with this.  (I think I'm
going to have to revisit some other procedures I've written that have
the same defaulting bugs).
These bugs are minor enough that I won't repost the whole huge thing
again, but replacement of the body of ReMatch with the following
appears to handle the defaults properly.  The same sort of changes
should also be applied to ReFind.

   local i                                                        <-- new
   if type(plist) ~== "list" then plist := RePat(plist) | fail
   if /s := &subject then /i1 := &pos else /i1 := 1 ; /i2 := 0    <-- changed
   s ? {(tab(i1),i := &pos,s := tab(i2)) | fail ; i >:= &pos}     <-- new
   Re_ParenGroups := []
   suspend s ? (Re_match1(plist,1),i + &pos - 1)                  <-- changed

Since providing the defaults for builtin-like user-defined scanning
procedures is tricky, and it is a recurring problem, maybe it would be
useful to have a handy procedure to do the job unerringly every time,
e.g.:

   #
   #  returns a list of [s,i1] (the selected substring and its starting
   #  position in the original string)
   #
   procedure defaults(s,i1,i2)
      if /s := &subject then /i1 := &pos else /i1 := 1
      /i2 := 0
      s ? {
         (tab(i1) & i1 := &pos & s := tab(i2)) | fail
         i1 >:= &pos
         }
      return [s,i1]
   end

The ReMatch procedure body could then be written as:

   local d
   if type(plist) ~== "list" then plist := RePat(plist) | fail
   d := defaults(s,i1,i2) | fail
   Re_ParenGroups := []
   suspend d[1] ? (Re_match1(plist,1),d[2] + &pos - 1)

Comments?  Improvements?  Would this be a useful IPL procedure?
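
For illustration only, here is a minimal sketch of how a find()-like
procedure might sit on top of defaults().  MyFind is a hypothetical name
and is not part of regexp.icn; defaults() is repeated from the text above
so that the program stands alone.  Under these assumptions, each MyFind
call should produce the same positions as the corresponding built-in
find() call.

```icon
# Sketch only.  defaults() is copied from the message; MyFind and main
# are hypothetical additions for demonstration.

procedure defaults(s, i1, i2)
   if /s := &subject then /i1 := &pos else /i1 := 1
   /i2 := 0
   s ? {
      # Normalize both positions and cut out the selected substring.
      (tab(i1) & i1 := &pos & s := tab(i2)) | fail
      # If the positions were given in reverse order, use the smaller one.
      i1 >:= &pos
      }
   return [s, i1]
end

# Hypothetical find()-like procedure built on defaults().
procedure MyFind(s1, s2, i1, i2)
   local d
   d := defaults(s2, i1, i2) | fail
   # Positions found in the substring are shifted back to the original.
   suspend d[2] + find(s1, d[1]) - 1
end

procedure main()
   # Explicit string with a negative start position.
   every write(find("lo", "hello", -3, 0))
   every write(MyFind("lo", "hello", -3, 0))
   # Subject and position defaulted from the scanning environment.
   "hello" ? every write(find("ll", &null, 3, 5))
   "hello" ? every write(MyFind("ll", &null, 3, 5))
   # Positions given in reverse order, which the built-ins allow.
   every write(find("ll", "hello", 5, 3))
   every write(MyFind("ll", "hello", 5, 3))
end
```

The out-of-range case (where defaults() fails) is deliberately not
exercised here.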
-- Bob

From icon-group-request@arizona.edu  Sat Nov  9 19:35:08 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 9 Nov 91 19:35:08 MST
Resent-From: icon-group-request@arizona.edu
Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA14177; Sat, 9 Nov 91 19:35:04 MST
Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sat, 9 Nov 1991 19:34 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA12384; Sat, 9 Nov 91 18:19:16 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Resent-Date: Sat, 9 Nov 1991 19:34 MST
Date: 9 Nov 91 01:05:38 GMT
From: fernwood!cronos!laguna!alex@uunet.uu.net (Bob Alexander)
Subject: Latest regexp.icn
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id: <53D66BE646600690@Arizona.edu>
Message-Id: <1576@cronos.metaphor.com>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
Organization: Metaphor Computer Systems, Mountain View, CA

Hopefully we'll soon get past the point where a new version of these
regular expression routines gets posted daily.

I've finally had a chance to look over the last round of bug reports on
the regular expression code.  I found (1) one more real bug, (2) one
questionable item in the bug report, and (3) some cases where
syntactically incorrect (or at least really weird) regular expressions
were handled differently.

(1) I downloaded the GNU test suite, and discovered a bug when trying
the Khadafy test.  In a pattern like [-z], the dash should be
interpreted as a dash rather than as a range operator (so the pattern
matches either a dash or a z).  This wasn't working, and it's fixed in
the attached copy.
Just for fun, this is the Khadafy test expression:

    M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]
                               ^
            Used to fail here. |

(2) One of the bugs mentioned is not really a bug (IMHO).  Richard
Goerwitz's comment:

> The new routines do have a few equally new bugs.  For instance, it seems
> that ^ is interpreted as a metacharacter in the context [^-z], where the
> precedence of the "-" should make it interpreted as a regular character.

In a [...] construct, a ^ in the first position complements the
following "cset", and a "-" in the front is taken as itself rather than
in its meta-meaning as a range operator.  So [^-z] parses to mean: any
character but "-" or "z".  This seems to be how UNIX egrep handles it.
Mine does too, but it used to fail due to the bug mentioned above (1).

(3) Some of the discrepancies are in handling of badly-formed regular
expressions.  Among GNU, UNIX, and mine, some errors are reported, and
other errors are not reported but some interpretation is made of the
erroneous construct.  E.g., Richard G. writes:

> The expression [^] should incite a syntax error, too (so SYSV egrep, and
> also its GNU equivalent).  I'm just as happy it doesn't, but I wonder how
> it's being interpreted.

My routine takes this as a literal '^', where GNU calls it an error.
(Kind of the opposite of the reaction to "*a", where mine calls it an
error but GNU doesn't).  The remaining few discrepancies detected by
the GNU test suite are of this nature (see the test results, below).
Note that the discrepancies in UNIX egrep are also for weird REs except
the last which, it seems to me, should match.

I'm not sure if anyone really cares about such minor differences
concerning only incorrect regular expressions.  My inclination is to
leave the code as is.  It could be changed to pass the GNU tests, but
that would result in (probably) more code.  My objective was to create
useful, correct RE procedures, not really to create an *exact*
duplicate of any existing product.
I modified the GNU test driver to provide more information on failures.
Here are the failures using the attached procedure:

Status code legend:  0 - match;  1 - no match;  2 - syntax error

GNU test results with igrep
---------------------------
Test 52.  Pattern: '*a';  String: '-';  Expected status: 1
Spencer test #52 failed: status: 2
Test 55.  Pattern: '(*)b';  String: '-';  Expected status: 1
Spencer test #55 failed: status: 2
Test 57.  Pattern: 'a\';  String: '-';  Expected status: 2
Spencer test #57 failed: status: 1
Test 62.  Pattern: 'abc)';  String: '-';  Expected status: 2
Spencer test #62 failed: status: 1

GNU test results with UNIX sun4 egrep
-------------------------------------
Test 50.  Pattern: '()ef';  String: 'def';  Expected status: 0
Spencer test #50 failed: status: 2
Test 51.  Pattern: '()*';  String: '-';  Expected status: 0
Spencer test #51 failed: status: 2
Test 52.  Pattern: '*a';  String: '-';  Expected status: 1
Spencer test #52 failed: status: 2
Test 55.  Pattern: '(*)b';  String: '-';  Expected status: 1
Spencer test #55 failed: status: 2
Test 56.  Pattern: '$b';  String: 'b';  Expected status: 1
Spencer test #56 failed: status: 0
Test 71.  Pattern: '(a|)*';  String: '-';  Expected status: 0
Spencer test #71 failed: status: 2
Test 78.  Pattern: '(ab|)*';  String: '-';  Expected status: 0
Spencer test #78 failed: status: 2
Test 94.  Pattern: '(abc|)ef';  String: 'abcdef';  Expected status: 0
Spencer test #94 failed: status: 2
Test 122.  Pattern: '(....).*\1';  String: 'beriberi';  Expected status: 0
Spencer test #122 failed: status: 1


############################################################################
#
#  Name:    regexp.icn
#
#  Title:   UNIX-like Regular Expression Pattern Matching Procedures
#
#  Author:  Robert J. Alexander
#
#  Date:    November 8, 1991
#
############################################################################
#
#  This is a kit of procedures to deal with UNIX-like regular expression
#  patterns.
#
#  These procedures are interesting partly because of the "recursive
#  suspension" (or "suspensive recursion" :-) technique used to simulate
#  conjunction of an arbitrary number of computed expressions (see
#  notes, below).
#
#
#  The public procedures are:
#
#  ReMatch(pattern,s,i1,i2) : i3,i4,...,iN
#  ReFind(pattern,s,i1,i2) : i3,i4,...,iN
#  RePat(s) : pattern list
#
#
#  ReMatch() produces the sequence of positions in "s" past a substring
#  starting at "i1" that matches "pattern", but fails if there is no
#  such position.  Similar to match(), but is capable of generating
#  multiple positions.
#
#  ReFind() produces the sequence of positions in "s" where substrings
#  begin that match "pattern", but fails if there is no such position.
#  Similar to find().  Each position is produced only once, even if
#  several possible matches are possible at that position.
#
#  "pattern" can be either a string or a pattern list -- see RePat(),
#  below.
#
#  Default values of s, i1, and i2 are handled as for Icon's built-in
#  string scanning procedures such as match().
#
#
#  RePat(s) : L
#
#  Creates a pattern element list from pattern string "s", but fails if
#  the pattern string is not syntactically correct.  ReMatch() and
#  ReFind() will automatically convert a pattern string to a pattern
#  list, but it is faster to do the conversion explicitly if multiple
#  operations are done using the same pattern.  An additional advantage
#  to compiling the pattern separately is avoiding ambiguity of failure
#  caused by an incorrect pattern and failure to match a correct pattern.
#
#
#  Accessible Global Variables
#
#  After a match, the strings matched by parenthesized regular
#  expressions are left in list "Re_ParenGroups", and can be accessed by
#  subscripting in using the same number as the \N construct.
#
#  If it is desired that regular expression format be similar to UNIX
#  filename generation patterns but still retain the power of full
#  regular expressions, make the following assignments prior to
#  compiling the pattern string:
#
#       Re_ArbString := "*"     # Defaults to ".*"
#       Re_AnyString := "?"     # Defaults to "."
#
#  The sets of characters (csets) that define a word, digits, and white
#  space can be modified.  The following assignments can be made before
#  compiling the pattern string.  The character sets are captured when
#  the pattern is compiled, so changing them after pattern compilation
#  will not alter the behavior of matches unless the pattern string is
#  recompiled.
#
#       Re_WordChars := 'whatever you like'
#                       # Defaults to &letters ++ &digits ++ "_"
#       Re_Digits := &digits ++ 'ABCDEFabcdef'
#                       # Defaults to &digits
#       Re_Space := 'whatever you like'
#                       # Defaults to ' \t\v\n\r\f'
#
#  These globals are normally not initialized until the first call to
#  RePat(), and then only if they are null.  They can be explicitly
#  initialized to their defaults (if they are null) by calling
#  Re_Default().
#
#  Characters compiled into patterns can be passed through a
#  user-supplied filter procedure, provided in global variable
#  Re_Filter.  The filtering is done before the characters are bound
#  into the pattern.  The filter proc is passed one argument, the string
#  to filter, and it must return the filtered string as its result.  If
#  the filter proc fails, the string will be used unfiltered.  The
#  filter proc is called with an argument of either type string (for
#  characters in the pattern) or cset (for character classes [...]).  A
#  typical use for this facility is to implement case-independent
#  matching.  All pattern characters can be downshifted by assigning
#
#       Re_Filter := map
#
#  Filtering is done only as the pattern is compiled.  Filtering of
#  strings to be matched must be explicitly done.
#  Therefore, case-independent matching will occur only if map() is
#  applied to all strings to be matched.
#
#  In the case of patterns containing alternation, ReFind() will
#  generally not produce positions in increasing order, but will produce
#  all positions from the first term of the alternation (in increasing
#  order) followed by all positions from the second (in increasing
#  order).  If it is necessary that the positions be generated in
#  strictly increasing order, with no duplicates, assign any non-null
#  value to Re_Ordered:
#
#       Re_Ordered := 1
#
#  If the Re_Ordered option is chosen, there is a *small* penalty in
#  efficiency in some cases, and the co-expression facility is required
#  in your Icon implementation.  Example:
#
#
#  Regular Expression Characters and Features Supported
#
#  The regular expression format supported by procedures in this file
#  models very closely that supported by the UNIX "egrep" program, with
#  modifications as in the Perl programming language definition.
#  Following is a brief description of the special characters used in
#  regular expressions.  In the description, the abbreviation RE means
#  regular expression.
#
#  c         An ordinary character (not one of the special characters
#            discussed below) is a one-character RE that matches that
#            character.
#
#  \c        A backslash followed by any special character is a one-
#            character RE that matches the special character itself.
#
#            Note that backslash escape sequences representing
#            non-graphic characters are not supported directly
#            by these procedures.  Of course, strings coded in an
#            Icon program will have such escapes handled by the
#            Icon translator.  If such escapes must be supported
#            in strings read from the run-time environment (e.g.
#            files), they will have to be converted by other means,
#            such as the Icon Program Library procedure "escape()".
#
#  .         A period is a one-character RE that matches any
#            character.
#
#  [string]  A non-empty string enclosed in square brackets is a one-
#            character RE that matches any *one* character of that
#            string.  If the first character is "^" (circumflex),
#            the RE matches any character not in the remaining
#            characters of the string.  The "-" (minus), when between
#            two other characters, may be used to indicate a range of
#            consecutive ASCII characters (e.g. [0-9] is equivalent to
#            [0123456789]).  Other special characters stand for
#            themselves in a bracketed string.
#
#  *         Matches zero or more occurrences of the RE to its left.
#
#  +         Matches one or more occurrences of the RE to its left.
#
#  ?         Matches zero or one occurrences of the RE to its left.
#
#  {N}       Matches exactly N occurrences of the RE to its left.
#
#  {N,}      Matches at least N occurrences of the RE to its left.
#
#  {N,M}     Matches at least N occurrences but at most M occurrences
#            of the RE to its left.
#
#  ^         A caret at the beginning of an entire RE constrains
#            that RE to match an initial substring of the subject
#            string.
#
#  $         A currency symbol at the end of an entire RE constrains
#            that RE to match a final substring of the subject string.
#
#  |         Alternation: two REs separated by "|" match either a
#            match for the first or a match for the second.
#
#  ()        A RE enclosed in parentheses matches a match for the
#            regular expression (parenthesized groups are used
#            for grouping, and for accessing the matched string
#            subsequently in the match using the \N expression).
#
#  \N        Where N is a digit in the range 1-9, matches the same
#            string of characters as was matched by a parenthesized
#            RE to the left in the same RE.  The sub-expression
#            specified is that beginning with the Nth occurrence
#            of "(" counting from the left.  E.g., ^(.*)\1$ matches
#            a string consisting of two consecutive occurrences of
#            the same string.
#
#  Perl Extensions
#
#  The following extensions to UNIX REs, as specified in the Perl
#  programming language, are supported.
#
#  \w        Matches any alphanumeric (including "_").
#  \W        Matches any non-alphanumeric.
#
#  \b        Matches only at a word-boundary (word defined as a string
#            of alphanumerics as in \w).
#  \B        Matches only non-word-boundaries.
#
#  \s        Matches any white-space character.
#  \S        Matches any non-white-space character.
#
#  \d        Matches any digit [0-9].
#  \D        Matches any non-digit.
#
#  \w, \W, \s, \S, \d, \D can be used within [string] REs.
#
#
#  Note on Details of Matching
#
#  The method of matching differs a bit from UNIX-style regular
#  expressions -- particularly where closures are concerned ("*", "+",
#  "{}", "?").  UNIX regular expressions are documented to match the
#  "longest, leftmost" strings in cases where a choice is needed.  The
#  procedures in this file are capable of generating all possible
#  matches of the pattern, and generate the possibilities by matching
#  the fewest first ("shortest, leftmost").  Matching of the various
#  pattern elements is performed exactly as though it were an Icon
#  conjunction of the pattern elements.
#
#
#  Notes on computed conjunction expressions by "suspensive recursion"
#
#  A conjunction expression of an arbitrary number of terms can be
#  computed in a looping fashion by the following recursive technique:
#
#       procedure Conjunct(v)
#          if <there is another term to be appended> then
#             suspend Conjunct(<the next term expression>)
#          else
#             suspend v
#       end
#
#  The argument "v" is needed for producing the value of the last term
#  as the value of the conjunction expression, accurately modeling Icon
#  conjunction.  If the value of the conjunction is not needed, the
#  technique can be slightly simplified by eliminating "v":
#
#       procedure ConjunctAndProduceNull()
#          if <there is another term to be appended> then
#             suspend ConjunctAndProduceNull(<the next term expression>)
#          else
#             suspend
#       end
#
#  Note that <the next term expression> must still remain in the suspend
#  expression to test for failure of the term, although its value is not
#  passed to the recursive invocation.  This could have been coded as
#
#       suspend <the next term expression> & ConjunctAndProduceNull()
#
#  but wouldn't have been as provocative.
#
#  Since the computed conjunctions in this program are evaluated only for
#  their side effects, the second technique is used in two situations:
#
#       (1)  To compute the conjunction of all of the elements in the
#            regular expression pattern list (Re_match1()).
#
#       (2)  To evaluate the "exactly N times" and "N to M times"
#            control operations (Re_NTimes()).
#


record Re_Tok(proc,args)

global Re_ParenGroups,Re_Filter,Re_Ordered
global Re_WordChars,Re_NonWordChars
global Re_Space,Re_NonSpace
global Re_Digits,Re_NonDigits
global Re_ArbString,Re_AnyString
global Re_TabMatch


###################  Pattern Translation Procedures  ###################


procedure RePat(s) # L
   #
   #  Produce pattern list representing pattern string s.
   #
   #
   #  Create a list of pattern elements.  Pattern strings are parsed
   #  and converted into list elements as shown in the following table.
   #  Since some list elements reference other pattern lists, the
   #  structure is really a tree.
   #
   #  Token    Generates                           Matches...
   #  -----    ---------                           ----------
   #  ^        Re_Tok(pos,[1])                     Start of string or line
   #  $        Re_Tok(pos,[0])                     End of string or line
   #  .        Re_Tok(move,[1])                    Any single character
   #  +        Re_Tok(Re_OneOrMore,[tok])          At least one occurrence of
   #                                               previous token
   #  *        Re_Tok(Re_ArbNo,[tok])              Zero or more occurrences of
   #                                               previous token
   #  |        Re_Tok(Re_Alt,[pattern,pattern])    Either of prior expression
   #                                               or next expression
   #  [...]    Re_Tok(Re_TabAny,[cset])            Any single character in
   #                                               specified set (see below)
   #  (...)    Re_Tok(Re_MatchReg,[pattern])       Parenthesized pattern as
   #                                               single token
   #  <string>                                     The string of non-special
   #           Re_Tok(Re_TabMatch,string)          characters
   #  \b       Re_Tok(Re_WordBoundary,[Re_WordChars,Re_NonWordChars])
   #                                               A word-boundary
   #                                               (word default: [A-Za-z0-9_]+)
   #  \B       Re_Tok(Re_NonWordBoundary,[Re_WordChars,Re_NonWordChars])
   #                                               A non-word-boundary
   #  \w       Re_Tok(Re_TabAny,[Re_WordChars])    A word-character
   #  \W       Re_Tok(Re_TabAny,[Re_NonWordChars]) A non-word-character
   #  \s       Re_Tok(Re_TabAny,[Re_Space])        A space-character
   #  \S       Re_Tok(Re_TabAny,[Re_NonSpace])     A non-space-character
   #  \d       Re_Tok(Re_TabAny,[Re_Digits])       A digit
   #  \D       Re_Tok(Re_TabAny,[Re_NonDigits])    A non-digit
   #  {n,m}    Re_Tok(Re_NToMTimes,[tok,n,m])      n to m occurrences of
   #                                               previous token
   #  {n,}     Re_Tok(Re_NOrMoreTimes,[tok,n])     n or more occurrences of
   #                                               previous token
   #  {n}      Re_Tok(Re_NTimes,[tok,n])           exactly n occurrences of
   #                                               previous token
   #  ?        Re_Tok(Re_ZeroOrOneTimes,[tok])     one or zero occurrences of
   #                                               previous token
   #  \N       Re_Tok(Re_MatchParenGroup,[n])      The string matched by
   #                                               parenthesis group
   #
   local plist
   #
   #  Initialize.
   #
   initial Re_Default()
   Re_WordChars := cset(Re_WordChars)
   Re_NonWordChars := ~Re_WordChars
   Re_Space := cset(Re_Space)
   Re_NonSpace := ~Re_Space
   Re_Digits := cset(Re_Digits)
   Re_NonDigits := ~Re_Digits
   s ? (plist := Re_pat1(0)) | fail
   return plist
end


procedure Re_pat1(level) # L
   #
   #  Recursive portion of RePat()
   #
   local plist,n,m,c,comma
   static none,parenNbr
   initial {
      Re_TabMatch := proc("=",1)
      none := []
      }
   if level = 0 then parenNbr := 0
   plist := []
   #
   #  Loop to put pattern elements on list.
   #
   until pos(0) do {
      (="|",plist := [Re_Tok(Re_Alt,[plist,Re_pat1(level + 1) | fail])]) |
      put(plist,
         ## (="^",*plist = 0 | plist[-1].proc === Re_Alt,Re_Tok(pos,[1])) |
         (="^",pos(2) | &subject[-2] == ("|" | "("),Re_Tok(pos,[1])) |
         (="$",pos(0) | match("|" | ")"),Re_Tok(pos,[0])) |
         (match(")"),level > 0,break) |
         (=Re_ArbString,Re_Tok(Re_Arb,none)) |
         (=Re_AnyString,Re_Tok(move,[1])) |
         (="+",Re_Tok(Re_OneOrMore,[Re_prevTok(plist) | fail])) |
         (="*",Re_Tok(Re_ArbNo,[Re_prevTok(plist) | fail])) |
         1(Re_Tok(Re_TabAny,[c := Re_cset()]),\c | fail) |
         3(="(",n := parenNbr +:= 1,
               Re_Tok(Re_MatchReg,[Re_pat1(level + 1) | fail,n]),
               move(1) | fail) |
         (="\\b",Re_Tok(Re_WordBoundary,[Re_WordChars,Re_NonWordChars])) |
         (="\\B",Re_Tok(Re_NonWordBoundary,[Re_WordChars,Re_NonWordChars])) |
         (="\\w",Re_Tok(Re_TabAny,[Re_WordChars])) |
         (="\\W",Re_Tok(Re_TabAny,[Re_NonWordChars])) |
         (="\\s",Re_Tok(Re_TabAny,[Re_Space])) |
         (="\\S",Re_Tok(Re_TabAny,[Re_NonSpace])) |
         (="\\d",Re_Tok(Re_TabAny,[Re_Digits])) |
         (="\\D",Re_Tok(Re_TabAny,[Re_NonDigits])) |
         (="{",(n := tab(many(&digits)),comma := =(",") | &null,
               m := tab(many(&digits)) | &null,="}") | fail,
               if \m then Re_Tok(Re_NToMTimes,
                     [Re_prevTok(plist),integer(n),integer(m)])
               else if \comma then Re_Tok(Re_NOrMoreTimes,
                     [Re_prevTok(plist),integer(n)])
               else Re_Tok(Re_NTimes,[Re_prevTok(plist),integer(n)])) |
         (="?",Re_Tok(Re_ZeroOrOneTimes,[Re_prevTok(plist) | fail])) |
         Re_Tok(Re_TabMatch,[Re_string(level)]) |
         (="\\",n := tab(any(&digits)),Re_Tok(Re_MatchParenGroup,[integer(n)]))
         ) | fail
      }
   return plist
end


procedure Re_prevTok(plist)
   #
   #  Pull previous token from the pattern list.  This procedure must take
   #  into account the fact that successive character tokens have been
   #  optimized into a single string token.
   #
   local lastTok,s,r
   lastTok := pull(plist) | fail
   if lastTok.proc === Re_TabMatch then {
      s := lastTok.args[1]
      r := Re_Tok(Re_TabMatch,[s[-1]])
      s[-1] := ""
      if *s > 0 then {
         put(plist,lastTok)
         lastTok.args[1] := s
         }
      return r
      }
   return lastTok
end


procedure Re_Default()
   #
   #  Assign default values to regular expression translation globals, but
   #  only to variables whose values are null.
   #
   /Re_WordChars := &letters ++ &digits ++ "_"
   /Re_Space := ' \t\v\n\r\f'
   /Re_Digits := &digits
   /Re_ArbString := ".*"
   /Re_AnyString := "."
   return
end


procedure Re_cset()
   #
   #  Matches a [...] construct and returns a cset.
   #
   local complement,c,e,ch,chars
   ="[" | fail
   (complement := ="^" | &null,
      (e := (="-" | "")) || (c := move(1) || tab(find("]"))),move(1)) |
         return &null
   c ? {
      while chars := tab(upto('-\\')) do {
         e ++:= case move(1) of {
            "-": chars[1:-1] ++
                  &cset[ord(chars[-1]) + 1:ord(move(1)) + 2] | return &null
            "\\": case ch := move(1) of {
               "w": Re_WordChars
               "W": Re_NonWordChars
               "s": Re_Space
               "S": Re_NonSpace
               "d": Re_Digits
               "D": Re_NonDigits
               default: ch
               }
            }
         }
      e ++:= tab(0)
      if \complement then e := ~e
      }
   e := (\Re_Filter)(e)
   return cset(e)
end


procedure Re_string(level)
   #
   #  Matches a string of non-special characters, returning a string.
   #
   local special,s,p
   static nondigits
   initial nondigits := ~&digits
   special := if level = 0 then '\\.+*|[({?' else '\\.+*|[({?)'
   s := tab(upto(special) | 0)
   while ="\\" do {
      p := &pos
      if tab(any('wWbBsSdD')) |
            (tab(any('123456789')) & (pos(0) | any(nondigits))) then {
         tab(p - 1)
         break
         }
      s ||:= move(1) || tab(upto(special) | 0)
      }
   if pos(0) & s[-1] == "$" then {
      move(-1)
      s[-1] := ""
      }
   s := string((\Re_Filter)(s))
   return "" ~== s
end


#####################  Matching Engine Procedures  ########################


procedure ReMatch(plist,s,i1,i2) # i3,i4,...,iN
   #
   #  Produce the sequence of positions in s past a string starting at i1
   #  that matches the pattern plist, but fails if there is no such
   #  position.
   #  Similar to match(), but is capable of generating multiple
   #  positions.
   #
   if type(plist) ~== "list" then plist := RePat(plist) | fail
   /i1:= if /s := &subject then &pos else 1 ; /i2 := 0
   Re_ParenGroups := []
   suspend s[i1:i2] ? (Re_match1(plist,1),i1 + &pos - 1)
end


procedure Re_match1(plist,i) # s1,s2,...,sN
   #
   #  Used privately by ReMatch() to simulate a computed conjunction
   #  expression via recursive generation.
   #
   local tok
   suspend if tok := plist[i] then
         Re_tok_match(tok,plist,i) & Re_match1(plist,i + 1) else &null
end


procedure ReFind(plist,s,i1,i2) # i3,i4,...,iN
   #
   #  Produce the sequence of positions in s where strings begin that match
   #  the pattern plist, but fails if there is no such position.  Similar
   #  to find().
   #
   local p
   if type(plist) ~== "list" then plist := RePat(plist) | fail
   /i1 := if /s := &subject then &pos else 1 ; /i2 := 0
   s[i1:i2] ? suspend (
         tab(Re_skip(plist,1)) &
         p := &pos &
         Re_match1(plist,1)\1 &
         i1 + p - 1)
end


procedure Re_tok_match(tok,plist,i)
   #
   #  Match a single token.  Can be recursively called by the token
   #  procedure.
   #
   local prc
   prc := tok.proc
   suspend (
      if prc === Re_Arb then
         Re_Arb(plist,i)
      else
         suspend prc!tok.args
      )
end


##########  Heuristic Code for Matching Arbitrary Characters  ##########


procedure Re_skip(plist,i) # s1,s2,...,sN
   #
   #  Used privately -- match a sequence of strings in s past which a match
   #  of the first pattern element in plist is likely to succeed.  This
   #  procedure is used for heuristic performance improvement by ReMatch()
   #  for the ".*" pattern element, and by ReFind().
   #
   local x,s,p,prc
   x := plist[i]
   suspend case prc := (\x).proc | &null of {
      Re_TabMatch: find!x.args
      Re_TabAny: upto!x.args
      pos: x.args[1]
      ## Re_WordBoundary: Re_WordBoundaries!x.args
      Re_WordBoundary |
      Re_NonWordBoundary:
         p := &pos & tab(Re_skip(plist,i + 1)) & prc!x.args & untab(p)
      Re_OneOrMore |
      Re_MatchParenGroup: if s := (\Re_ParenGroups)[x.args[1]] then
            find(s) else &pos to *&subject + 1
      Re_NToMTimes |
      Re_NOrMoreTimes |
      Re_NTimes:
            if x.args[2] > 0 then Re_skip(x.args[1],1)
            else &pos to *&subject + 1
      Re_MatchReg: Re_skip(x.args[1],1)
      Re_Alt:
         if \Re_Ordered then
            Re_result_merge{Re_skip(x.args[1],1),Re_skip(x.args[2],1)}
         else
            Re_skip(x.args[1 | 2],1)
      default: &pos to *&subject + 1
      }
end


procedure Re_result_merge(L)
   #
   #  Programmer-defined control operation to merge the result sequences of
   #  two integer-producing generators.  Both generators must produce their
   #  result sequences in numerically increasing order with no duplicates,
   #  and the output sequence will be in increasing order with no
   #  duplicates.
   #
   local e1,e2,r1,r2
   e1 := L[1] ; e2 := L[2]
   r1 := @e1 ; r2 := @e2
   while \(r1 | r2) do
      if /r2 | \r1 < r2 then
         suspend r1 do r1 := @e1 | &null
      else if /r1 | r1 > r2 then
         suspend r2 do r2 := @e2 | &null
      else
         r2 := @e2 | &null
end


procedure untab(origPos)
   #
   #  Converts a string scanning expression that moves the cursor to one
   #  that produces a cursor position and doesn't move the cursor (converts
   #  something like tab(find(x)) to find(x)).  The template for using this
   #  procedure is
   #
   #       origPos := &pos ; tab(x) & ... & untab(origPos)
   #
   local newPos
   newPos := &pos
   tab(origPos)
   suspend newPos
   tab(newPos)
end


#######################  Matching Procedures  #######################


procedure Re_Arb(plist,i)
   #
   #  Match arbitrary characters (.*)
   #
   suspend tab(if \plist then Re_skip(plist,i + 1) else 1 to *&subject + 1)
end


procedure Re_TabAny(C)
   #
   #  Match a character of a character set ([...],\w,\W,\s,\S,\d,\D)
   #
   suspend tab(any(C))
end


procedure Re_MatchReg(tokList,groupNbr)
   #
   #  Match parenthesized group and assign matched string to list
   #  Re_ParenGroups
   #
   local p,s
   p := &pos
   /Re_ParenGroups := []
   every Re_match1(tokList,1) do {
      while *Re_ParenGroups < groupNbr do put(Re_ParenGroups)
      s := &subject[p:&pos]
      Re_ParenGroups[groupNbr] := s
      suspend s
      }
   Re_ParenGroups[groupNbr] := &null
end


procedure Re_WordBoundary(wd,nonwd)
   #
   #  Match word-boundary (\b)
   #
   suspend ((pos(1),any(wd)) | (pos(0),move(-1),tab(any(wd))) | (move(-1),
         (tab(any(wd)),any(nonwd)) | (tab(any(nonwd)),any(wd))),"")
end


procedure Re_NonWordBoundary(wd,nonwd)
   #
   #  Match non-word-boundary (\B)
   #
   suspend ((pos(1),any(nonwd)) | (pos(0),move(-1),tab(any(nonwd))) | (move(-1),
         (tab(any(wd)),any(wd)) | (tab(any(nonwd)),any(nonwd)),""))
end


procedure Re_MatchParenGroup(n)
   #
   #  Match same string matched by previous parenthesized group (\N)
   #
   local s
   suspend if s := \Re_ParenGroups[n] then =s else ""
end


###################  Control Operation Procedures  ###################


procedure Re_ArbNo(tok)
   #
   #  Match any number of times (*)
   #
   suspend "" | (Re_tok_match(tok) & Re_ArbNo(tok))
end


procedure Re_OneOrMore(tok)
   #
   #  Match one or more times (+)
   #
   suspend Re_tok_match(tok) & Re_ArbNo(tok)
end


procedure Re_NToMTimes(tok,n,m)
   #
   #  Match n to m times ({n,m})
   #
   suspend Re_NTimes(tok,n) & Re_ArbNo(tok)\(m - n + 1)
end


procedure Re_NOrMoreTimes(tok,n)
   #
   #  Match n or more times ({n,})
   #
   suspend Re_NTimes(tok,n) & Re_ArbNo(tok)
end


procedure Re_NTimes(tok,n)
   #
   #  Match exactly n times ({n})
   #
   if n > 0 then
      suspend Re_tok_match(tok) & Re_NTimes(tok,n - 1)
   else suspend
end


procedure Re_ZeroOrOneTimes(tok)
   #
   #  Match zero or one times (?)
   #
   suspend "" | Re_tok_match(tok)
end


procedure Re_Alt(tokList1,tokList2)
   #
   #  Alternation (|)
   #
   suspend Re_match1(tokList1 | tokList2,1)
end

From icon-group-request@arizona.edu  Sat Nov  9 21:01:27 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 9 Nov 91 21:01:27 MST
Resent-From: icon-group-request@arizona.edu
Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA16450; Sat, 9 Nov 91 21:01:25 MST
Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sat, 9 Nov 1991 21:00 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA15129; Sat, 9 Nov 91 19:47:49 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Resent-Date: Sat, 9 Nov 1991 21:01 MST
Date: 9 Nov 91 23:26:36 GMT
From: ogicse!milton!nntp.uoregon.edu!euclid!haertel@uunet.uu.net (Mike Haertel)
Subject: RE: Regular expression timings
Sender: icon-group-request@arizona.edu
Resent-To: icon-group@cs.arizona.edu
To: icon-group@arizona.edu
Resent-Message-Id: <5FE89D39766006B8@Arizona.edu>
Message-Id: <1991Nov9.232636.14560@nntp.uoregon.edu>
X-Envelope-To: icon-group@CS.Arizona.EDU
X-Vms-To: icon-group@Arizona.edu
Organization: Department of Mathematics, University of Oregon
References: <1580@cronos.metaphor.com>

In article <1580@cronos.metaphor.com> alex@laguna.metaphor.com (Bob Alexander) writes:
>Here are some timings done comparing regular expression routines in
>Icon and otherwise.  The timings are the result of the C shell "time"
>command applied to the GNU regression test set -- two trials each.

These timings are essentially meaningless for grep performance.  I'd be
much more interested to see the timings for situations dominated by
search time, rather than regexp compile time.
The algorithms for regular expressions are dominated by data structures like sets and graphs, and operations like transitive closure; these are operations that Icon does very well. But regexp compile time doesn't matter all that much. Most people use grep to quickly search through a large amount of data looking for a single pattern. So search time is what dominates grep usage. Once an automaton is built, regexp search is dominated by array indexing time (by character codes). This is something that C does very well. In the search phase, the GNU egrep regexp matcher (dfa.c, not regex.c) averages fewer than 10 machine instructions per character (on the 68020), and it never backtracks. Also note that the search phase of GNU egrep is prefiltered by a Boyer-Moore scan, and so in actual practice for typical patterns it probably averages less than 1 instruction per character of the input file, since it need not look at every character. I really seriously doubt that any matcher written in Icon can even begin to touch performance like this. From wgg@cs.ucsd.edu Mon Nov 11 15:35:51 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 11 Nov 91 15:35:51 MST Received: from ucsd.edu by optima.cs.arizona.edu (4.1/15) id AA01940; Mon, 11 Nov 91 15:35:30 MST Received: from gremlin.ucsd.edu by ucsd.edu; id AA10504 sendmail 5.64/UCSD-2.2-sun via SMTP Mon, 11 Nov 91 14:35:26 -0800 for icon-group@cs.arizona.edu Received: by gremlin.ucsd.edu (4.1/UCSDPSEUDO.4) id AA22258 for icon-group@cs.arizona.edu; Mon, 11 Nov 91 14:35:24 PST Date: Mon, 11 Nov 91 14:35:24 PST From: wgg@cs.ucsd.edu (William Griswold) Message-Id: <9111112235.AA22258@gremlin.ucsd.edu> To: icon-group@cs.arizona.edu Subject: writing sorting routines in Icon Recently I was trying to implement a priority queue data type with a `heap' data structure. This type would allow me to select the k greatest elements in a list in sorted order--the ``k most common words'' problem. 
It can be easily solved by building a table of word counts and sorting the whole table, but with big tables this is a big loser, since sorting is N log N for a list of length N. With a priority queue I can select the k greatest elements in N log k time (or likely much less), since each p-queue operation takes log time, but there needn't be more than k elements in the queue at any time. The priority queue is basically a sorting data structure. However, I was inhibited from implementing a queue that could handle elements from all types the way Icon sort does, because I need a less-than operation that can compare values from any type. Icon directly supports only < (for integers) and << (for strings). Oddly, there is a generic equals operator, ===, but no <<<. How did I solve this problem? Here is how I implemented generic_lt (a generic less-than): procedure generic_lt(x,y) return (x ~=== y) & (sort([x,y])[2] === y) end Not very pretty, but it will work (no, I haven't tested it yet). This cannot be very efficient, but it will always adhere to the very important constraint that sorting with my priority queues will work just like Icon's built-in sort. Does anyone see a way of implementing this *without* sort? Icon has the peculiar definition of sorting things like records by ``age''. In the current implementation I can get an object x's age by computing image(x) and then lifting the object number out of the text string, but that is very unportable, probably. I think calling sort is the only reasonable way to build my own Icon-compatible sorting-class routines. My priority-queue solution to `k most common words' (using integer comparison for ordering) is roughly twice as fast as my old straight-sorting solution for computing the 10 most common words on a 2 1/2 meg file, so wanting to implement my own sorting-class routines is not unwarranted.
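The N log k selection described above can be sketched as follows. This is only a minimal illustration, written in C rather than Icon and restricted to plain integer counts, so it sidesteps the generic-comparison question entirely. The names top_k and sift_down are hypothetical and do not come from the original post. The idea is to keep a min-heap of at most k elements: a new count replaces the heap's smallest element only when it is larger, so each of the N inputs costs at most about log k steps.

```c
#include <stddef.h>

/* Restore min-heap order below index i in heap[0..n-1]. */
static void sift_down(long *heap, size_t n, size_t i)
{
    for (;;) {
        size_t smallest = i, left = 2 * i + 1, right = left + 1;
        if (left < n && heap[left] < heap[smallest]) smallest = left;
        if (right < n && heap[right] < heap[smallest]) smallest = right;
        if (smallest == i) return;
        long tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

/*
 * Copy the k largest values of counts[0..n-1] into out[] (which must
 * have room for k elements), in descending order.  Returns the number
 * of values written, i.e. the smaller of k and n.
 */
size_t top_k(const long *counts, size_t n, long *out, size_t k)
{
    size_t size = 0, i;

    if (k == 0) return 0;
    for (i = 0; i < n; i++) {
        if (size < k) {
            /* Heap not yet full: append, then sift the new value up. */
            size_t child = size++;
            out[child] = counts[i];
            while (child > 0 && out[child] < out[(child - 1) / 2]) {
                size_t parent = (child - 1) / 2;
                long tmp = out[child];
                out[child] = out[parent];
                out[parent] = tmp;
                child = parent;
            }
        } else if (counts[i] > out[0]) {
            /* Larger than the smallest kept value: replace the root. */
            out[0] = counts[i];
            sift_down(out, size, 0);
        }
    }
    /* Heap-sort phase: move the minimum to the end repeatedly,
       which leaves out[] in descending order. */
    for (i = size; i > 1; i--) {
        long tmp = out[0];
        out[0] = out[i - 1];
        out[i - 1] = tmp;
        sift_down(out, i - 1, 0);
    }
    return size;
}
```

Making this behave like Icon's sort for arbitrary values would mean replacing the two `<` and one `>` comparisons with something like generic_lt, which is exactly the piece the language does not supply directly.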
In my opinion, since implementing these routines--and wanting them to be efficient and consistent with Icon sort--is not unwarranted, the bottom line is that Icon has a design flaw because it is missing <<<, >>>, <<<= and >>>=, or because Icon sort can sort things in a way that a programmer cannot compare them. Comments? Bill Griswold From ralph Mon Nov 11 18:06:52 1991 Date: Mon, 11 Nov 91 18:06:52 MST From: "Ralph Griswold" Message-Id: <9111120106.AA13433@cheltenham.cs.arizona.edu> Received: by cheltenham.cs.arizona.edu; Mon, 11 Nov 91 18:06:52 MST To: icon-group@cs.arizona.edu, wgg@cs.ucsd.edu Subject: Re: writing sorting routines in Icon Icon has a lot of design flaws. One is having too many functions and operators. We considered structure comparison operators but did not implement them because we felt the additional linguistic litter overshadowed the additional functionality. Whether you agree depends on how you feel about language design. Ralph Griswold / Department of Computer Science The University of Arizona / Tucson, AZ 85721 ralph@cs.arizona.edu / uunet!arizona!ralph voice: 602-621-6609 / fax: 602-621-9618 From icon-group-request@arizona.edu Wed Nov 13 19:34:36 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 13 Nov 91 19:34:36 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA08770; Wed, 13 Nov 91 19:02:56 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Wed, 13 Nov 1991 19:02 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA11088; Wed, 13 Nov 91 17:44:19 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Wed, 13 Nov 1991 19:02 MST Date: 13 Nov 91 19:42:10 GMT From: micro-heart-of-gold.mit.edu!wupost!spool.mu.edu!uwm.edu!mrsvr.UUCP!hewitt@bloom-beacon.mit.edu (Anthony V.
Hewitt) Subject: RE: list manipulation Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <740596FDD4602D12@Arizona.edu> Message-Id: X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: GE Medical Systems, Milwaukee, WI References: <1991Nov12.114242.2062@arizona.edu> It's my understanding that in Icon, even more than in most languages, a concern for efficiency can be counterproductive; it's not at all obvious which version of your code will run fastest, even if you understand the language implementation (which I don't). Explicit subscripts just don't seem idiomatic in Icon; you might not expect them to have as polished an implementation as more interesting language features. But are they slower than the alternatives? Try it and see! The queue and stack access methods provided by put(L,x), push(L,x), pop(L), get(L), pull(L) are efficient, so I'm told, and I have found them very useful. Steve Wampler published some time ago a comment about a student's program that used pop() to advantage when the list needed to be accessed only once. A construction like while write(pop(L)) is quite handy. -- ------------------------------------------------------------------------------ Anthony V. 
Hewitt Phone 414 548-5170 (GE Dialcom 8*320-5170) Senior Physicist General Electric Company Fax 414 548-5197 (8*320-5197) P.O.Box 414, W-641 Milwaukee, WI 53201 Internet: hewitta@aslpet.med.ge.com ------------------------------------------------------------------------------ From icon-group-request@arizona.edu Fri Nov 15 06:00:49 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 15 Nov 91 06:00:49 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA21932; Fri, 15 Nov 91 06:00:47 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Fri, 15 Nov 1991 06:00 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA25040; Fri, 15 Nov 91 04:45:53 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Fri, 15 Nov 1991 06:00 MST Date: 15 Nov 91 00:36:17 GMT From: elroy.jpl.nasa.gov!swrinde!mips!news.cs.indiana.edu!arizona.edu!arizona.edu!news@ames.arc.nasa.gov (Kurt Parten) Subject: Icon memory management Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <9916E78D58604738@Arizona.edu> Message-Id: <1991Nov14.173618.2088@arizona.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: Dept of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona Does anyone out there know details of Icon's memory management? Is there some sort of garbage collection which goes on? Do I have to worry about running out of heap? Is there a compiler option to specify heapsize? Is heapsize set by the compiler? Is heapsize whatever memory will allow? 
Thanks, Kurt -- Kurt Parten parten@helios.ece.arizona.edu From icon-group-request@arizona.edu Fri Nov 15 06:45:37 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 15 Nov 91 06:45:37 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA23459; Fri, 15 Nov 91 06:45:35 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Fri, 15 Nov 1991 06:45 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA26408; Fri, 15 Nov 91 05:41:35 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Fri, 15 Nov 1991 06:45 MST Date: 15 Nov 91 02:47:45 GMT From: news@arizona.edu (Kurt Parten) Subject: list manipulation Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <9F5931195860452B@Arizona.edu> Message-Id: <1991Nov14.194746.2091@arizona.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: Dept of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona Thanks to all who replied to the previous post about list manipulation. Replacing occurrences of somelist |||:= [item] with put(somelist, item) changed execution time from 11.7 secs to 9.4 secs. Roughly 20% shaved off. And there was no noticeable difference between put(somelist, item) and push(somelist, item). Has anyone done any other profiling of Icon?
-- Kurt Parten parten@helios.ece.arizona.edu From hbair@austin.onu.edu Sat Nov 16 23:46:03 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 16 Nov 91 23:46:03 MST Resent-From: hbair@austin.onu.edu Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA19662; Sat, 16 Nov 91 23:46:01 MST Received: from austin.onu.edu by Arizona.edu with PMDF#10282; Sat, 16 Nov 1991 23:45 MST Received: by austin.onu.edu (AIX 3.1/UCB 5.61/4.03) id AA33346; Sun, 17 Nov 91 01:45:21 -0500 Resent-Date: Sat, 16 Nov 1991 23:45 MST Date: Sun, 17 Nov 91 01:45:21 -0500 From: hbair@austin.onu.edu (Heath Bair (x1666)) Subject: Mail Resent-To: icon-group@cs.arizona.edu To: Icon-group@arizona.edu Resent-Message-Id: Message-Id: <9111170645.AA33346@austin.onu.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: Icon-group@Arizona.edu I would like to be a part of your mail group. Heath C. Bair From icon-group-request@arizona.edu Sun Nov 17 11:06:06 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 17 Nov 91 11:06:06 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Hopey.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA08153; Sun, 17 Nov 91 11:06:04 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sun, 17 Nov 1991 11:05 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA11920; Sun, 17 Nov 91 09:52:08 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Sun, 17 Nov 1991 11:05 MST Date: 17 Nov 91 17:05:54 GMT From: agate!spool.mu.edu!samsung!zaphod.mps.ohio-state.edu!uwm.edu!ux1.cso.uiuc.edu!uchinews!ellis!goer@ucbvax.berkeley.edu (Richard L.
Goerwitz) Subject: global variables, builtins Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <5612272BE86044A5@Arizona.edu> Message-Id: <1991Nov17.170554.17717@midway.uchicago.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: University of Chicago Computing Organizations Can someone explain to me why the following only works so long as I comment out the ", write" part? This behavior seems to break Jerry Nowlin's solit.icn program in the IPL ("global kbhit"). I was not aware that global variable names designating functions were different from globals designating user-defined procedures. My Icon run-time system passed all the tests, and I tested this both at home and on a Sun4. Something subtle is going on that I don't understand. global mywrite#, write procedure main() write :=: mywrite write("hello") end procedure mywrite(a[]) return mywrite!a end -- -Richard L. Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer From isidev!nowlin@uunet.uu.net Sun Nov 17 14:52:19 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 17 Nov 91 14:52:19 MST Received: from relay1.UU.NET by optima.cs.arizona.edu (4.1/15) id AA17237; Sun, 17 Nov 91 14:52:16 MST Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay1.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA25943; Sun, 17 Nov 91 16:52:11 -0500 Date: Sun, 17 Nov 91 16:52:11 -0500 From: isidev!nowlin@uunet.uu.net Message-Id: <9111172152.AA25943@relay1.UU.NET> Received: from isidev.UUCP by uunet.uu.net with UUCP/RMAIL (queueing-rmail) id 165110.22006; Sun, 17 Nov 1991 16:51:10 EST To: uunet!cs.arizona.edu!icon-group@uunet.uu.net Subject: Re: global variables > Can someone explain to me why the following only works so long as I > comment out the ", write" part? 
This behavior seems to break Jerry > Nowlin's solit.icn program in the IPL ("global kbhit"). I was not > aware that global variable names designating functions were different > from globals designating user-defined procedures. > ... > global mywrite#, write The problem is that declaring a global with the same name as a built-in function replaces the built-in function even before anything is assigned to the name. By the time you used the exchange values operator to swap the original write() with mywrite() it was too late. The global value for write was already &null. > write :=: mywrite ISIcon gives warnings about this. I find them very useful. When the program included in the original posting was translated with ISIcon the following message was generated: Warning, "write": global declaration overrides built-in function The program still works just like PD Icon, since this is a warning, but you are cautioned that you may have done something you will regret. In this case a handy warning. I don't know exactly what was meant by "break ... solit.icn program in the IPL". The only built-in function that was redefined in solit.icn was display() and that didn't break anything. The kbhit() function, which is only implemented for DOS in PD Icon, isn't used in solit.icn. --- --- | S | Iconic Software, Inc. 
- Jerry Nowlin - uunet!isidev!nowlin --- --- From icon-group-request@arizona.edu Sun Nov 17 17:38:12 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 17 Nov 91 17:38:12 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA22490; Sun, 17 Nov 91 17:38:09 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Sun, 17 Nov 1991 17:37 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA21016; Sun, 17 Nov 91 16:25:25 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Sun, 17 Nov 1991 17:37 MST Date: 17 Nov 91 23:57:13 GMT From: cis.ohio-state.edu!zaphod.mps.ohio-state.edu!qt.cs.utexas.edu!cs.utexas.edu!uwm.edu!ux1.cso.uiuc.edu!uchinews!ellis!goer@ucbvax.berkeley.edu (Richard L. Goerwitz) Subject: RE: global variables Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <8CD60C0958604FC3@Arizona.edu> Message-Id: <1991Nov17.235713.27108@midway.uchicago.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: University of Chicago Computing Organizations References: <9111172152.AA25943@relay1.UU.NET> In article <9111172152.AA25943@relay1.UU.NET> nowlin@isidev.UUCP writes: > > > Can someone explain to me why [declaring a variable with the same > > name as a builtin function nulls the variable]? > >The problem is that declaring a global with the same name as a built-in >function replaces the built-in function even before anything is assigned to >the name. Obviously what you say describes what I observe. I wish the behavior were the same for builtins as for user-defined procedures, though. It's clearly tripped up more than one person. Not you, of course.
But take a look at this: -------------------------- from solit.icn (IPL) ------------------------ # Name: solit.icn global whitespace, amode, seed, deck, over, hidden, run, ace, kbhit if not(&features == "keyboard functions") then kbhit := 2 if proc(kbhit) then writes( " [Press any key to return.]") ----------------------------- end IPL material --------------------------- Looks to me like somebody couldn't decide whether to test for the presence of kbhit() via &features or proc(). The second method is better, since it detects kbhit() even if it's added as a library function. The global definition kbhit := 2 nulls out the expression kbhit() if there is no builtin kbhit function. For those who are confused here, the solit game is supposed to auto-detect the presence of the kbhit utility in the run-time system. If an Icon implementation has kbhit(), getch(), and getche(), then the following expression will succeed: find("keyboard func", &features). But what if you have added kbhit() via Icon code? I don't see an easy way to do this (see the file getchlib.icn in the IPL). But if somebody does, proc(kbhit) will succeed. So proc(kbhit) is a better way of checking for the presence of this function in the run-time system or as a user-defined procedure. If proc(kbhit) fails, you can null out the procedure by assigning the integer 2 to kbhit. That way kbhit() will evaluate to 2(), which is essentially a no-op, and will always fail. The idea is this: If the user has a kbhit() function (either builtin or via Icon code), then use it; otherwise, null out the function kbhit, and make kbhit() a no-op. The problem is that, if kbhit() isn't a function or user-defined procedure, kbhit() will only work "right" if a) kbhit is assigned an integer value, and b) gets assigned the value 2 every time the variable is encountered and proc(kbhit) fails. Making kbhit global doesn't work for the reasons Jerry discussed. >ISIcon gives warnings about this. I find them very useful.
When the >program included in the original posting was translated with ISIcon the >following message was generated: > > Warning, "write": global declaration overrides built-in function You wouldn't happen to have a connection to ISI, would you :-). Tell you what: I'll overlook the commercialism if you'll check to see if this SYSV code works correctly on your machines. I run Xenix, as you know, and have rdchk(). You guys should, too (SYSVr3 or later). I'm curious about how these routines would work with ISI's windowing enabled. Just append them to src/iconx/fsys.c and add KeyboardFncs to the defines.h file: /* #include - included by ../h/config.h */ #include #include #include #define CopyTty(t1,t2) if (! reset_flag) {\ t2 = t1;\ reset_flag = 1;\ } /* * kbhit(): word * * Routine to check for the availability of characters on the stdin * stream. Does not actually read any characters. Returns nonzero * value if characters are waiting; otherwise, zero. */ word kbhit() { struct termio tty, new_tty; register word status, reset_flag = 0; extern word errno; if (isatty(0)) { if (ioctl(0, TCGETA, &tty) == -1) RunErr(-214, NULL); if (tty.c_lflag & ICANON) { CopyTty(tty, new_tty); new_tty.c_lflag &= ~ICANON; if (ioctl(0, TCSETA, &new_tty) == -1) RunErr(-214, NULL); } } /* see if anything is waiting to be read from the file */ status = rdchk(0); if (status == -1) { switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } if (reset_flag) { if (ioctl(0, TCSETA, &tty) == -1) RunErr(-214, NULL); } return status; } /* * getch(): word * * Routine to read one char from the standard input. Disables canonical * processing so as to read the character right from the buffer, without * having to wait for a carriage return. Re-enables canonical processing * after reading a character. Note that getch() does not echo any char- * acters to the screen, although characters might appear on the screen * anyway, if they were typed before getch() was invoked. 
*/ word getch() { char c; struct termio tty, new_tty; register word status, reset_flag = 0; extern word errno; if (isatty(0)) { if (ioctl(0, TCGETA, &tty) == -1) RunErr(-214, NULL); if (tty.c_lflag & ICANON) { CopyTty(tty, new_tty); /* sets reset_flag */ new_tty.c_lflag &= ~ICANON; } if (tty.c_cc[VMIN] != '\1') { CopyTty(tty, new_tty); new_tty.c_cc[VMIN] = (unsigned char )'\1'; } if (tty.c_cc[VTIME]) { CopyTty(tty, new_tty); new_tty.c_cc[VTIME] = (unsigned char )'\0'; } if (tty.c_lflag & ECHO) { CopyTty(tty, new_tty); new_tty.c_lflag &= ~ECHO; } if (reset_flag) { if (ioctl(0, TCSETA, &new_tty) == -1) RunErr(-214, NULL); } } status = read(0, &c, 1); if (status == -1) { switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } if (reset_flag) { if (ioctl(0, TCSETA, &tty) == -1) RunErr(-214, NULL); } if (! status) return -1; else return (word )c; } /* * getche(): word * * Routine to read one char from the standard input. Disables canonical * processing so as to read the character right from the buffer, without * having to wait for a carriage return. Re-enables canonical processing * after reading a character. NOTE: Getche() does not disable echoing, * so anything typed after it is invoked will be echoed to the screen * (unlike getch(), which disables echoing; see above). */ word getche() { char c; struct termio tty, new_tty; register word status, reset_flag = 0; extern word errno; if (isatty(0)) { if (ioctl(0, TCGETA, &tty) == -1) RunErr(-214, NULL); if (tty.c_lflag & ICANON) { CopyTty(tty, new_tty); /* sets reset_flag */ new_tty.c_lflag &= ~ICANON; } if (tty.c_cc[VMIN] != '\1') { CopyTty(tty, new_tty); new_tty.c_cc[VMIN] = (unsigned char )'\1'; } if (tty.c_cc[VTIME]) { CopyTty(tty, new_tty); new_tty.c_cc[VTIME] = (unsigned char )'\0'; } if (! 
(tty.c_lflag & ECHO)) { CopyTty(tty, new_tty); new_tty.c_lflag |= ECHO; } if (reset_flag) { if (ioctl(0, TCSETA, &new_tty) == -1) RunErr(-214, NULL); } } status = read(0, &c, 1); if (status == -1) { switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } if (reset_flag) { if (ioctl(0, TCSETA, &tty) == -1) RunErr(-214, NULL); } if (! status) return -1; else return (word )c; } -- -Richard L. Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer From icon-group-request@arizona.edu Mon Nov 18 07:38:28 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 18 Nov 91 07:38:28 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Maggie.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA17968; Mon, 18 Nov 91 07:38:26 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Mon, 18 Nov 1991 07:37 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA15694; Mon, 18 Nov 91 06:35:20 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Mon, 18 Nov 1991 07:38 MST Date: 13 Nov 91 04:57:36 GMT From: cis.ohio-state.edu!zaphod.mps.ohio-state.edu!mips!spool.mu.edu!cs.umn.edu!umn.edu!msi.umn.edu!noc.MR.NET!uc.msc.edu!shamash!uchinews!ellis!goer@ucbvax.berkeley.edu (Richard L. Goerwitz) Subject: RE: list manipulation Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <02381777E860638F@Arizona.edu> Message-Id: <1991Nov13.045736.857@midway.uchicago.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: University of Chicago Computing Organizations References: <1991Nov12.114242.2062@arizona.edu> Kurt Parten writes: >I am curious about efficient ways of manipulating lists. 
If you're really concerned with efficiency, the typical answer is: Use C. If you want elegance, maintainability, and ease of coding, use Icon. Of course I know this isn't what you want to know about. You want to know if there are any obvious ways to code list manipulating routines in Icon to maximize efficiency within that framework. There's a neat way of checking the speed of Icon expressions, it turns out. In the Icon Program Library there's a little program called empg. It's a timing utility. If you want to time, say, l := [] every put(l, 1 to 4) you'd put these two lines into a file (prepend a colon to the first line, though; this is how you tell empg to execute the line, but not time it), and then run empg on that file. I just put the above two lines into a file called "xxx," and ran empg on it ("[iconx] empg xxx"). The result was a file called xxx.icn, which I then compiled and ran. The output was: --------------------------------------------------------------------------- iterations: 10000 &version: Icon Version 8.0. March 25, 1990 &host: SCO XENIX System V/386 &dateline: Tuesday, November 12, 1991 10:39 pm region sizes: static 20480 string 65024 block 65024 l := [] every put(l, 1 to 4) 0.47200 ms. garbage collections: total 5 static 0 string 0 block 5 region sizes: static 20480 string 65024 block 562176 ---------------------------------------------------------------------------- The block region looks pretty huge to me after the test - as if garbage collection wasn't getting done or something. Maybe I'll look at the code and see what's up. Anyway, you can see the .472 ms. above tells you how long it takes to do an "every put(l, 1 to 4)." You can use this program to test other list operations. I guess this doesn't really answer your question about what sorts of list ops in Icon are fastest, but I'm not really sure of all that you want to do. The empg program will give you answers far more specific than you'd get from me, though :-). 
Let us know what you find out. -- -Richard L. Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer From kwalker Mon Nov 18 11:50:46 1991 Received: from ocotillo.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 18 Nov 91 11:50:46 MST Date: Mon, 18 Nov 91 11:50:45 MST From: "Kenneth Walker" Message-Id: <9111181850.AA29453@ocotillo.cs.arizona.edu> Received: by ocotillo.cs.arizona.edu; Mon, 18 Nov 91 11:50:45 MST To: icon-group Subject: RE: list manipulation > Richard Goerwitz writes: > If you want to time, say, > l := [] > every put(l, 1 to 4) > you'd put these two lines into a file (prepend a colon to the first line, > though; this is how you tell empg to execute the line, but not time it), > and then run empg on that file. > ... > The block region looks pretty huge to me after the test - as if garbage > collection wasn't getting done or something. The problem here is that put(l, 1 to 4) is being executed repeatedly (otherwise the timing will be less than the resolution of the system clock). The same list is used for each execution and all the elements end up in it so nothing can be collected.
Ken Walker / Computer Science Dept / Univ of Arizona / Tucson, AZ 85721 +1 602 621-4252 kwalker@cs.arizona.edu {uunet|allegra|noao}!arizona!kwalker From icon-group-request@arizona.edu Tue Nov 19 19:04:09 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Tue, 19 Nov 91 19:04:09 MST Resent-From: icon-group-request@arizona.edu Received: from Arizona.edu (Merlin.Telcom.Arizona.EDU) by optima.cs.arizona.edu (4.1/15) id AA28150; Tue, 19 Nov 91 18:30:09 MST Received: from ucbvax.Berkeley.EDU by Arizona.edu with PMDF#10282; Tue, 19 Nov 1991 18:29 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA04437; Tue, 19 Nov 91 15:24:55 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@arizona.edu (icon-group@arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Resent-Date: Tue, 19 Nov 1991 18:29 MST Date: 19 Nov 91 20:10:11 GMT From: micro-heart-of-gold.mit.edu!wupost!uwm.edu!linac!uchinews!ellis!goer@bloom-beacon.mit.edu (Richard L. 
Goerwitz) Subject: getch() for UNIX (was Re: global variables) Sender: icon-group-request@arizona.edu Resent-To: icon-group@cs.arizona.edu To: icon-group@arizona.edu Resent-Message-Id: <26613FBB4C0031AA@Arizona.edu> Message-Id: <1991Nov19.201011.15479@midway.uchicago.edu> X-Envelope-To: icon-group@CS.Arizona.EDU X-Vms-To: icon-group@Arizona.edu Organization: University of Chicago Computing Organizations References: <9111172152.AA25943@relay1.UU.NET>, <1991Nov17.235713.27108@midway.uchicago.edu> After teasing Jerry Nowlin for an understandable desire to promote his fine product, I asked if he'd perhaps check to see if some SYSV code worked with ISIcon. It was a getch() implementation for UNIX. I've been fooling with some code to do this, and passing it around to some people. It seems to work okay, but it's SYSV only right now. I took a stab at how to do it under BSD, but didn't really get anywhere, not being a BSD person primarily. Anyway, if someone knows a thing or two about BSD ioctl and character special files, then perhaps he or she could clean up the following code so it works like the SYSV code it's stuck together with. The end result would be a workable set of kbhit(), getch(), getche() routines for UNIX. To compile these into Icon, you'll need to (for __STDC__ systems) define the routines here in h/proto.h (or wherever the prototypes are), and also reconfigure your Icon source with config/unix/your-unix-variant/defines.h mentioning #define KeyboardFncs. Then tack these routines onto the end of src/iconx/fsys.c, making sure you've checked it over for your system. Note: Don't screw up your Icon source tree while testing. Use the Icon personalized interpreter mechanism. Note also: The routines below are defined as working only for XENIX_386. There are some #ifdef BSD routines later on that aren't tested. These are what really need work.
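On systems that provide POSIX termios and select() instead of the SYSV termio ioctl calls and rdchk(), the kbhit() idea might be sketched roughly as below. This is a sketch under stated assumptions, not the BSD code referred to above and not part of the Icon run-time system: input_ready is a hypothetical name, it takes the file descriptor as an argument, and it returns -1 on error where the routines in this post call RunErr.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <termios.h>
#include <unistd.h>

/*
 * input_ready(fd): hypothetical stand-in for kbhit().
 *
 * Returns 1 if at least one character can be read from fd without
 * blocking, 0 if nothing is waiting, and -1 on error.  Does not
 * consume any input.
 */
int input_ready(int fd)
{
    struct termios saved, raw;
    struct timeval no_wait = {0, 0};   /* poll; do not block */
    fd_set readable;
    int is_tty = isatty(fd);
    int status;

    if (is_tty) {
        /* Turn off canonical mode so typed characters count as
           available before a newline has been entered. */
        if (tcgetattr(fd, &saved) == -1)
            return -1;
        raw = saved;
        raw.c_lflag &= ~ICANON;
        if (tcsetattr(fd, TCSANOW, &raw) == -1)
            return -1;
    }

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    status = select(fd + 1, &readable, NULL, NULL, &no_wait);

    /* Put the terminal back the way it was found. */
    if (is_tty)
        tcsetattr(fd, TCSANOW, &saved);

    if (status == -1)
        return -1;
    return status > 0 ? 1 : 0;
}
```

Calling input_ready(0) would correspond to the kbhit() described in this thread. Because the terminal is only touched when isatty() succeeds, the same function can be exercised on a pipe or file descriptor when no terminal is attached.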
The Xenix routines may work on all SYSVr3-4 systems, but you'll need to add -lx to the config/unix/your-unix-verions/iconx.hdr file where the libraries are mentioned if you're using Xenix (SYSV, too?). With a little team effort, maybe we can get this all worked out. #ifdef XENIX_386 /* maybe any SYSV; if so define it in the define.h file */ /* #include - included by ../h/config.h above */ #include #include #include #include #define ECHO_ON 1 #define ECHO_OFF 0 #define CopyTty(t1,t2) if (! reset_flag) {\ t2 = t1;\ reset_flag = 1;\ } #define ResetTty(t1) if (reset_flag) {\ if (ioctl(0, TCSETA, t1) == -1)\ RunErr(-214, NULL);\ } /* * kbhit(): word * * Routine to check for the availability of characters on the stdin * stream. Does not actually read any characters. Returns nonzero * value if characters are waiting; otherwise, zero. */ word kbhit() { struct termio tty, new_tty; register word status, isa_tty = 0, reset_flag = 0; extern word errno; if (isatty(0)) { isa_tty = 1; if (ioctl(0, TCGETA, &tty) == -1) RunErr(-214, NULL); if (tty.c_lflag & ICANON) { CopyTty(tty, new_tty); new_tty.c_lflag &= ~ICANON; if (ioctl(0, TCSETA, &new_tty) == -1) { ResetTty(&tty); RunErr(-214, NULL); } } } /* see if anything is waiting to be read from the file */ status = rdchk(0); if (status == -1) { if (isa_tty) ResetTty(&tty); switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } ResetTty(&tty); return status; } /* * getch(): word * * Routine to read one char from the standard input. Disables canonical * processing so as to read the character right from the buffer, without * having to wait for a carriage return. Re-enables canonical processing * after reading a character. Note that getch() does not echo any char- * acters to the screen, although characters might appear on the screen * anyway, if they were typed before getch() was invoked. 
*/ word getch() { word read_a_char(); return read_a_char(ECHO_OFF); } /* * getche(): word * * Routine to read one char from the standard input. Disables canonical * processing so as to read the character right from the buffer, without * having to wait for a carriage return. Re-enables canonical processing * after reading a character. NOTE: Getche() does not disable echoing, * so anything typed after it is invoked will be echoed to the screen * (unlike getch(), which disables echoing; see above). */ word getche() { word read_a_char(); return read_a_char(ECHO_ON); } /* * read_a_char(turn_echo_on): word * * Routine to actually do the reading (either with echo or without, * depending on whether turn_echo_on is 1 or 0). */ word read_a_char(turn_echo_on) word turn_echo_on; { char c; struct termio tty, new_tty; register word status, isa_tty = 0, reset_flag = 0; novalue abort_on_signal(); extern word errno; if (isatty(0)) { isa_tty = 1; if (ioctl(0, TCGETA, &tty) == -1) RunErr(-214, NULL); /* disable keyboard signals quit & interrupt */ if (tty.c_lflag & ISIG) { CopyTty(tty, new_tty); /* a macro, defined above */ new_tty.c_lflag &= ~ISIG; } /* disable canonical input processing (like BSD cbreak) */ if (tty.c_lflag & ICANON) { CopyTty(tty, new_tty); new_tty.c_lflag &= ~ICANON; } if (tty.c_cc[VMIN] != '\1') { CopyTty(tty, new_tty); new_tty.c_cc[VMIN] = (unsigned char )'\1'; } if (tty.c_cc[VTIME]) { CopyTty(tty, new_tty); new_tty.c_cc[VTIME] = (unsigned char )'\0'; } if (turn_echo_on) { /* set echo bit, i.e. enable echo */ if (! (tty.c_lflag & ECHO)) { CopyTty(tty, new_tty); new_tty.c_lflag |= ECHO; } } else { /* i.e. if _not_ turn_echo_on */ /* mask out echo bit, i.e. 
disable echo */ if (tty.c_lflag & ECHO) { CopyTty(tty, new_tty); new_tty.c_lflag &= ~ECHO; } } ResetTty(&new_tty); /* a macro, defined above */ } /* finally, read 1 char from the standard input */ status = read(0, &c, 1); if (status == -1) { if (isa_tty) ResetTty(&tty); switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } /* Check for quit and interrupt characters. */ if (isa_tty) { if ((unsigned char)c == tty.c_cc[VINTR]) { ResetTty(&tty); if (kill(getpid(), SIGINT) == -1) { perror("kill"); RunErr(500, NULL); } } else if ((unsigned char)c == tty.c_cc[VQUIT]) { ResetTty(&tty); if (kill(getpid(), SIGQUIT) == -1) { perror("kill"); RunErr(500, NULL); } } else ResetTty(&tty); } else ResetTty(&tty); if (! status) return -1; else return (word )c; } #else /* not XENIX_386 */ #if defined(SUN) || defined(NEXT) /* This code is quite untested, and is considerably less sophisticated * than the SYSVr3+/XENIX code I wrote above. Somebody who knows the * ins and outs of BSD ioctls, please take a look at this! */ /* #include - included by ../h/config.h */ #ifdef SUN #include #endif #include #include #define CopyTty(t1,t2) if (! reset_flag) {\ t2 = t1;\ reset_flag = 1;\ } word kbhit() { word arg; struct sgttyb tty, new_tty; register status, reset_flag = 0; extern word errno; if (isatty(0)) { if (ioctl(0, TIOCGETP, &tty) == -1) RunErr(-214, NULL); if (! (tty.sg_flags & CBREAK)) { CopyTty(tty, new_tty); new_tty.sg_flags |= CBREAK; if (ioctl(0, TIOCSETN, &new_tty) == -1) RunErr(-214, NULL); } } status = ioctl(0, FIONREAD, &arg); if (status == -1) { switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } if (reset_flag) { if (status = ioctl(0, TIOCSETN, &tty) == -1) RunErr(-214, NULL); } return(arg); } /* * getch(): word * * Routine to read one char from the standard input. Enables cbreak mode * so as to read a character right from the buffer, without having to wait * for a carriage return. 
Disables cbreak mode after reading a character. * Note that getch() does not echo any characters to the screen, although * characters might appear on the screen anyway, if they were typed before * getch() was invoked. */ word getch() { char c; struct sgttyb tty, new_tty; register word status, reset_flag = 0; extern word errno; if (isatty(0)) { if (ioctl(0, TIOCGETP, &tty) == -1) RunErr(-214, NULL); if (! (tty.sg_flags & CBREAK)) { CopyTty(tty, new_tty); /* sets reset_flag */ new_tty.sg_flags |= CBREAK; } if (tty.sg_flags & ECHO) { CopyTty(tty, new_tty); new_tty.sg_flags &= ~ECHO; } if (reset_flag) { if (ioctl(0, TIOCSETN, &new_tty) == -1) RunErr(-214, NULL); } } status = read(0, &c, 1); if (status == -1) { switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } if (reset_flag) { if (ioctl(0, TIOCSETN, &tty) == -1) RunErr(-214, NULL); } if (! status) return -1; else return (word )c; } /* * getche(): word * * Routine to read one char from the standard input. Enables cbreak mode, * to read the character right from the buffer, without having to wait for * a carriage return. Disables cbreak mode after reading a character. NOTE: * Getche() does not disable echoing, so anything typed after it is invoked * will be echoed to the screen (unlike getch(), which disables echoing; see * above). */ word getche() { char c; struct sgttyb tty, new_tty; register word status, reset_flag = 0; extern word errno; if (isatty(0)) { if (ioctl(0, TIOCGETP, &tty) == -1) RunErr(-214, NULL); if (! (tty.sg_flags & CBREAK)) { CopyTty(tty, new_tty); /* sets reset_flag */ new_tty.sg_flags |= CBREAK; } if (! (tty.sg_flags & ECHO)) { CopyTty(tty, new_tty); new_tty.sg_flags |= ECHO; } if (reset_flag) { if (ioctl(0, TIOCSETN, &new_tty) == -1) RunErr(-214, NULL); } } status = read(0, &c, 1); if (status == -1) { switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } if (reset_flag) { if (ioctl(0, TIOCSETN, &tty) == -1) RunErr(-214, NULL); } if (! 
status) return -1; else return (word )c; } #endif /* SUN or NEXT */ #endif /* XENIX_386 */ #endif /* KeyboardFncs */ -- -Richard L. Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer From icon-group-request Sat Nov 23 21:37:11 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 23 Nov 91 21:37:11 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA09553; Sat, 23 Nov 91 21:37:09 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA20120; Sat, 23 Nov 91 20:24:58 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 21 Nov 91 14:44:23 GMT From: mailer.cc.fsu.edu!sun13!sun8.scri.fsu.edu@gatech.edu (John Nall) Organization: SCRI, Florida State University Subject: Combinatorial generator Message-Id: <5628@sun13.scri.fsu.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu Hello, world: Does anyone happen to have a combinatorial generator they would be willing to share with me?? I'm a novice at writing in Icon, and although I've been able to do most (but not all) of the exercises in the book, I can't seem to figure out how to solve this particular problem. What I am looking for would be something along the lines of: every group := combination("abcd",3) which would say "generate all combinations of the four objects "a", "b", "c" and "d", taken 3 at a time". ("abc", "abd", "acd" and "bcd") Since I'm interested in how to program the problem, rather than any particular application, if you could even share a variation of the above it would probably be helpful. Thanks much -- John W. Nall | Supercomputer Computations Research Institute nall@sun8.scri.fsu.edu | Florida State University, Tallahassee, FL 32306 (904)-644-6008 | "Down with liberals. Down with conservatives." 
From icon-group-request Sat Nov 23 23:23:11 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 23 Nov 91 23:23:11 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA11967; Sat, 23 Nov 91 23:23:09 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA23759; Sat, 23 Nov 91 22:15:59 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 21 Nov 91 20:10:20 GMT From: world!ksr!tim@uunet.uu.net (Tim Peters) Organization: Kendall Square Research Corp. Subject: Re: Combinatorial generator Message-Id: <7238@ksr.com> References: <5628@sun13.scri.fsu.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu In article <5628@sun13.scri.fsu.edu> nall@sun8.scri.fsu.edu (John Nall) writes: > ... > What I am looking for would be something along the lines of: > > every group := combination("abcd",3) > >which would say "generate all combinations of the four objects "a", "b", >"c" and "d", taken 3 at a time". ("abc", "abd", "acd" and "bcd") > > Since I'm interested in how to program the problem, rather than any >particular application, if you could even share a variation of the above >it would probably be helpful. John, once you get the hang of "recursive generators" in Icon, you'll find that this kind of problem is surprisingly easy -- the coding is a snap, the hard part is just finding the proper recurrence to build on (which has nothing to do with Icon itself). 
E.g.,

procedure main()
   local string, k
   every string := left("abcde", 0 to 5) & k := 0 to 1+*string do {
      write("Combinations of '", string, "' taken ", k, " at a time:")
      every write( " '", comb(string, k), "'" )
   }
end

procedure comb( s, k )
   k < 0 & stop("error in comb argument, need k=", k, " to be >=0")
   k = 0 & return ""
   *s < k & fail
   suspend (s[1] || comb(s[2:0], k-1)) | comb(s[2:0], k)
end

is built directly from the reasoning underlying the recurrence C(n,k) = C(n-1,k-1) + C(n-1,k) for binomial coefficients (see any combinatorics text for a derivation). The program prints:

Combinations of '' taken 0 at a time:
 ''
Combinations of '' taken 1 at a time:
Combinations of 'a' taken 0 at a time:
 ''
Combinations of 'a' taken 1 at a time:
 'a'
Combinations of 'a' taken 2 at a time:
Combinations of 'ab' taken 0 at a time:
 ''
Combinations of 'ab' taken 1 at a time:
 'a'
 'b'
Combinations of 'ab' taken 2 at a time:
 'ab'
Combinations of 'ab' taken 3 at a time:
Combinations of 'abc' taken 0 at a time:
 ''
Combinations of 'abc' taken 1 at a time:
 'a'
 'b'
 'c'
Combinations of 'abc' taken 2 at a time:
 'ab'
 'ac'
 'bc'
Combinations of 'abc' taken 3 at a time:
 'abc'
Combinations of 'abc' taken 4 at a time:
Combinations of 'abcd' taken 0 at a time:
 ''
Combinations of 'abcd' taken 1 at a time:
 'a'
 'b'
 'c'
 'd'
Combinations of 'abcd' taken 2 at a time:
 'ab'
 'ac'
 'ad'
 'bc'
 'bd'
 'cd'
Combinations of 'abcd' taken 3 at a time:
 'abc'
 'abd'
 'acd'
 'bcd'
Combinations of 'abcd' taken 4 at a time:
 'abcd'
Combinations of 'abcd' taken 5 at a time:
Combinations of 'abcde' taken 0 at a time:
 ''
Combinations of 'abcde' taken 1 at a time:
 'a'
 'b'
 'c'
 'd'
 'e'
Combinations of 'abcde' taken 2 at a time:
 'ab'
 'ac'
 'ad'
 'ae'
 'bc'
 'bd'
 'be'
 'cd'
 'ce'
 'de'
Combinations of 'abcde' taken 3 at a time:
 'abc'
 'abd'
 'abe'
 'acd'
 'ace'
 'ade'
 'bcd'
 'bce'
 'bde'
 'cde'
Combinations of 'abcde' taken 4 at a time:
 'abcd'
 'abce'
 'abde'
 'acde'
 'bcde'
Combinations of 'abcde' taken 5 at a time:
 'abcde'
Combinations of 'abcde' taken 6
at a time:

Suspect the first stumbling block for you will be believing that recursive generators really work; the second will be figuring out why I wrote the routine to fail in some endcases but to return a null string in others. Those choices aren't really as arbitrary as they may appear ...

indeed-the-hardest-part-is-getting-the-endcases-just-right-ly y'rs - tim

Tim Peters   Kendall Square Research Corp
tim@ksr.com, ksr!tim@uunet.uu.net

From isidev!nowlin@uunet.uu.net Sun Nov 24 13:32:09 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 24 Nov 91 13:32:09 MST
Received: from relay2.UU.NET by optima.cs.arizona.edu (4.1/15) id AA05820; Sun, 24 Nov 91 13:32:05 MST
Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay2.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA13447; Sun, 24 Nov 91 15:32:08 -0500
Date: Sun, 24 Nov 91 15:32:08 -0500
From: isidev!nowlin@uunet.uu.net
Message-Id: <9111242032.AA13447@relay2.UU.NET>
Received: from isidev.UUCP by uunet.uu.net with UUCP/RMAIL (queueing-rmail) id 153104.18589; Sun, 24 Nov 1991 15:31:04 EST
To: uunet!cs.arizona.edu!icon-group@uunet.uu.net
Subject: Re: combinations

This reminded me of a program I wrote for my daughter when she was in 4th grade some years ago. She had a list of 10 letters and the challenge was to get as many words from the list as possible. Since parents were allowed to help, I wrote a program in Icon to generate all the possible combinations of the 10 letters, run it through a spelling checker to identify non-words, and then list the real words left over. Not being real foresighted, I didn't really ponder how many possible combinations there were until I looked the next day and found I'd run my hard disk out of space and still not generated all the possibilities.

Here's a program that has both the unique combinations (sets) that John Nall requested and Tim Peters provided, and the whole shebang with the order of objects in a given combination being significant.
You feed this the objects and size instead of it iterating on everything. Let me know if I goofed since I blew away my original program a long time ago.

Sorry but I couldn't leave Tim's procedure alone. I just changed it a little. I prefer clarity to idiomatic code, sometimes :-)

Here's the output from the program for groups of three objects of size three:

$ comb abc 3
Unique combinations:
abc
All combinations:
abc
acb
bac
bca
cab
cba

WARNING: Try this on large groups of objects at your own risk:

procedure main(args)
   local objs, size
   objs := get(args) | stop("No 1st argument")
   size := get(args) | stop("No 2nd argument")
   write("Unique combinations:")
   every write(combu(objs,size))
   write("All combinations:")
   every write(comba(objs,size))
end

procedure combu(objs,size)
   if size < 1 then fail
   else if size = 1 then suspend !objs
   else suspend objs[1] || combu(objs[2:0],size - 1) | combu(objs[2:0],size)
end

procedure comba(objs,size)
   local n
   if size < 1 then fail
   else if size = 1 then suspend !objs
   else {
      every n := 1 to *objs do
         if n = 1 then suspend objs[1] || comba(objs[2:0],size-1)
         else if n = *objs then suspend objs[n] || comba(objs[1:n],size-1)
         else suspend objs[n] || comba(objs[1:n] || objs[n+1:0],size-1)
   }
end

--- ---
 | S | Iconic Software, Inc. - Jerry Nowlin - uunet!isidev!nowlin
--- ---

From icon-group-request Sun Nov 24 16:24:41 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 24 Nov 91 16:24:41 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA10962; Sun, 24 Nov 91 16:24:39 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA17180; Sun, 24 Nov 91 15:21:20 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 24 Nov 91 23:08:11 GMT
From: ksr!tim@uunet.uu.net (Tim Peters)
Organization: Kendall Square Research Corp.
Subject: Re: combinations
Message-Id: <7301@ksr.com>
References: <9111242032.AA13447@relay2.UU.NET>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

In article <9111242032.AA13447@relay2.UU.NET> nowlin@isidev.UUCP writes:
>... a list of 10 letters and the challenge was to get as many words
>from the list as possible. ... I wrote a program in Icon to generate
>all the possible combinations of the 10 letters, run it through a
>spelling checker to identify non-words, and then list the real words
>left over.

Jerry, I faced a similar problem when writing a program to generate anagrams. I had thought that anagrams would be rare (based on how hard it is to find them by hand), but to the contrary found that, for a decent-sized phrase, there are an enormous number of anagrams. E.g., I just let the program run on "Iconic Software" for 5 seconds & it came up with several thousand distinct anagrams, not counting permutations of the words or of letters within the words. E.g., "I can coo swifter", "actinic woofers", "action if escrow" (and "action" can be replaced by "cation", and/or "escrow" by "cowers"), "I, of conic waters", "ionic cot wafers", ... and those are some of the few anagrams that actually make some sense.

That program turned into quite a mess of backtracking trickery, but if anyone faces a similar sub-task, a good start is to go thru the entire dictionary (e.g., UNIX(tm) systems usually have large word lists hiding in /usr/dict/) at the beginning, saving away all the *feasible* words in an Icon set. Then all the checking can be done internal to the program (& set membership tests are quite fast), and against a vastly reduced set of possibilities.

>...
>Sorry but I couldn't leave Tim's procedure alone. I just changed it a
>little. I prefer clarity to idiomatic code, sometimes :-)

Well, it's about time! No newsgroup has really arrived until it's hosted its first vicious style flame.
Seriously, I find either style easy to read, & in programs this simple actually do find your style clearer.

Attached is a slightly fiddled copy of your program, showing a different way to get the "comba" (all permutations of "objs" taken "size" at a time) function -- comba2 simply takes the result sequence of combu and generates all permutations of each result. I think this is conceptually cleaner because it breaks the task into distinct subtasks (generating combinations, and generating permutations). More to the point, I really like this "perms" function & haven't seen it posted before.

Finally, should note that "comb" and "permute" functions are already just a "link" statement away, via the file "permute.icn" in the Icon Program Library.

a-vastly-underappreciated-resource-ly y'rs - tim

Tim Peters   Kendall Square Research Corp
tim@ksr.com, ksr!tim@uunet.uu.net

procedure main(args)
   local objs, size
   objs := get(args) | stop("No 1st argument")
   size := integer(get(args)) | stop("Need integer 2nd argument")
   write("Unique combinations:")
   every write(combu(objs,size))
   write("All combinations:")
   every write(comba(objs,size))
   write("All combinations, a different way:")
   every write(comba2(objs,size))
end

procedure combu(objs,size)
   if size < 1 then fail
   else if size = 1 then suspend !objs
   else suspend objs[1] || combu(objs[2:0],size-1) | combu(objs[2:0],size)
end

procedure comba(objs,size)
   local n
   if size < 1 then fail
   else if size = 1 then suspend !objs
   else {
      every n := 1 to *objs do
         if n = 1 then suspend objs[1] || comba(objs[2:0],size-1)
         else if n = *objs then suspend objs[n] || comba(objs[1:n],size-1)
         else suspend objs[n] || comba(objs[1:n] || objs[n+1:0],size-1)
   }
end

procedure comba2(objs,size)
   suspend perms(combu(objs,size))
end

procedure perms(objs) # generate all permutations of objs
   if *objs <= 1 then return objs
   else suspend (objs[1] <-> !objs) || perms(objs[2:0])
end

>>> END OF MSG

From TENAGLIA@mis.mcw.edu Tue Nov 26 09:56:00 1991
Received: from
optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Tue, 26 Nov 91 09:56:00 MST
Received: from MIS4.MIS.MCW.EDU by optima.cs.arizona.edu (4.1/15) id AA19660; Tue, 26 Nov 91 09:55:51 MST
Received: from mis.mcw.edu by mis.mcw.edu (PMDF #12252) id <01GDEHV25ZLS9AMF1Y@mis.mcw.edu>; Tue, 26 Nov 1991 10:58 CST
Date: Tue, 26 Nov 1991 10:58 CST
From: Chris Tenaglia - 257-8765
Subject: Holiday Tidbit
To: icon-group@cs.arizona.edu
Message-Id: <01GDEHV25ZLS9AMF1Y@mis.mcw.edu>
X-Organization: Medical College of Wisconsin (Milwaukee, WI)
X-Vms-To: IN%"icon-group@cs.arizona.edu"

As yet another holiday approaches, I couldn't resist submitting another game implemented in Icon. TicTacToe is old hat to most of us. I also know I'll be flamed for my VT100 I/O routines, oh well. Ten years ago I wrote a BASIC interpreter for a very unusual workstation that happened to have IBM 360 architecture. To test its solidity I wrote a brute-force tictactoe that would never lose. Impressive at first, but boring after a short time. I decided to try it again, this time in Icon. While on the way to coding the bulletproof version I envisioned, an alpha version has developed that doesn't win all the time. What's even more curious, it sometimes cheats! I'm including the source below for your comments and enjoyment. I'm going to save it myself for reference, but I hope to develop the invincible and fair-play version, perhaps for the next holiday. Enjoy!

Chris Tenaglia (System Manager) | "The past explained,
Medical College of Wisconsin | the future foretold,
8701 W. Watertown Plank Rd. | the present largely apologized for."
Milwaukee, WI 53226 | Organon to The Doctor (414)257-8765 | tenaglia@mis.mcw.edu, mcwmis!tenaglia ############################################################### # # # file : ttt.icn # # updt : 25-nov-1991 # # auth : chris tenaglia # # desc : tictactoe icon implementation # # # ############################################################### global me,you,true,false,draw,pointer,wins,pass,taken,winner procedure main() init() #declare some global stuff play := true while play == true do { me := set() # computer is me you := set() # player is you victory := false # we haven't won yet pass := 0 # start flag winner := "" taken := table(false) # taken position table (rather than set?) display() insert(me,5) taken[5] := true display() insert(you,(tmp := get_your_move())) taken[integer(tmp)] := true display() repeat { insert(me,(tmp := choose([1,2,3,4,6,7,8,9]))) taken[integer(tmp)] := true display() if (victory := done_yet()) == (true|draw) then break insert(you,(tmp := get_your_move())) taken[integer(tmp)] := true if (victory := done_yet()) == (true|draw) then break } display() if winner == "you" then victory := false case victory of { true : write(at(1,22),chop(&host)," Wins, You Loose!") false: write(at(1,22),chop(),"Wow! 
You actually beat the Computer.") draw : write(at(1,22),chop(),"Game was a draw.") } play := map(input(at(1,23) || "Another Game (Y/N) :")[1]) } end # # procedure to display the current tictactoe grid and plays # procedure display() if (pass +:= 1) = 1 then { write(cls(),uhalf()," T I C - T A C - T O E") write(lhalf()," T I C - T A C - T O E") write(trim(center("Computer is 'O' and you are 'X'",80))) line := repl("q",60) ; line[21] := "n" ; line[41] := "n" every y := 5 to 20 do writes(at(30,y),graf("x")) every y := 5 to 20 do writes(at(50,y),graf("x")) writes(at(10,10),graf(line)) writes(at(10,15),graf(line)) every x := 1 to 9 do writes(pointer[x],x) } every writes(pointer[!me],high("O")) every writes(pointer[!you],blink("X")) end # # procedure to obtain a move choice from the player # procedure get_your_move() local yours,all_moves repeat { writes(at(5,22)) yours := input("Enter block # (1-9) :") if not(integer(yours)) then { writes(at(5,23),beep(),"Invalid Input! Choose 1-9.") next } if (1 > yours) | (yours > 9) then { writes(at(5,23),beep(),"Value out of range! Choose 1-9.") next } if taken[integer(yours)] == true then { writes(at(5,23),beep(),"That position is already taken! 
Try again.") next } break } return yours end # # procedure chooses for the host a strong move # procedure choose(lst) local val,test test := 0 repeat { val := ?lst if (test +:= 1) > 200 then write(val," Choose: infinite loop") if taken[integer(val)] == true then next else break } return val end # # procedure to test if computer has won, or the game is a draw # procedure done_yet() every outcome := !wins do { test := 0 every part := !outcome do if member(me,part) then test +:= 1 if test = 3 then { winner := &host return true } } every outcome := !wins do { test := 0 every part := !outcome do if member(you,part) then test +:= 1 if test = 3 then { winner := "you" return true } } if *me + *you > 7 then return draw return false end # # prompts for an input from the user # procedure input(prompt) writes(prompt) return read() end # # procedures to output ansi graphics and attributes # procedure at(x,y) return "\e[" || y || ";" || x || "f" end procedure graf(str) return "\e(0" || str || "\e(B" end procedure uhalf(str) /str := "" return "\e#3" || str end procedure lhalf(str) /str := "" return "\e#4" || str end procedure high(str) return "\e[1m" || str || "\e[0m" end procedure normal(str) return "\e[0m" || str end procedure under(str) return "\e[4m" || str || "\e[0m" end procedure blink(str) return "\e[5m" || str || "\e[0m" end procedure cls(str) /str := "" return "\e[2J\e[H" || str end procedure chop(str) /str := "" return "\e[J" || str end procedure beep() return "\7" end # # procedure to init useful global variables for later use # procedure init() true := "y" false := "n" draw := "?" 
   &rand := map(&clock,":","0") #randomize
   wins := [set([1,5,9]),set([3,5,7]),set([1,2,3]),set([4,5,6]),
            set([7,8,9]),set([1,4,7]),set([2,5,8]),set([3,6,9])]
   pointer := [at(17,7), at(37,7), at(57,7),
               at(17,12),at(37,12),at(57,12),
               at(17,17),at(37,17),at(57,17)]
end

From TENAGLIA@mis.mcw.edu Wed Nov 27 07:03:42 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 27 Nov 91 07:03:42 MST
Received: from MIS4.MIS.MCW.EDU by optima.cs.arizona.edu (4.1/15) id AA14551; Wed, 27 Nov 91 07:03:38 MST
Received: from mis.mcw.edu by mis.mcw.edu (PMDF #12252) id <01GDFQ5RI3C09AMFPF@mis.mcw.edu>; Wed, 27 Nov 1991 08:06 CST
Date: Wed, 27 Nov 1991 08:06 CST
From: Chris Tenaglia - 257-8765
Subject: TikTakToe
To: icon-group@cs.arizona.edu
Message-Id: <01GDFQ5RI3C09AMFPF@mis.mcw.edu>
X-Organization: Medical College of Wisconsin (Milwaukee, WI)
X-Vms-To: IN%"icon-group@cs.arizona.edu"

I've taken a little closer look at the tiktaktoe game I submitted. I did find some careless coding. I've spiffed it up, and improved some of the video, but it still cheats. It can't recognize when the player has a winning row. I won't repost yet. I'm looking a little closer still. Oh well, that's what 2 hours of programming turns out.

Chris Tenaglia (System Manager) | Medical College of Wisconsin
8701 W. Watertown Plank Rd.
| Milwaukee, WI 53226 (414)257-8765 | tenaglia@mis.mcw.edu, mcwmis!tenaglia From icon-group-request Wed Nov 27 15:16:21 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 27 Nov 91 15:16:21 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA15400; Wed, 27 Nov 91 15:16:19 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA25764; Tue, 26 Nov 91 11:28:44 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 26 Nov 91 01:35:35 GMT From: ksr!tim@uunet.uu.net (Tim Peters) Organization: Kendall Square Research Corp. Subject: Re: case insensitive find() Message-Id: <7330@ksr.com> References: <1991Nov25.120052.2151@arizona.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu In article <1991Nov25.120052.2151@arizona.edu> parten@ece.arizona.edu (Kurt Parten) writes: >Has anyone come up with a case insensitive find()? >I am not even sure how to implement such a creature. Try rolling your own on the fly, Kurt. E.g., where you have find( needle, haystack ) now, try find( map(needle), map(haystack) ) instead. Or where you have haystack ? { ... find(needle) ... } now, try map(haystack) ? { ... find(map(needle)) ... } instead. Many people I've shown the last version to are leery of it, because somewhere along the way they got the idea that the subject of string scanning must be a plain variable name or constant; but Icon is very consistent about allowing an arbitrary expression anywhere a value is allowed. 
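A minimal sketch of the first suggestion above, wrapped up as a generator. The procedure name cifind and the sample strings are illustrative assumptions, not something from the original post or the Icon Program Library; the only built-ins used are find() and one-argument map(), which folds upper case to lower case.

```icon
# cifind is a hypothetical helper name (an assumption for this sketch).
# It generates every position in haystack at which needle occurs,
# ignoring case, by lower-casing both strings before calling find().
procedure cifind(needle, haystack)
   suspend find(map(needle), map(haystack))
end

procedure main()
   # writes 1, 7 and 17, one per line
   every write(cifind("ICON", "Icon, ICON, and icon"))
end
```

Because map() substitutes character for character and never changes the length of a string, each position generated this way should also be a valid position in the original, un-mapped haystack.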
or-if-you-know-how-to-write-"find"-using-more-primitive-operations-
you-can-write-a-"case_insensitive_find"-in-much-the-same-way-ly y'rs - tim

Tim Peters   Kendall Square Research Corp
tim@ksr.com, ksr!tim@uunet.uu.net

From TENAGLIA@mis.mcw.edu Wed Nov 27 15:38:28 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 27 Nov 91 15:38:28 MST
Received: from MIS4.MIS.MCW.EDU by optima.cs.arizona.edu (4.1/15) id AA16489; Wed, 27 Nov 91 15:38:20 MST
Received: from mis.mcw.edu by mis.mcw.edu (PMDF #12252) id <01GDG84VP5M89AMFPF@mis.mcw.edu>; Wed, 27 Nov 1991 16:41 CST
Date: Wed, 27 Nov 1991 16:41 CST
From: Chris Tenaglia - 257-8765
Subject: Holiday Tidbit Version 2
To: icon-group@cs.arizona.edu
Message-Id: <01GDG84VP5M89AMFPF@mis.mcw.edu>
X-Organization: Medical College of Wisconsin (Milwaukee, WI)
X-Vms-To: IN%"icon-group@cs.arizona.edu"

Well, it's the holiday again. I poked at the tictactoe game a little more and still couldn't find the reason why it cheats. I'm reposting the latest version for anyone interested in a holiday hack. I kind of like it this way. The more you try to win, the more you lose. The key is not trying to win (it doesn't, but it does)! Enjoy.

Chris Tenaglia (System Manager) | "The past explained,
Medical College of Wisconsin | the future foretold,
8701 W. Watertown Plank Rd. | the present largely apologized for."
Milwaukee, WI 53226 | Organon to The Doctor (414)257-8765 | tenaglia@mis.mcw.edu, mcwmis!tenaglia -------------------------------------------------------------------------- ############################################################### # # # file : ttt.icn # # updt : 27-nov-1991 # # auth : chris tenaglia # # desc : tictactoe icon implementation # # # ############################################################### global me,you,true,false,draw,pointer,wins,pass,taken,winner,mark,row procedure main() init() play := true while play == true do { me := set() # computer is me you := set() # player is you victory := "" # nobodys' won yet pass := 0 # start flag winner := "" taken := table(false) # taken position table (rather than set?) display() insert(me,5) taken[5] := true display() insert(you,(tmp := get_your_move())) taken[integer(tmp)] := true display() repeat { insert(me,(tmp := choose([1,2,3,4,6,7,8,9]))) taken[integer(tmp)] := true display() if (victory := done_yet()) == (true|false|draw) then break insert(you,(tmp := get_your_move())) taken[integer(tmp)] := true if (victory := done_yet()) == (true|false|draw) then break } display() case winner of { "me" : { write(at(1,22),chop(&host)," Wins, You Loose!") every square := !row do writes(pointer[square],mark) } "you": { write(at(1,22),chop(),"Wow! 
You actually beat the Computer.") write(at(1,22),chop(&host)," Wins, You Loose!") every square := !row do writes(pointer[square],mark) } draw : write(at(1,22),chop(),"Game was a draw.") } play := map(input(at(1,23) || "Another Game (Y/N) :")[1]) } end # # # # procedure to display the current tictactoe grid and plays # procedure display() if (pass +:= 1) = 1 then { write(cls(),uhalf()," T I C - T A C - T O E") write(lhalf()," T I C - T A C - T O E") write(trim(center("Computer is 'O' and you are 'X'",80))) line := repl("q",60) ; line[21] := "n" ; line[41] := "n" every y := 5 to 20 do writes(at(30,y),graf("x")) every y := 5 to 20 do writes(at(50,y),graf("x")) writes(at(10,10),graf(line)) writes(at(10,15),graf(line)) every x := 1 to 9 do writes(pointer[x],dim(x)) } every writes(pointer[!me],high("O")) every writes(pointer[!you],under("X")) end # # procedure to obtain a move choice from the player # procedure get_your_move() local yours,all_moves repeat { writes(at(5,22)) yours := input("Enter block # (1-9) :") writes(at(5,23),chop()) if not(integer(yours)) then { writes(at(5,23),beep(),"Invalid Input! Choose 1-9.") next } if (1 > yours) | (yours > 9) then { writes(at(5,23),beep(),"Value out of range! Choose 1-9.") next } if taken[integer(yours)] == true then { writes(at(5,23),beep(),"That position is already taken! 
Try again.") next } break } return yours end # # # # procedure chooses for the host a strong move # procedure choose(lst) local val,test test := 0 repeat { val := ?lst if (test +:= 1) > 200 then write(val," Choose: infinite loop") if taken[integer(val)] == true then next else break } return val end # # procedure to test if computer has won, or the game is a draw # procedure done_yet() every outcome := !wins do { test := 0 every part := !outcome do if member(you,part) then test +:= 1 if test = 3 then { winner := "you" row := outcome mark := high(blink("X")) return true } } every outcome := !wins do { test := 0 every part := !outcome do if member(me,part) then test +:= 1 if test = 3 then { winner := "me" row := outcome mark := high(blink("O")) return true } } if *me + *you > 8 then { winner := draw return draw } return "not done yet" end # # prompts for an input from the user # procedure input(prompt) writes(prompt) return read() end # # # # procedures to output ansi graphics and attributes # procedure at(x,y) return "\e[" || y || ";" || x || "f" end procedure graf(str) return "\e(0" || str || "\e(B" end procedure uhalf(str) /str := "" return "\e#3" || str end procedure lhalf(str) /str := "" return "\e#4" || str end procedure high(str) return "\e[1m" || str || "\e[0m" end procedure normal(str) return "\e[0m" || str end procedure dim(str) return "\e[2m" || str || "\e[0m" end procedure under(str) return "\e[4m" || str || "\e[0m" end procedure blink(str) return "\e[5m" || str || "\e[0m" end procedure cls(str) /str := "" return "\e[2J\e[H" || str end procedure chop(str) /str := "" return "\e[J" || str end procedure beep() return "\7" end # # # # procedure to init useful global variables for later use # procedure init() true := "y" false := "n" draw := "?" 
&random := map(&clock,":","0") wins := [set([1,5,9]),set([3,5,7]),set([1,2,3]),set([4,5,6]), set([7,8,9]),set([1,4,7]),set([2,5,8]),set([3,6,9])] pointer := [at(17,7), at(37,7), at(57,7), at(17,12),at(37,12),at(57,12), at(17,17),at(37,17),at(57,17)] end From icon-group-request Wed Nov 27 18:25:34 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 27 Nov 91 18:25:34 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA01096; Wed, 27 Nov 91 18:25:30 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA25523; Tue, 26 Nov 91 11:18:10 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 26 Nov 91 00:46:11 GMT From: uchinews!ellis!goer@uunet.uu.net (Richard L. Goerwitz) Organization: University of Chicago Computing Organizations Subject: Re: case insensitive find() Message-Id: <1991Nov26.004611.8383@midway.uchicago.edu> References: <1991Nov25.120052.2151@arizona.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu parten@ece.arizona.edu (Kurt Parten) writes: >Has anyone come up with a case insensitive find()? >I am not even sure how to implement such a creature. Find all cases of string s in file f: f := open("some-file-or-other") lowercase_s := map(s) every line := !f do { if find(lowercase_s, map(line)) then write("found ",s," in ",line) } -- -Richard L. 
Goerwitz   goer%sophist@uchicago.bitnet
goer@sophist.uchicago.edu   rutgers!oddjob!gide!sophist!goer

From icon-group-request Wed Nov 27 18:33:06 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 27 Nov 91 18:33:06 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA01277; Wed, 27 Nov 91 18:33:03 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA02727; Wed, 27 Nov 91 16:31:05 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 27 Nov 91 22:28:00 GMT
From: UTARLG.UTA.EDU!B912DIEG@ucbvax.berkeley.edu
Subject: Machine Translation Program available via anonymous FTP
Message-Id: <01GDG7O2WRS00001JS@utarlg.uta.edu>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

11/27/91

Dear Fellow Icon Enthusiasts:

I have just uploaded a new version of my TRAN1 machine translation
program to the ICON Project. It may be retrieved via anonymous FTP from
cs.arizona.edu:icon/contrib . There are three files:

TRAN1SP.ZIP   The TRAN1 machine translation program for Spanish.
PCL.ZIP       Documentation for TRAN1 in Hewlett-Packard Laser Jet format.
PCW.ZIP       Documentation for TRAN1 in PC-Write Format.

The original TRAN1.ZIP is also still available. It is still of some value
as a demo. For one thing, it will execute on an 8088 machine with limited
memory. I'm not sure the new version will execute on a machine with
limited memory (it was developed with ICON 386), and it certainly won't in
the near future when I expand it again. The original TRAN1.ZIP is also a
little simpler, so it would be good as a tutorial. I'm planning to add
some more features to TRAN1SP over the next couple of months, so it will
be getting more and more complicated. So, you may want to download
TRAN1.ZIP as well if you haven't already.

The documentation is an electronic copy of my thesis. Chapter four is the
most interesting chapter for ICON programmers.

Those who don't have anonymous FTP access can send me a postpaid mailer
with an MS-DOS compatible disk or disks, and I will return the files by
surface mail. A 1.2 Meg or 1.44 Meg disk would be best. (Three 360 K or
two 720 K disks would also work.)

Doug Witmer
Internet: b912dieg@utarlg.uta.edu
Bitnet: b912dieg@utarlg
smail: 1102 Enterprise Drive #149, Grand Prairie, Texas 75051

From icon-group-request Wed Nov 27 18:43:46 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 27 Nov 91 18:43:46 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA01614; Wed, 27 Nov 91 18:43:43 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA29723; Mon, 25 Nov 91 17:15:05 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 25 Nov 91 19:00:51 GMT
From: att!news.cs.indiana.edu!arizona.edu!arizona.edu!news@ucbvax.berkeley.edu (Kurt Parten)
Organization: Dept of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona
Subject: case insensitive find()
Message-Id: <1991Nov25.120052.2151@arizona.edu>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

Has anyone come up with a case insensitive find()?
I am not even sure how to implement such a creature.
Thanks for any help,
Kurt
--
Kurt Parten   parten@helios.ece.arizona.edu

From icon-group-request Fri Nov 29 06:00:02 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 29 Nov 91 06:00:02 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA03851; Fri, 29 Nov 91 06:00:00 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA20971; Fri, 29 Nov 91 04:46:59 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 27 Nov 91 23:49:28 GMT
From: fernwood!cronos!laguna!alex@uunet.uu.net (Bob Alexander)
Organization: Metaphor Computer Systems, Mountain View, CA
Subject: Re: case insensitive find()
Message-Id: <1648@cronos.metaphor.com>
References: <1991Nov25.120052.2151@arizona.edu>, <7330@ksr.com>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

In article <7330@ksr.com> tim@ksr.com (Tim Peters) writes:
> map(haystack) ? { ... find(map(needle)) ... }

Beware -- a potential gotcha with this technique is in using parts of the
scanned string, e.g.:

   map(haystack) ? { ... (s := tab(find(map(needle)))) ... }

sets s to the downshifted string, not the original portion of haystack.
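
One way around this gotcha, sketched here rather than taken from the thread: do the search on mapped copies, but use the position it yields to slice the original string. The procedure name ci_find and the sample strings are hypothetical, chosen only for illustration.

```icon
# Sketch: case-insensitive find() that generates positions, so the
# caller can take substrings from the ORIGINAL haystack rather than
# from the downshifted copy.  ci_find is an illustrative name.
procedure ci_find(needle, haystack)
   # map(s) with one argument maps upper case to lower case
   suspend find(map(needle), map(haystack))
end

procedure main()
   local haystack, needle, i
   haystack := "Now and THEN"
   needle := "then"
   every i := ci_find(needle, haystack) do
      write(haystack[i +: *needle])    # writes "THEN", original case kept
end
```

Because map() preserves string length, a position found in the mapped copy is also valid in the original, which is what makes the slice safe.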
-- Bob Alexander Metaphor Computer Systems (415) 961-3600 x751 alex@metaphor.com ====^=== Mountain View, CA ...{uunet}!{decwrl,apple}!metaphor!alex From icon-group-request Mon Dec 2 12:12:18 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 2 Dec 91 12:12:18 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA10769; Mon, 2 Dec 91 12:12:15 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA19145; Mon, 2 Dec 91 10:47:20 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 2 Dec 91 17:14:51 GMT From: mcsun!uknet!warwick!kingpol!titan.kingston.ac.uk!as_m455@uunet.uu.net Organization: Kingston Polytechnic Subject: ICON, MUDs and Neural Nets ... Message-Id: <1991Dec2.171451.1@titan.kingston.ac.uk> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu Hi, I'm coding a MUD in ICON ... as an extension of a simple interpreter I wrote in the summer ... and I'd like some info. on how to add sockets to the system. My code will be running under DEC ULTRIX on a microVAX, and currently I'm planning to use pipes for interprocess comms while I'm waithing to get a socket library written. I'm also plaguing comp.ai.neural-nets as I'm planning to implement the monsters as neural nets, allowing them to respond in a more interesting manner. If anyone can give me help with either of these, or ideas for features to add to the MUD (which will also be a full-blown command shell) I'd be pleased to hear from you. - Hermes, the Megaflow Junkie. 
email - as_m455@titan.kingston.ac.uk From icon-group-request Mon Dec 2 19:25:20 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 2 Dec 91 19:25:20 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA29829; Mon, 2 Dec 91 19:25:18 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA05316; Mon, 2 Dec 91 18:05:46 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 2 Dec 91 23:20:49 GMT From: micro-heart-of-gold.mit.edu!wupost!uwm.edu!ux1.cso.uiuc.edu!uchinews!ellis!goer@bloom-beacon.mit.edu (Richard L. Goerwitz) Organization: University of Chicago Computing Organizations Subject: Re: ICON, MUDs and Neural Nets ... Message-Id: <1991Dec2.232049.20559@midway.uchicago.edu> References: <1991Dec2.171451.1@titan.kingston.ac.uk> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu In <1991Dec2.171451.1@titan.kingston.ac.uk> as_m455@titan.kingston.ac.uk writes: >I'm coding a MUD in ICON ... as an extension of a simple interpreter I wrote in >the summer ... and I'd like some info. on how to add sockets to the system. My >code will be running under DEC ULTRIX on a microVAX, and currently I'm planning >to use pipes for interprocess comms while I'm waithing to get a socket library >written. Tell me if I'm wrong, but it looks like you want interprocess communication facilities to be accessible from Icon. The big question is what you expect to be reading from those sockets. Named pipes are certainly an easier way to do things, since you can set up everything using simple system() commands. >I'm also plaguing comp.ai.neural-nets as I'm planning to implement the monsters >as neural nets, allowing them to respond in a more interesting manner. 
>If anyone can give me help with either of these, or ideas for features to add >to the MUD (which will also be a full-blown command shell) I'd be pleased to >hear from you. Wish I could help more. Let us know what you come up with. -- -Richard L. Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer From icon-group-request Tue Dec 10 13:12:09 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Tue, 10 Dec 91 13:12:09 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA06978; Tue, 10 Dec 91 13:12:02 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA19526; Mon, 9 Dec 91 15:25:19 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 8 Dec 91 00:02:49 GMT From: csus.edu!beach.csulb.edu!nic.csu.net!usc!samsung!think.com!rpi!uwm.edu!linac!uchinews!ellis!goer@ucdavis.ucdavis.edu (Richard L. Goerwitz) Organization: University of Chicago Computing Organizations Subject: getch() again Message-Id: <1991Dec8.000249.3032@midway.uchicago.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu Followup to a previous post: Here's a version of getch() that works for a Sun4 and for Xenix/386. May work elsewhere. Tried to figure out how to get things to work under Mach on a NeXT, but didn't make any progress during the brief time I was fooling with it. -Richard #ifdef XENIX_386 /* #include - included by ../h/config.h above */ #include #include #include #include #define ECHO_ON 1 #define ECHO_OFF 0 #define CopyTty(t1,t2) if (! reset_flag) {\ t2 = t1;\ reset_flag = 1;\ } #define ResetTty(t1) if (reset_flag) {\ if (ioctl(0, TCSETA, t1) == -1)\ RunErr(-214, NULL);\ } /* * kbhit(): word * * Routine to check for the availability of characters on the stdin * stream. Does not actually read any characters. 
Returns nonzero * value if characters are waiting; otherwise, zero. The idea here, * as in getch() and getche(), is not to touch the tty settings * unless we have to. */ word kbhit() { struct termio tty, new_tty; register word status, isa_tty = 0, reset_flag = 0; extern word errno; if (isatty(0)) { isa_tty = 1; if (ioctl(0, TCGETA, &tty) == -1) RunErr(-214, NULL); if (tty.c_lflag & ICANON) { CopyTty(tty, new_tty); new_tty.c_lflag &= ~ICANON; } ResetTty(&new_tty); } /* see if anything is waiting to be read from the file */ status = rdchk(0); if (isa_tty) ResetTty(&tty); if (status == -1) { switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } return status; } /* * getch(): word * * Routine to read one char from the standard input. Disables canonical * processing so as to read the character right from the buffer, without * having to wait for a carriage return. Re-enables canonical processing * after reading a character. Note that getch() does not echo any char- * acters to the screen, although characters might appear on the screen * anyway, if they were typed before getch() was invoked. */ word getch() { word read_a_char(); return read_a_char(ECHO_OFF); } /* * getche(): word * * Routine to read one char from the standard input. Disables canonical * processing so as to read the character right from the buffer, without * having to wait for a carriage return. Re-enables canonical processing * after reading a character. NOTE: Getche() does not disable echoing, * so anything typed after it is invoked will be echoed to the screen * (unlike getch(), which disables echoing; see above). */ word getche() { word read_a_char(); return read_a_char(ECHO_ON); } /* * read_a_char(turn_echo_on): word * * Routine to actually do the reading (either with echo or without, * depending on whether turn_echo_on is 1 or 0). 
*/ word read_a_char(turn_echo_on) word turn_echo_on; { char c; struct termio tty, new_tty; register word status, isa_tty = 0, reset_flag = 0; novalue abort_on_signal(); extern word errno; if (isatty(0)) { isa_tty = 1; if (ioctl(0, TCGETA, &tty) == -1) RunErr(-214, NULL); /* disable keyboard signals quit & interrupt */ if (tty.c_lflag & ISIG) { CopyTty(tty, new_tty); /* a macro, defined above */ new_tty.c_lflag &= ~ISIG; } /* disable canonical input processing (like BSD cbreak) */ if (tty.c_lflag & ICANON) { CopyTty(tty, new_tty); new_tty.c_lflag &= ~ICANON; } if (tty.c_cc[VMIN] != '\1') { CopyTty(tty, new_tty); new_tty.c_cc[VMIN] = (unsigned char )'\1'; } if (tty.c_cc[VTIME]) { CopyTty(tty, new_tty); new_tty.c_cc[VTIME] = (unsigned char )'\0'; } if (turn_echo_on) { /* set echo bit, i.e. enable echo */ if (! (tty.c_lflag & ECHO)) { CopyTty(tty, new_tty); new_tty.c_lflag |= ECHO; } } else { /* i.e. if _not_ turn_echo_on */ /* mask out echo bit, i.e. disable echo */ if (tty.c_lflag & ECHO) { CopyTty(tty, new_tty); new_tty.c_lflag &= ~ECHO; } } ResetTty(&new_tty); /* a macro, defined above */ } /* finally, read 1 char from the standard input */ status = read(0, &c, 1); if (isa_tty) ResetTty(&tty); if (status == -1) { switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } /* Check for quit and interrupt characters. */ if (isa_tty) { if ((unsigned char)c == tty.c_cc[VINTR]) { if (kill(getpid(), SIGINT) == -1) { perror("kill"); RunErr(500, NULL); } } else if ((unsigned char)c == tty.c_cc[VQUIT]) { if (kill(getpid(), SIGQUIT) == -1) { perror("kill"); RunErr(500, NULL); } } } if (! status) return -1; else return (word )c; } #else /* not XENIX_386 */ #ifdef SUN /* #include - included by ../h/config.h */ #include #include #include #define ECHO_ON 1 #define ECHO_OFF 0 #define CopyTty(t1,t2) if (! 
reset_flag) {\
    t2 = t1;\
    reset_flag = 1;\
}
#define ResetTty(t1) if (reset_flag) {\
    if (ioctl(0, TIOCSETN, t1) == -1)\
        RunErr(-214, NULL);\
}

/*
 * kbhit(): word
 *
 * Routine to check for the availability of characters on the stdin
 * stream.  Does not actually read any characters.  Returns nonzero
 * value if characters are waiting; otherwise, zero.  The idea here,
 * as in getch() and getche(), is not to touch the tty settings
 * unless we have to.
 */
word kbhit()
{
    word arg;
    struct sgttyb tty, new_tty;
    register word status, isa_tty = 0, reset_flag = 0;
    extern word errno;

    if (isatty(0)) {
        isa_tty = 1;
        if (ioctl(0, TIOCGETP, &tty) == -1)
            RunErr(-214, NULL);
        /* Our Sun4s here at the U of Chicago need this */
        if (tty.sg_flags & ECHO) {
            CopyTty(tty, new_tty);
            new_tty.sg_flags &= ~ECHO;
        }
        /* enable cbreak mode */
        if (! (tty.sg_flags & CBREAK)) {
            CopyTty(tty, new_tty);
            new_tty.sg_flags |= CBREAK;
        }
        /* install the modified settings (new_tty, not tty), as the
           Xenix version above does; otherwise cbreak is never enabled */
        ResetTty(&new_tty);
    }

    /* see if anything is waiting to be read from the file */
    status = ioctl(0, FIONREAD, &arg);

    /* restore the original settings */
    if (isa_tty)
        ResetTty(&tty);

    if (status == -1) {
        switch (errno) {
            case EBADF: RunErr(-212, NULL);
            default:    RunErr(-214, NULL);
        }
    }

    return arg;
}

/*
 * getch(): word
 *
 * Routine to read one char from the standard input.  Enables cbreak mode
 * so as to read a character right from the buffer, without having to wait
 * for a carriage return.  Disables cbreak mode after reading a character.
 * Note that getch() does not echo any characters to the screen, although
 * characters might appear on the screen anyway, if they were typed before
 * getch() was invoked.
 */
word getch()
{
    word read_a_char();
    return read_a_char(ECHO_OFF);
}

/*
 * getche(): word
 *
 * Routine to read one char from the standard input.  Enables cbreak mode,
 * to read the character right from the buffer, without having to wait for
 * a carriage return.  Disables cbreak mode after reading a character.
NOTE: * Getche() does not disable echoing, so anything typed after it is invoked * will be echoed to the screen (unlike getch(), which disables echoing; see * above). */ word getche() { word read_a_char(); return read_a_char(ECHO_ON); } /* * read_a_char(turn_echo_on): word * * Routine to actually do the reading (either with echo or without, * depending on whether turn_echo_on is 1 or 0). */ word read_a_char(turn_echo_on) word turn_echo_on; { char c; struct tchars tty_characters; struct sgttyb tty, new_tty; register word status, isa_tty = 0, reset_flag = 0; extern word errno; if (isatty(0)) { isa_tty = 1; if (ioctl(0, TIOCGETP, &tty) == -1) RunErr(-214, NULL); if (ioctl(0, TIOCGETC, &tty_characters) == -1) RunErr(-214, NULL); if (turn_echo_on) { /* set echo bit, i.e. enable echo */ if (! (tty.sg_flags & ECHO)) { CopyTty(tty, new_tty); /* macro, defined above */ new_tty.sg_flags |= ECHO; } } else { /* mask out echo bit, i.e. disable echo */ if (tty.sg_flags & ECHO) { CopyTty(tty, new_tty); new_tty.sg_flags &= ~ECHO; } } /* raw mode; we'll process quit and interrupt by hand */ if (! (tty.sg_flags & RAW)) { CopyTty(tty, new_tty); new_tty.sg_flags |= RAW; } ResetTty(&new_tty); /* a macro, defined above */ } status = read(0, &c, 1); if (isa_tty) ResetTty(&tty); if (status == -1) { switch (errno) { case EBADF: RunErr(-212, NULL); default: RunErr(-214, NULL); } } /* Check for quit and interrupt characters. */ if (isa_tty) { if ((char )c == tty_characters.t_intrc) { if (kill(getpid(), SIGINT) == -1) { perror("kill"); RunErr(500, NULL); } } else if ((char )c == tty_characters.t_quitc) { if (kill(getpid(), SIGQUIT) == -1) { perror("kill"); RunErr(500, NULL); } } } if (! status) return -1; else return (word )c; } #endif /* SUN */ #endif /* XENIX_386 */ #endif /* KeyboardFncs */ -- -Richard L. 
Goerwitz   goer%sophist@uchicago.bitnet
goer@sophist.uchicago.edu   rutgers!oddjob!gide!sophist!goer

From icon-group-request Tue Dec 10 15:00:58 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Tue, 10 Dec 91 15:00:58 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA12923; Tue, 10 Dec 91 15:00:53 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA24561; Tue, 10 Dec 91 03:04:42 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 7 Dec 91 17:02:11 GMT
From: asuvax!cs.utexas.edu!uwm.edu!linac!uchinews!ellis!goer@g.ms.uky.edu (Richard L. Goerwitz)
Organization: University of Chicago Computing Organizations
Subject: expanding regions, Mach
Message-Id: <1991Dec7.170211.22874@midway.uchicago.edu>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

Has anyone looked into the problem of memory management using Icon on,
say, a NeXT? I just want to ramble a few minutes here, with some ideas
and questions.

First, it looks as though Mach is like VMS in the sense that sbrk does
not function as the lowest-level interface to the memory management
routines. In both cases, successive calls are not guaranteed to return
pointers to memory chunks contiguous with the last chunk requested via
sbrk().

I just looked at the VMS routines for Icon, and I wonder how easy it
would be to adapt them to Mach.

Also, how viable would it be simply to have a routine that sat around
and waited for sbrk(> 0) requests, and then (when it encountered them),
just used vm_alloc (or whatever it is you use on a NeXT), obtained a
block of contiguous memory, then did a block copy of the static, block,
and string regions, realigned all the base pointers and what not, and
then freed the old contiguous block (or, if there's a vm_realloc
function, used that)?

Just curious.
It's pretty annoying to have to manually set region sizes. Naive users
can't use programs that require them to be so set, unless they are
wrapped in a shell script that knows how much memory the program is
likely to use.

--

   -Richard L. Goerwitz   goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu   rutgers!oddjob!gide!sophist!goer

From icon-group-request Tue Dec 10 23:51:48 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Tue, 10 Dec 91 23:51:48 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA06756; Tue, 10 Dec 91 23:51:45 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA14954; Tue, 10 Dec 91 22:45:43 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 11 Dec 91 06:10:08 GMT
From: world!ksr!tim@decwrl.dec.com (Tim Peters)
Organization: Kendall Square Research Corp.
Subject: Re: String Scanning Question (from novice)
Message-Id: <7762@ksr.com>
References: <6030@sun13.scri.fsu.edu>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

In article <6030@sun13.scri.fsu.edu> nall@sun8.scri.fsu.edu (John Nall) writes:
%I had a bug in a program, and finally found it.
% ... [this one is yielding unexpected behavior] ...
%procedure main()
%   line := "Now and then"
%   line ?:= move(3) & move(4)
%   write(line)     # writes only "Now"
%end
%
%BUT...(the fix) if I do it like this, it works ok:
%
%procedure main()
%   line := "Now and then"
%   line ?:= { move(3) & move(4) }
%   write(line)     # writes " and" as it is supposed to
%end
%...

Not to worry, John -- all will be clear! You need to look at the end of
Appendix A to get the precedence of the operators straight. Note that in
Icon (like as in C or Perl or ... but unlike as in Pascal or Fortran or
...), assignment is a binary operator, much like "+" and "*" and "&".
Because it is just another binary operator, the precedence of assignment
with respect to the other binary operators is important, and can
occasionally lead to surprise. That's what's happening to you above.

The table at the end of Appendix A says that all forms of assignment in
Icon (whether plain, or the "swap" flavor, or augmented) have the same
precedence, and that it's higher (binds more tightly) than the precedence
of the "&" operator. In fact, "&" has *the* lowest precedence, which you
will appreciate some day (trust me).

So in

   line ?:= move(3) & move(4)

the "?:=" binds more tightly than the "&", so Icon groups it this way:

   (line ?:= move(3)) & move(4)

The "move(4)" is off in outer space somewhere, & has nothing to do with
scanning "line"; the attempt to move(4) will actually fail in your
program, but failure doesn't generally cause an error msg so you probably
didn't realize it. Try this line to see it a bit more clearly:

   line ?:= move(3) & (move(4) | write("I can't!"))

The good news is that Icon is working as documented & that it isn't hard
to learn "the rules". The bad news is that experience with other
languages will work against you at first (some of your expectations are,
well, wrong).

hang-in-there-it's-worth-a-little-initial-discomfort-ly y'rs - tim

Tim Peters   Kendall Square Research Corp
tim@ksr.com, ksr!tim@uunet.uu.net

ps: It's probably a bit more idiomatic to write your
   line ?:= { move(3) & move(4) }
as
   line ?:= ( move(3) & move(4) )
although
   line ?:= { move(3); move(4) }
is a suggestive alternative to think about.
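
For anyone who wants to see the three groupings side by side, here is a small sketch, assuming only the behaviour reported in this thread. The variable names a, b and c are illustrative; the brackets in the output just make the leading blank visible.

```icon
# Sketch: the same scan written three ways, to show how "?:=" and "&"
# group.  Variable names are illustrative only.
procedure main()
   local a, b, c
   a := b := c := "Now and then"

   # Groups as (a ?:= move(3)) & move(4): a is assigned "Now", then the
   # stray move(4) fails outside the scan.  The failure is silent.
   a ?:= move(3) & move(4)

   # Parentheses keep both moves inside the scan; the conjunction
   # produces the value of move(4).
   b ?:= (move(3) & move(4))

   # A compound expression: move(3) is evaluated and its result
   # discarded, then move(4) supplies the value.
   c ?:= { move(3); move(4) }

   write("[", a, "]")    # [Now]
   write("[", b, "]")    # [ and]
   write("[", c, "]")    # [ and]
end
```

The last two forms give the same result here, but they differ once backtracking matters: in the "&" form a failing second expression resumes the first, while in the "{ e1; e2 }" form the first expression is already finished and cannot be resumed.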
From isidev!nowlin@uunet.uu.net Wed Dec 11 06:53:18 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 11 Dec 91 06:53:18 MST Received: from relay1.UU.NET by optima.cs.arizona.edu (4.1/15) id AA24785; Wed, 11 Dec 91 06:53:15 MST Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay1.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA20684; Wed, 11 Dec 91 08:53:06 -0500 Date: Wed, 11 Dec 91 08:53:06 -0500 From: isidev!nowlin@uunet.uu.net Message-Id: <9112111353.AA20684@relay1.UU.NET> Received: from isidev.UUCP by uunet.uu.net with UUCP/RMAIL (queueing-rmail) id 085253.13836; Wed, 11 Dec 1991 08:52:53 EST To: uunet!cs.arizona.edu!icon-group@uunet.uu.net Subject: Re: novice scanning question > From: uunet!arizona!decwrl.dec.com!world!ksr!tim (Tim Peters) > In article ... nall@sun8.scri.fsu.edu (John Nall) writes: > %I had a bug in a program, and finally found it. > % ... [this one is yielding unexpected behavior] ... > %procedure main() > % line := "Now and then" > % line ?:= move(3) & move(4) > % write(line) # writes only "Now" > %end > % > %BUT...(the fix) if I do it like this, it works ok: > % > %procedure main() > % line := "Now and then" > % line ?:= { move(3) & move(4) } > % write(line) # writes " and" as it is supposed to > %end > %... > > { good explanation of operator precedence } > > ps: It's probably a bit more idiomatic to write your > line ?:= { move(3) & move(4) } > as > line ?:= ( move(3) & move(4) ) > although > line ?:= { move(3); move(4) } > is a suggestive alternative to think about. > It's most idiomatic (I think) to write: procedure main() line := "Now and then" line ?:= ( move(3) , move(4) ) write(line) end This is called mutual evaluation and while it's usually used for long sequences of compound conjunction operations it works just fine with only two expressions. --- --- | S | Iconic Software, Inc. 
- Jerry Nowlin - uunet!isidev!nowlin --- --- From icon-group-request Wed Dec 11 14:31:55 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 11 Dec 91 14:31:55 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA17133; Wed, 11 Dec 91 14:31:52 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA19470; Wed, 11 Dec 91 13:26:12 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 10 Dec 91 21:31:19 GMT From: mailer.cc.fsu.edu!sun13!sun8.scri.fsu.edu@gatech.edu (John Nall) Organization: SCRI, Florida State University Subject: String Scanning Question (from novice) Message-Id: <6030@sun13.scri.fsu.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu I had a bug in a program, and finally found it. But although I can fix it easily, I still don't understand what I did wrong. The book, page 32, says that the form of a string-scanning expression is: expr1 ? expr2 It also says (page 38) that an augmented assignment of the form: s ?:= expr "can be used to scan s and assign a new value to it as a result. The value assigned is the value produced by expr". The book also says (page 18) that when the conjunction expr1 & expr2 is evaluated, it produces the value of expr2 (assuming both succeed). The following little nonsense program illustrates the problem. As I understand, the expression "line ?:= move(3) & move(4)" should result in line being assigned the value resulting from the "move(4)" part. But it does not. Instead it assigns the value resulting from the "move(3)" part instead. 
(My bug, by the way :-) ) procedure main() line := "Now and then" line ?:= move(3) & move(4) write(line) # writes only "Now" end BUT...(the fix) if I do it like this, it works ok: procedure main() line := "Now and then" line ?:= { move(3) & move(4) } write(line) # writes " and" as it is supposed to end So is the book wrong? Or am I just misunderstanding what it says? Thanks -- John W. Nall | Supercomputer Computations Research Institute nall@sun8.scri.fsu.edu | Florida State University, Tallahassee, FL 32306 (904)-644-6008 | "Down with liberals. Down with conservatives." From icon-group-request Thu Dec 12 00:18:17 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Thu, 12 Dec 91 00:18:17 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA16919; Thu, 12 Dec 91 00:18:14 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA10169; Wed, 11 Dec 91 23:05:25 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 12 Dec 91 01:56:19 GMT From: ksr!tim@uunet.uu.net (Tim Peters) Organization: Kendall Square Research Corp. Subject: Re: novice scanning question Message-Id: <7803@ksr.com> References: <9112111353.AA20684@relay1.UU.NET> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu In article <9112111353.AA20684@relay1.UU.NET> nowlin@isidev.UUCP writes: > > > From: uunet!arizona!decwrl.dec.com!world!ksr!tim (Tim Peters) > > ... > > line ?:= { move(3) & move(4) } > > vs > > line ?:= ( move(3) & move(4) ) > > vs > > line ?:= { move(3); move(4) } > > ... >It's most idiomatic (I think) to write: > > procedure main() > line := "Now and then" > line ?:= ( move(3) , move(4) ) > write(line) > end > >This is called mutual evaluation and while it's usually used for long >sequences of compound conjunction operations it works just fine with >only two expressions. 
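For reference, the spellings quoted above can be run side by side. This is only a sketch: it assumes the scan starts from main() with an empty &subject, and the variable name "short" is made up for illustration. The first four results are the ones reported in this thread; the last two show where conjunction and the compound expression part ways.

```icon
procedure main()
   # Ungrouped: parses as (line ?:= move(3)) & move(4).
   line := "Now and then"
   line ?:= move(3) & move(4)
   write("[", line, "]")                 # [Now], as in the original post

   # Grouped conjunction: both moves run inside the scan.
   line := "Now and then"
   line ?:= (move(3) & move(4))
   write("[", line, "]")                 # [ and]

   # Mutual evaluation: same result as the grouped conjunction.
   line := "Now and then"
   line ?:= (move(3), move(4))
   write("[", line, "]")                 # [ and]

   # Compound expression: the value of the last expression is assigned.
   line := "Now and then"
   line ?:= { move(3); move(4) }
   write("[", line, "]")                 # [ and]

   # Where the forms differ: move(3) cannot succeed on a 2-character string.
   short := "ab"
   short ?:= (move(3) & move(1))         # conjunction fails, nothing is assigned
   write("[", short, "]")                # [ab]
   short ?:= { move(3); move(1) }        # move(1) still runs after move(3) fails
   write("[", short, "]")                # [a]
end
```

The last pair is the semantic difference at issue: with "&" or "(e1, e2)" a failing first expression abandons the whole assignment, while "{ e1; e2 }" simply carries on with e2.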
It was an artificial example so it's hard to tell what "the natural" approach would be, but I got the impression that mutual evaluation (whether spelled as "e1 & e2" or "(e1, e2)") was not a natural approach to the poster's real task. I.e., do you really want to suck a self- proclaimed "novice" into the mysteries of backtracking ? Anyway, I steered him toward the quite different (semantically as well as syntactically) "{ e1; e2; ... }" form believing that it's less confusing for an Icon newbie -- works pretty much the way sequential blocks in other languages work. Whatever, I'm glad you drew explicit attention to mutual evaluation, Jerry; could well be what he's really looking for. consulting-via-telepathy-has-its-limitations-ly y'rs - tim Tim Peters Kendall Square Research Corp tim@ksr.com, ksr!tim@uunet.uu.net From icon-group-request Thu Dec 12 04:18:33 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Thu, 12 Dec 91 04:18:33 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA27734; Thu, 12 Dec 91 04:18:30 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA20278; Thu, 12 Dec 91 03:14:24 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 10 Dec 91 23:44:43 GMT From: fernwood!cronos!laguna!alex@uunet.uu.net (Bob Alexander) Organization: Metaphor Computer Systems, Mountain View, CA Subject: Re: expanding regions, Mach Message-Id: <1696@cronos.metaphor.com> References: <1991Dec7.170211.22874@midway.uchicago.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu In article <1991Dec7.170211.22874@midway.uchicago.edu> goer@midway.uchicago.edu writes: >Also, how viable would it be simply to have a routine that sat around >and waited for sbrk(> 0) requests, and then (when it encountered them), >just used vm_alloc (or whatever it is you use on a NeXT), obtained a 
>block of contiguous memory, then did a block copy of the static, block, >and string regions, realigned all the base pointers and what not, and >then freed the old contiguous block (or, if there's a vm_realloc func- >tion, used that)? If I follow your drift, the technique you suggest is pretty much what is done in the Macintosh-MPW version. I didn't put in any code to relocate anything, though, so the region has to be expanded in place using a realloc()-type call in the Mac O/S that promises not to move anything. The problem is, of course, that by the time region expansion is needed, it's likely that other non-relocatable blocks have been allocated by some system service in such a way that the Icon-block can't be expanded in place. I haven't done any tests to see just how often region expansion succeeds or fails, but in general I don't think you can really count on it, and the best bet is to allocate bigger regions beforehand. Obviously, this technique works MUCH better if the Icon memory can be relocated, as Richard suggested. I haven't looked into how hard this would be -- perhaps the garbage collector could do it quite easily. Of course, you still have the problem of needing + to perform the copy. 
-- Bob Alexander Metaphor Computer Systems (415) 961-3600 x751 alex@metaphor.com ====^=== Mountain View, CA ...{uunet}!{decwrl,apple}!metaphor!alex From icon-group-request Thu Dec 12 06:48:24 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Thu, 12 Dec 91 06:48:24 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA02352; Thu, 12 Dec 91 06:48:20 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA02090; Thu, 12 Dec 91 05:45:22 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 11 Dec 91 04:09:26 GMT From: csus.edu!wupost!zaphod.mps.ohio-state.edu!uwm.edu!linac!uchinews!ellis!goer@ucdavis.ucdavis.edu (Richard L. Goerwitz) Organization: University of Chicago Computing Organizations Subject: Re: String Scanning Question (from novice) Message-Id: <1991Dec11.040926.11559@midway.uchicago.edu> References: <6030@sun13.scri.fsu.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu In article <6030@sun13.scri.fsu.edu> nall@sun8.scri.fsu.edu (John Nall) writes: > >is evaluated, it produces the value of expr2 (assuming both succeed). > >The following little nonsense program illustrates the problem. >As I understand, the expression "line ?:= move(3) & move(4)" should >result in line being assigned the value resulting from the >"move(4)" part. But it does not. Instead it assigns the value >resulting from the "move(3)" part instead. (My bug, by the way :-) ) > >procedure main() > line := "Now and then" > line ?:= move(3) & move(4) > write(line) # writes only "Now" >end Think of line ?:= move(3) & move(4) as line := line ? move(3) & move(4). The precedences work out like this: (line := (line ? move(3))) & move(4). The result is that line is evaluated, producing a variable, then the scan- ning expression (line ? 
move(3)) gets evaluated, producing the value of move(3) (if it succeeds), and then assigning that value to the variable produced earlier on. If this whole business succeeds, then move(4) is evaluated. By now we're outside of the scanning expression, and &pos and &subject have whatever values they had before evaluation of this line began, so the results are irrelevant (and aren't used anyway).

I guess the bottom line is that expression1 op:= expression2 always works out as expression1 := expression1 op expression2.

>BUT...(the fix) if I do it like this, it works ok:
>
>procedure main()
>   line := "Now and then"
>   line ?:= { move(3) & move(4) }
>   write(line)  # writes " and" as it is supposed to
>end

Good fix. It's always good to group expressions manually if there could be any confusion about their order of evaluation. Basically what you're doing is forcing move(3) & move(4) into a single expression that gets evaluated within the scanning operation set up by the ?:= operator.

You could also just write line ?:= (move(3), move(4)), which to me looks more idiomatic, but does the same thing. If you prefer the other way, that's fine, too.

>So is the book wrong? Or am I just misunderstanding what it says?

Nobody's wrong. Everybody's right. Merry Christmas! :-)

--
   -Richard L. Goerwitz              goer%sophist@uchicago.bitnet
   goer@sophist.uchicago.edu         rutgers!oddjob!gide!sophist!goer

From cats!mark Thu Dec 12 06:48:48 1991
Received: from univers.cs.arizona.edu by cheltenham.cs.arizona.edu; Thu, 12 Dec 91 06:48:48 MST
Message-Id: <9112121348.AA11072@univers.cs.arizona.edu>
Received: from cats.UUCP by univers.cs.arizona.edu; Thu, 12 Dec 91 06:48:44 MST
Date: 12/12/81 06:43:23
To: arizona!icon-group
From: cats!mark
Subject: Re: expanding regions, Mach

SPITBOL has the same sbrk() problems as expandable-region Icon, and uses it in the same way.
Sadly, sbrk() is getting harder to find in new systems, and I find myself sometimes having to stand on my head to give SPITBOL an expanding workspace. You don't need any special routine to "watch for sbrk(>0) calls". You just write your own sbrk function and link it when you build Icon, thus replacing the library sbrk (assuming you _can_ figure out how to write a well-behaved sbrk). Plain old brk() can be trivially written in terms of sbrk if you need it. To give the PharLap DOS-Extended version of Icon-386 expandable regions, I wrote an assembly-language sbrk built on top of PharLap's page-wise realloc system call. The only trick here is to recognize that your new, spiffy sbrk may be called before Icon's C-language main()! That is, some C startup code likes to copy environment (shell) variables and command line arguments into the program's local data space. Some C systems copy them to the stack, others use malloc() to get some memory to copy to. malloc() in turn can call sbrk. As long as your sbrk is written properly, there's no problem. However, it can make for some interesting debugging if the debugger insists on running the program up to main() before giving control to you. I don't have access to a NeXT, so I can't say what's involved there. I'm just pointing out some of what I've found to be useful on other systems. The last sbrk I built was for OS/2 2.0. Programs here can have a 512 Mb virtual workspace. I allocated a 256 Mb region using the option to "not commit" pages within that space. This call costs nothing -- it doesn't allocate RAM or disk pages -- it just reserves address space. My sbrk could then allocate from within this region, selectively committing pages as needed. Works like a champ. Maybe something like that can be done for NeXT? I'm happy to send my various sbrks to anyone who's interested in pursuing this. Mark Emmer Catspaw, Inc. P.O. Box 1123 Salida, Colorado 81201 USA Phone: 719-539-3884, 8 a.m. - 5 p.m., GMT-7. 
Fax: 719-539-4830
Internet: cats!mark@cs.arizona.edu
uucpnet: ...{uunet,allegra,noao}!arizona!cats!mark

From kwalker Thu Dec 12 11:07:59 1991
Received: from ocotillo.cs.arizona.edu by cheltenham.cs.arizona.edu; Thu, 12 Dec 91 11:07:59 MST
Date: Thu, 12 Dec 91 11:07:59 MST
From: "Kenneth Walker"
Message-Id: <9112121807.AA21184@ocotillo.cs.arizona.edu>
Received: by ocotillo.cs.arizona.edu; Thu, 12 Dec 91 11:07:59 MST
To: icon-group
Subject: Re: expanding regions, Mach

> Date: 10 Dec 91 23:44:43 GMT
> From: fernwood!cronos!laguna!alex@uunet.uu.net (Bob Alexander)

[discussion about expanding regions with realloc() or malloc(); realloc() can fail if the space cannot be extended in-place]

> Obviously, this technique works MUCH better if the Icon memory can be
> relocated, as Richard suggested. I haven't looked into how hard this
> would be -- perhaps the garbage collector could do it quite easily. Of
> course, you still have the problem of needing + memory size> to perform the copy.

The string and block regions can be relocated upward during a garbage collection to accommodate expansion of a region below them. If you know soon enough, it shouldn't be hard to relocate them to another chunk of memory. Even if you find out late in the game that this is needed to get enough free space (which I'm pretty sure is the case), you can always go through the motions of garbage collection a second time and do the relocation then. Two garbage collections seem a little expensive, but that is clearly better than dying with an "insufficient memory" message, and it is only done when you actually have to expand.

You cannot relocate the static region, but this is not a problem. The static region is used to provide a controlled version of malloc(). In the scheme we are talking about, you are using the system-supplied malloc() and there is no Icon-controlled static region.
Ken Walker / Computer Science Dept / Univ of Arizona / Tucson, AZ 85721
+1 602 621-4252  kwalker@cs.arizona.edu  {uunet|allegra|noao}!arizona!kwalker

From icon-group-request Thu Dec 12 20:49:45 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Thu, 12 Dec 91 20:49:45 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA12047; Thu, 12 Dec 91 20:49:40 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA20867; Thu, 12 Dec 91 19:42:56 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 12 Dec 91 22:51:53 GMT
From: sdd.hp.com!cs.utexas.edu!convex!russur@hplabs.hpl.hp.com (Russ Urquhart)
Organization: CONVEX Computer Corporation, Richardson, Tx., USA
Subject: Snobol4 to Icon: A converter?
Message-Id:
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

I'm kind of new to this group, so if this question has already been asked/answered, please bear with me.

I am looking for a snobol4 to icon converter. I have some snobol4 programs that I would like to convert and use under icon.

Any help would be appreciated!

Thanks
Russ Urquhart
Convex Computer Corporation

From bralich@uhccux.uhcc.hawaii.edu Fri Dec 13 00:15:12 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 13 Dec 91 00:15:12 MST
Received: from uhccux.uhcc.Hawaii.Edu ([128.171.7.2]) by optima.cs.arizona.edu (4.1/15) id AA17874; Fri, 13 Dec 91 00:15:00 MST
Received: by uhccux.uhcc.Hawaii.Edu (5.61/Ultrix3.1) id AA20758; Thu, 12 Dec 91 21:14:59 -1000
Date: Thu, 12 Dec 91 21:14:59 -1000
From: Phil Bralich
Message-Id: <9112130714.AA20758@uhccux.uhcc.Hawaii.Edu>
To: icon-group@cs.arizona.edu
Subject: From Snobol4 to Icon

I am accustomed to programming in Snobol4, but people have repeatedly advised me to move to Icon.
However, I am rather comfortable with the Snobol4 devices such as "break" and "arb" and "bal." I realize that I can write functions in Icon to do the jobs I get from the Snobol4 language, but I am not an accomplished programmer. Would there be anyone who has done this before who could advise me on this?

Phil Bralich
bralich@uhccux.uhcc.Hawaii.edu

From @festival.edinburgh.ac.uk,@emas-a.edinburgh.ac.uk:R.J.Hare@edinburgh.ac.uk Fri Dec 13 05:13:54 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 13 Dec 91 05:13:54 MST
Received: from sun2.nsfnet-relay.ac.uk by optima.cs.arizona.edu (4.1/15) id AA00820; Fri, 13 Dec 91 05:13:45 MST
Received: from festival.edinburgh.ac.uk by sun2.nsfnet-relay.ac.uk via JANET with NIFTP id <5588-13@sun2.nsfnet-relay.ac.uk>; Thu, 12 Dec 1991 10:15:05 +0000
Received: from emas-a.ed.ac.uk by castle.ed.ac.uk id aa08854; 9 Dec 91 9:55 WET
Date: 09 Dec 91 09:55:44 gmt
From: R.J.Hare@edinburgh.ac.uk
Subject: Test
To: icon-group@cs.arizona.edu
Message-Id: <09 Dec 91 09:55:44 gmt 320815@EMAS-A>
Sender: "R.J.Hare" <@emas-a.edinburgh.ac.uk:R.J.Hare@edinburgh.ac.uk>

This is a test message, please ignore.

Roger Hare.
From icon-group-request Fri Dec 13 23:43:04 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Fri, 13 Dec 91 23:43:04 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA22479; Fri, 13 Dec 91 23:43:02 MST Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA05070; Fri, 13 Dec 91 22:28:57 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 13 Dec 91 17:56:39 GMT From: sdd.hp.com!wupost!dsuvax.dsu.edu!ghelmer@hplabs.hpl.hp.com (Guy Helmer) Organization: Dakota State University Subject: CFP: SIXTH INTERNATIONAL CONFERENCE on SYMBOLIC and LOGICAL COMPUTING Message-Id: <1991Dec13.175639.23063@dsuvax.dsu.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu SIXTH INTERNATIONAL CONFERENCE on SYMBOLIC and LOGICAL COMPUTING DAKOTA STATE UNIVERSITY MADISON, SOUTH DAKOTA OCTOBER 15 - 16, 1992 ICEBOL6, the Sixth International Conference on Symbolic and Logical Computing, is designed for teachers, scholars, and programmers who want to meet to exchange ideas about computer programming for non-numeric applications -- especially those in the humanities. In addition to a focus on SNOBOL4, SPITBOL, and Icon, ICEBOL6 invites presentations on textual and logical processing in a variety of programming languages such as Prolog and C. Topics of discussion will include artificial intelligence and expert systems, and a wide range of analyses of texts in English and other natural languages. Parallel tracks of concurrent sessions are planned. ICEBOL's coffee breaks, social hours, lunches, and banquet will provide a series of opportunities for participants to meet and informally exchange information. CALL FOR PAPERS Abstracts (250-750 words) of proposed papers to be read at ICEBOL6 are invited in any area of non-numeric programming. 
Planned sessions include the following: analysis of texts (including bibliography, concordance, and index generation) artificial intelligence and expert systems computational linguistics computer languages and compilers designed for non-numeric processing electronic texts and encoding grammar and style checkers linguistic and lexical analysis (including parsing and machine translation) music analysis preparation of texts for publishing Papers must be in English and may not exceed twenty minutes reading time. Abstracts should be received by March 1, 1992. Notification of acceptance (based on recommendations of readers) will follow promptly. Papers will be published in ICEBOL6 Proceedings. Presentations at previous ICEBOL conferences were made by Paul Abrahams (ACM President), Gene Amdahl (Andor Systems), Robert Dewar (New York University), Mark Emmer (Catspaw, Inc.), James Gimpel (Lehigh), Ralph Griswold (Arizona), Susan Hockey (Oxford), Nancy Ide (Vassar) and many others. Copies of the ICEBOL5 Proceedings are available. FOR FURTHER INFORMATION All correspondence including abstracts of proposed papers as well as requests for registration materials may be sent to: Eric Johnson ICEBOL Director 114 Beadle Hall Dakota State University Madison, SD 57042 U.S.A. Inquiries, abstracts, and correspondence are encouraged via electronic mail, and they may be sent to Eric Johnson at: ERIC@SDNET.BITNET or johnsone@dsuvax.dsu.edu -- Guy Helmer, Dakota State University Computing Services ghelmer@dsuvax.dsu.edu, helmer@sdnet.bitnet, dsuvax!ghelmer@wunoc.wustl.edu Whip me, beat me, but don't make me maintain BASIC code! 
From icon-group-request Sat Dec 14 05:59:02 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 14 Dec 91 05:59:02 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA09303; Sat, 14 Dec 91 05:59:00 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.42) id AA20165; Sat, 14 Dec 91 04:56:04 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 12 Dec 91 18:53:06 GMT
From: arizona.edu!arizona.edu!news@arizona.edu (Kurt Parten)
Organization: Dept of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona
Subject: read() problems
Message-Id: <1991Dec12.115308.2254@arizona.edu>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

Is there a way for read to be OS independent, so that DOS text files look like UNIX text files? I'm running icon under UNIX, but the data files are in DOS format. I would like to get rid of the ^M character that shows up in every line. I tried trim(read(dosfile), "\r"), but that transformed

   h e l l o \r \n

to:

   h e l l o \r \r \n

instead of

   h e l l o \n

--
Kurt Parten
parten@helios.ece.arizona.edu

From ralph Sat Dec 14 06:43:19 1991
Date: Sat, 14 Dec 91 06:43:19 MST
From: "Ralph Griswold"
Message-Id: <9112141343.AA12627@cheltenham.cs.arizona.edu>
Received: by cheltenham.cs.arizona.edu; Sat, 14 Dec 91 06:43:19 MST
To: arizona.edu!arizona.edu!news@arizona.edu, icon-group@cs.arizona.edu
Subject: Re: read() problems

Icon automatically translates to and from "UNIX" format on input and output on non-UNIX platforms. However, on a UNIX platform there is no way to tell if a file was produced by UNIX or, say, by MS-DOS. (A line of text in UNIX can legitimately contain a ^M character.)

If you know you are reading an MS-DOS file, you can remove the last character of every line read using any number of techniques.
trim(read(),'\^M') probably is safe for reading MS-DOS files.

   Ralph Griswold / Department of Computer Science
   The University of Arizona / Tucson, AZ 85721
   ralph@cs.arizona.edu / uunet!arizona!ralph
   voice: 602-621-6609 / fax: 602-621-9618

From icon-group-request Wed Dec 18 15:24:57 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 18 Dec 91 15:24:57 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA23223; Wed, 18 Dec 91 15:24:50 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA13775; Wed, 18 Dec 91 14:07:26 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 13 Dec 91 16:52:27 GMT
From: csus.edu!wupost!sdd.hp.com!caen!uvaarpa!murdoch!aemsun.med.Virginia.EDU!sdm7g@ucdavis.ucdavis.edu (Steven D. Majewski)
Organization: University of Virginia - Physiology Dept.
Subject: Bug in sort() with list of mixed size integers ?
Message-Id: <1991Dec13.165227.17369@murdoch.acc.Virginia.EDU>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

There seems to be a problem ( at least with Icon v8 on Sparc/SunOS ) with sorting mixed large and small integer lists.

Can anyone reproduce this bug on another architecture?

ANY COMMENTS?

-------------
program:
-------------
procedure main( ARGL )
   write ( &version )
   system( "head -1 /etc/motd" )
   bases := []
   if ( *ARGL < 1) then bases := [10]
   every b := !ARGL do bases |||:= [ numeric( b ) ]
   L := []
   every n := ( 0 to 20 ) do
      every b := !bases do L |||:= [ b^n ]
   every x := !L do write( x )
   write( "######## sort #######" )
   L := sort( L )
   every x := !L do write( type(x), " : ", x )
end

--------------
output: errtest
--------------
Icon Version 8.0.
May 7, 1990
SunOS Release 4.1.1 (AEMSUN_KNL) #1: Wed Sep 4 19:17:09 EDT 1991
0
10
100
1000
10000
100000
1000000
10000000
100000000
1000000000
10000000000
100000000000
1000000000000
10000000000000
100000000000000
1000000000000000
10000000000000000
100000000000000000
1000000000000000000
10000000000000000000
100000000000000000000
######## sort #######
integer : 0
integer : 10
integer : 100
integer : 1000
integer : 10000
integer : 100000
integer : 1000000
integer : 10000000
integer : 10000000000
integer : 100000000000
integer : 1000000000000
integer : 10000000000000
integer : 100000000000000
integer : 1000000000000000
integer : 10000000000000000
integer : 100000000000000000
integer : 1000000000000000000
integer : 10000000000000000000
integer : 100000000000000000000
integer : 100000000
integer : 1000000000

-------------
output: errtest 2 3
-------------
Icon Version 8.0. May 7, 1990
SunOS Release 4.1.1 (AEMSUN_KNL) #1: Wed Sep 4 19:17:09 EDT 1991
0
0
2
3
4
9
8
27
16
81
32
243
64
729
128
2187
256
6561
512
19683
1024
59049
2048
177147
4096
531441
8192
1594323
16384
4782969
32768
14348907
65536
43046721
131072
129140163
262144
387420489
524288
1162261467
1048576
3486784401
######## sort #######
integer : 0
integer : 0
integer : 2
integer : 3
integer : 4
integer : 8
integer : 9
integer : 16
integer : 27
integer : 32
integer : 64
integer : 81
integer : 128
integer : 243
integer : 256
integer : 512
integer : 729
integer : 1024
integer : 2048
integer : 2187
integer : 4096
integer : 6561
integer : 8192
integer : 16384
integer : 19683
integer : 32768
integer : 59049
integer : 65536
integer : 131072
integer : 177147
integer : 262144
integer : 3486784401
integer : 524288
integer : 531441
integer : 1048576
integer : 1594323
integer : 4782969
integer : 14348907
integer : 43046721
integer : 129140163
integer : 387420489
integer : 1162261467
------------

Note: The problem occurs at or about the boundary between 2^30 and 2^31. So I assume this is a problem with sorting the two different
representations of integers. ( I stuck the type(x) in there just to check that they were all actually integers. ) [ Sorry to have dropped out of the theoretical discussions, re: icon futures, etc. But I decided to hold off on further comments until I am a bit more familiar with Icon/Idol programming. With Richard Goerwitz and others help I can now write a working program or two, so I have been tied up with actually trying to get some work done. - Thanks ] ======== "If you have a hammer, find a nail" - George Bush,'91 ========= Steven D. Majewski University of Virginia Physiology Dept. sdm7g@Virginia.EDU Box 449 Health Sciences Center Voice: (804)-982-0831 1600 Jefferson Park Avenue FAX: (804)-982-1616 Charlottesville, VA 22908 From icon-group-request Wed Dec 18 20:37:23 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 18 Dec 91 20:37:23 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA05311; Wed, 18 Dec 91 20:37:19 MST Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA23282; Wed, 18 Dec 91 19:25:21 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 14 Dec 91 01:54:52 GMT From: world!ksr!tim@uunet.uu.net (Tim Peters) Organization: Kendall Square Research Corp. Subject: Re: Bug in sort() with list of mixed size integers ? Message-Id: <7872@ksr.com> References: <1991Dec13.165227.17369@murdoch.acc.Virginia.EDU> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu In article <1991Dec13.165227.17369@murdoch.acc.Virginia.EDU> sdm7g@aemsun.med.Virginia.EDU (Steven D. Majewski) writes: >There seems to be a problem ( at least with Icon v8 on Sparc/SunOS ) >with sorting mixed large and small integer lists. > >Can anyone reproduce this bug on another architecture? > ...[and a failing program is attached]... Steve, ya, there are some bugs in V8 bigint support. 
You may have noticed that n^0 is coming out as 0 instead of as 1 in your output -- that's another bug (at least according to me). Don't know whether they're universal; have reported a few before but nobody bit the bait; the system I'm running is pretty much like yours:

   Icon Version 8.0. May 7, 1990
   OS/SMP 4.0Da_Export (XD/root) #0: Wed Aug 21 08:06:46 1991

That's a Solbourne; another SPARC and running a Sun OS.

Try replacing your

   L := sort( L )

by

   every !L *:= -1      # negate each number in list
   L := sort( L )
   every !L *:= -1      # and restore

This doesn't work any better, but does lead to a prettier pattern in the output (which would be strictly decreasing if it worked right). It also leads to a workaround that will do the trick until Icon is fixed:

   huge := 10^10        # must be big enough to make this true:
   every !L +:= huge    # force every number to be a bignum
   L := sort( L )
   every !L -:= huge    # & restore original values

You already made the key observation for understanding why this works: the bugs in Icon's V8 bignums (& bignums were new in V8, so it's not surprising there are some bugs ...) seem to have mostly to do with the boundary between native integers & the huge ones. So as a general rule, when bit by one of these guys a good approach is to move the data away from the boundary, at least for the duration of the flaky operation.

>ANY COMMENTS?

No, but a handy tip: You'll eventually discover that building a list via repeated list concatenation (|||) of a singleton is very slow. It's much faster (and, believe it or not, in time much clearer) to avoid the temp variables and build lists via "put". E.g.,

   every put( bases, numeric(!ARGL) )

instead of

   every b := !ARGL do bases |||:= [ numeric( b ) ]

Similarly you'll (eventually) find

   every n := 0 to 20 & put( L, (!bases)^n )          # marginal, I admit
   every n := 0 to 20 do every put( L, (!bases)^n )   # clearer(?)

and

   every write( !L )

faster & clearer than the way you're writing them now.
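Put together, the "put" tips suggest a shape like the following for the test program. This is a sketch rather than the poster's code: the parameter name "args" is an assumption, the &version and system() calls are left out, and on a fixed Icon the sorted output should simply be in increasing order.

```icon
procedure main(args)
   bases := []
   every put(bases, numeric(!args))     # non-numeric arguments are skipped
   if *bases = 0 then put(bases, 10)    # default to base 10, as in the original

   L := []
   every n := 0 to 20 do
      every put(L, (!bases)^n)          # append each power in place

   every write(!sort(L))                # sort once, then write each element
end
```

Each put appends to the existing list, whereas L |||:= [x] builds a new list on every iteration, which is where the slowdown Tim mentions comes from.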
if-it's-any-consolation-perl-has-a-hundred-bugs-for-every-one-i've- found-in-icon<0.3-grin>-ly y'rs - tim Tim Peters Kendall Square Research Corp tim@ksr.com, ksr!tim@uunet.uu.net From icon-group-request Sat Dec 21 20:37:58 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 21 Dec 91 20:37:58 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA01463; Sat, 21 Dec 91 20:37:56 MST Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA19536; Sat, 21 Dec 91 19:31:07 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 21 Dec 91 03:22:27 GMT From: world!ksr!tim@decwrl.dec.com (Tim Peters) Organization: Kendall Square Research Corp. Subject: Re: Memory Mangement Message-Id: <8080@ksr.com> References: <1991Dec19.015629.3966@utagraph.uta.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu In article <1991Dec19.015629.3966@utagraph.uta.edu> b912dieg@utarlg.uta.edu writes: >... >I know from my readings that the ICON system finds it necessary to >perform frequent garbage collections to maintain an adequate supply of >unfragmented memory. Guess I'll have to do until a real expert comes along . The frequency of Icon garbage collection depends on an awful lot of things. In general, I'd say that if Icon *is* doing collections frequently, there's an easily-changed (if not easily-found ...) way to stop that by rewriting a bit of the Icon application. For most Icon programs most times, garbage collection should not be a problem; indeed, for many real Icon programs garbage collection is never needed at all. > ... > Mathematica does memory management with reference counts, > so that pieces of memory are freed as soon as they stop > being used. 
This means that Mathematica can make use of
> essentially all the memory that is available on a
> particular computer, without the need for operations such
> as garbage collection. (Mathematica by Wolfram 1988:xvi)
>
>Does anyone have an explanation of reference counts?
>Can the use of reference counts really eliminate garbage collection?

Dynamic memory management has become a rather large field by now, but the intro in Knuth's "Art of Computer Programming" (Volume 1) remains helpful.

Assuming you can read Icon, here's a fragment to help explain the differences:

   B := [2]
   A := [1, B, 3]  # === [1, B === pointer to [2], 3]
   B := &null
   A := &null

I'll call those lines #1, #2, #3 and #4.

Icon-style garbage collection works by (in effect) putting a mark on all the memory you can possibly reference, then reusing all the memory that is *not* marked (it's trickier than that, of course, but I think that's a fair summary of the basic idea). So if garbage collection (GC) occurs right after line #2, the memory earlier allocated for the lists A and B will get marked, because you could still reference that memory in the future just by mentioning A or B.

If GC occurs after #3, we must still keep all the memory around, but for a subtler reason: even though you can't reference the [2] list directly via B any more, the garbage collector knows you might still reference A, and the value of A still contains a pointer to the list that used to be named by B. So all the allocated memory is still potentially reachable, so the garbage collector can't reuse it. GC must in general not only mark all the memory reachable in "one step", but all the memory reachable from that in turn, and so on & so on.

If GC occurs after #4, of course neither of the earlier-allocated lists is reachable, so the memory they occupy won't get "marked", so that memory will get reused.
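The marking idea can be imitated in a few lines of Icon. This is only a toy model under stated assumptions, not how Icon's collector is implemented: the procedure name "mark", the explicit "roots" list, and the set of marked lists are all invented for illustration, and only lists are followed.

```icon
# Mark every list reachable from x, following lists nested inside lists.
procedure mark(x, marked)
   if type(x) == "list" & not member(marked, x) then {
      insert(marked, x)              # reachable: must be kept
      every mark(!x, marked)         # and so must anything it points at
   }
   return
end

procedure main()
   inner := [2]                      # the list B names after line #1
   outer := [1, inner, 3]            # the list A names after line #2

   roots := [outer]                  # after line #3 only A is still a root
   marked := set([])
   every mark(!roots, marked)
   write(*marked, " lists marked")   # 2 lists marked

   roots := []                       # after line #4 there are no roots left
   marked := set([])
   every mark(!roots, marked)
   write(*marked, " lists marked")   # 0 lists marked
end
```

After line #3 the [2] list is still marked because it is reached through the outer list; after line #4 nothing is marked, so both lists would be reclaimed.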
The general behavior is that "the system" doesn't worry about memory until it runs out of it, at which point it stops everything else to find & recycle the old memory that's no longer being used. This can take an appreciable amount of time when it happens, and of course the collector is itself a program that needs memory for its own use. That's the likely source of the (misleading; see below) claim that Mathematica can use "essentially all" the memory for real stuff.

Reference counting (RC) is a different technique. Under RC, each piece of allocated memory keeps track of how many times it's pointed *at* by other pieces of memory; when this count drops to 0, the memory can be reused (that the count is 0 *means* that it's not being pointed at).

So in executing line #1, a piece of data is attached to the [2] list that records that the [2] list is pointed at once (by the program variable B). In executing line #2, the "reference count" for the [2] list must be bumped up by 1 because the [2] list is again referenced by a pointer embedded in the A list. In addition, a RC of 1 must be attached to the list assigned to A.

In line #3, the system sees that B is being changed to point at a different chunk of memory, so the RC of what it used to point at (the [2] list) must be decreased by 1. This will decrease it from 2 to 1. Since it's 1, it's still being pointed at by something, so the system can't reuse it.

Similarly, in line #4 the reference count of the A list is decreased from 1 to 0. Aha! Since it's 0, nobody else is referencing it, so the system can reuse it. Since it will be reused, the reference counts of everything *it* points at must also be decreased by 1, so the system has to examine the A list carefully to find all the things it points at. It will find that it's pointing to the [2] list, so will decrease the [2] list's RC by 1. That in turn decreases the [2] list's RC to 0, so that's no longer used either.
The system then searches thru the [2] list in order to find anything *it* may be referencing, but doesn't find another pointer so the process stops there.

There are several things to note about this:

(1) It takes memory to store the reference counts. So a claim that an RC approach lets you use "almost all" the available memory for "real stuff" is misleading.

(2) Like Icon-flavor GC, RC also needs to recursively traverse your data structures looking for contained pointers.

(3) It takes time to increase and decrease the counts.

(4) While Icon-flavor GC lumps all the work together at one time, RC-based systems tend to spread the work out over time so that memory-management "hiccups" (pauses) aren't noticeable.

(5) An RC-based system does all the work of maintaining the counts whether or not you eventually run out of memory; so an Icon-flavor GC gets off cheaper in the common case where the initial memory region suffices.

Dynamic memory management is a very involved topic, and the stuff above conveniently ignored almost all the interesting problems & possible workarounds in both systems. They both face real problems in practice! As another complication, almost all large programs (like Icon, and probably Mathematica too) use several different strategies (e.g., Icon uses a different scheme for strings than it uses for co-expressions, and so on). I just hoped to convey enough of the essentials to make you skeptical of marketing hype.
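The bookkeeping for lines #1 through #4 can be sketched in Icon as a small simulation. Everything here is hypothetical: the record cell and the procedures incref and decref are invented for illustration, and say nothing about how Mathematica (or Icon, which does not use reference counts) actually stores its data.

```icon
# Each simulated piece of memory carries its own reference count
# and a list of the other pieces it points at.
record cell(name, rc, kids)

procedure incref(c)
   c.rc +:= 1
   return c
end

procedure decref(c)
   c.rc -:= 1
   if c.rc = 0 then {                  # nobody points at c any more
      write("freeing ", c.name)
      every decref(!(c.kids))          # so drop everything c pointed at
   }
   return
end

procedure main()
   local two, outer, A, B
   two   := cell("[2]", 0, [])         # handles for the simulation only
   outer := cell("[1,B,3]", 0, [])

   B := incref(two)                    # line #1: RC of [2] becomes 1
   put(outer.kids, incref(two))        # line #2: the A list points at [2], RC 2
   A := incref(outer)                  #          and A points at the new list, RC 1
   decref(B); B := &null               # line #3: RC of [2] drops to 1
   write("[2] rc = ", two.rc)          # writes "[2] rc = 1"
   decref(A); A := &null               # line #4: frees [1,B,3], then [2]
end
```

Note that the counts are adjusted on every assignment, whether or not memory ever runs short, which is point (5) above.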
from-what-i-know-of-them-icon's-strategies-are-very-intelligent-choices-
for-icon's-needs-and-i'd-be-real-surprised-if-mathematica's-choices-
weren't-intelligent-for-its-needs-ly y'rs - tim

Tim Peters Kendall Square Research Corp
tim@ksr.com, ksr!tim@uunet.uu.net

From icon-group-request Sat Dec 21 21:07:55 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sat, 21 Dec 91 21:07:55 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA02121; Sat, 21 Dec 91 21:07:51 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA20805; Sat, 21 Dec 91 19:58:58 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 21 Dec 91 04:52:15 GMT
From: news@psuvax1.cs.psu.edu (Felix Lee)
Organization: Penn State Computer Science
Subject: Re: Memory Mangement
Message-Id:
References: <1991Dec19.015629.3966@utagraph.uta.edu>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

> Mathematica does memory management with reference counts,
> so that pieces of memory are freed as soon as they stop
> being used. This means that Mathematica can make use of
> essentially all the memory that is available on a
> particular computer, without the need for operations such
> as garbage collection. (Mathematica by Wolfram 1988:xvi)

This is called "propaganda": condensing many subtle issues into a sweeping, positive-image statement.

The memory management problem is this: if X used to refer to A but now refers to B, you can reuse the memory occupied by A, but only if no one else is using A.

Reference counting (RC) is giving each object a count of how many things are using it. When you assign X = A, you increase A's count by one. When you later assign X = B, you decrease A's count, and increase B's count. When the count for something reaches zero, you know that no one is using it, so you can reclaim it.
The main flaw is this does not handle circular references. Say that A refers to B and B refers to A, but neither A nor B is being used any more. In principle you can reclaim both A and B, but RC cannot discover this fact easily.

This isn't a problem if you can't create recursive structures. Perhaps Mathematica forbids them. But these types of structures are quite natural and common in languages like Icon and Lisp.

Garbage collection (GC) strategies are more general than RC. Rather than continually doing RC bookkeeping, you periodically discover which objects are in use and reclaim everything else. There are several different strategies for this, with different constraints and characteristics. I'm not going to describe them here.

The main flaw with GC is it has a bad image. At apparently random times, the system may pause for several seconds to do GC. But modern GC systems tend to behave well, and there's still plenty of active research into improving GC in different ways.

There is good evidence that, overall, GC is more efficient than RC. All the bookkeeping involved in RC doesn't come free. However, the costs get amortized over all operations, giving you a more predictable response time. This is attractive for interactive and real-time use, as long as you can ignore the circular-reference problem. Incremental and concurrent GC strategies may be viable alternatives.

Memory fragmentation is a related issue. If you allocate variably sized pieces of memory, then the available memory in the system tends to become scattered into many small pieces over the course of many allocations and deallocations.

Now say you need to allocate 16 bytes. There may be more than 1600 bytes of memory available, but you won't be able to use any of it if all the pieces are smaller than 16 bytes. RC is not enough to ensure you can actually use "all the memory that is available on a particular computer".

Pathological fragmentation is unavoidable in general unless you do garbage compaction.
This involves moving all the objects in memory to collect the available memory into a single area. This suffers from all the performance objections that apply to GC. Note, however, that with some types of GC you get compaction for free. Now, try to condense all this into a simple blurb. "Mathematica uses reference counting instead of garbage collection because it seemed like a good idea at the time." Not very marketroidish... -- Felix Lee flee@cs.psu.edu From NED@hmcvax.claremont.edu Sun Dec 22 06:29:17 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 22 Dec 91 06:29:17 MST Received: from CBROWN.CLAREMONT.EDU by optima.cs.arizona.edu (4.1/15) id AA16944; Sun, 22 Dec 91 06:29:13 MST Received: from HMCVAX.CLAREMONT.EDU by HMCVAX.CLAREMONT.EDU (PMDF #11000) id <01GEEHX9YKR69N3W7T@HMCVAX.CLAREMONT.EDU>; Sun, 22 Dec 1991 05:28 PST Date: Sun, 22 Dec 1991 05:28 PST From: "Ned Freed, Postmaster" Subject: Re: Memory Mangement To: news@psuvax1.cs.psu.edu Cc: icon-group@cs.arizona.edu Message-Id: <01GEEHX9YKR69N3W7T@HMCVAX.CLAREMONT.EDU> X-Vms-To: IN%"news@psuvax1.cs.psu.edu" X-Vms-Cc: IN%"icon-group@cs.arizona.edu" > > Mathematica does memory management with reference counts, > > so that pieces of memory are freed as soon as they stop > > being used. This means that Mathematica can make use of > > essentially all the memory that is available on a > > particular computer, without the need for operations such > > as garbage collection. (Mathematica by Wolfram 1988:xvi) > This is called "propaganda": condensing many subtle issues into a > sweeping, positive-image statement. This is indeed propaganda (the book is loaded with similar material), and other postings have pointed out the inherent flaws in reference counts and how they do not provide a viable general solution for Icon (and more generally for LISP as well as lots of other language runtime facilities). 
However, languages like Mathematica present a different problem: they only deal with a _very_ limited subset of the universe of data structures, and this subset is in fact quite amenable to the use of reference counts.

For the most part, algebraic languages only have to deal with trees, and very simple trees at that. Insertion of reference counts in every node of the tree is gross overkill -- counts are only needed in the case of common subexpressions and (possibly) in the case of cross-function references (this depends on the characteristics of the language itself). This simplifies the problem greatly and for this simplified problem reference counts are a very attractive solution. It is even possible to deal with functions that reference each other recursively in a clean and elegant fashion, since despite appearances to the contrary these are _not_ really circular data structures (this is because of the shallow name binding properties of most algebra languages). Property lists are actually a somewhat trickier problem, but even they can be dealt with nicely.

In my own algebraic language work I've found that reference counts were an easy and obvious solution to many memory management problems. The overhead is quite low and the performance is usually excellent. When coupled with lookaside lists and a few other tricky little low-level goodies like block allocation it is easy to put together code where memory reclamation is so efficient that its effect on bottom-line performance is insignificant both in terms of time and space used. (Of course, once the problem gets large the positive benefits of reclamation can be overwhelming, especially on virtual memory systems.)

But this is just your average competent design, that's all. I certainly don't go around tooting my horn in print about the reference count subsystem that I employed in MATHLIB (another algebra language), which was implemented long before the first line of Mathematica was ever written.
I didn't think it was particularly impressive when I wrote it, and I don't think it is impressive now. It is just competent design, and certainly not the first such design either.

However, you should keep in mind the characteristics of some other algebra systems. They make Mathematica look a lot better than it actually is. Those that run inside of more general environments (usually LISP) can scarcely be blamed for using reclamation strategies that don't precisely match the characteristics of the algebraic data structures these programs use, but that's the price you pay for using such a general environment.

A more interesting example is the memory management of Wolfram's earlier opus, SMP. It is quite simply terrible. I won't go into details -- I don't like to recall them. Wolfram et al did get it right the second time around, which I suppose is something, but he really blew it big-time the first time.

This is probably not what all the Wolfram-o-philes want to hear...

> Now, try to condense all this into a simple blurb. "Mathematica uses
> reference counting instead of garbage collection because it seemed
> like a good idea at the time." Not very marketroidish...

I'd prefer to say that "Mathematica uses reference counting because it is the obvious preferred solution to the problem of memory reclamation given the nature of the language."

I also find the strategies employed in Icon to be extremely interesting and in several places amazingly clever, and I have employed a number of Icon's tricks in my own work. (The implementation of coexpressions is the part I like the most.) I actually use this and similar information about Icon implementation details a bit more than I actually use Icon itself! So, if anyone has any hesitations about discussing such matters, please note that some of the readership of this list is very interested in these things.
Ned

From icon-group-request Sun Dec 22 11:40:41 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 22 Dec 91 11:40:41 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA22223; Sun, 22 Dec 91 11:40:39 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA00634; Sun, 22 Dec 91 10:27:00 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 12 Dec 91 01:56:19 GMT
From: ksr!tim@uunet.uu.net (Tim Peters)
Organization: Kendall Square Research Corp.
Subject: Re: novice scanning question
Message-Id: <7803@ksr.com>
References: <9112111353.AA20684@relay1.UU.NET>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

In article <9112111353.AA20684@relay1.UU.NET> nowlin@isidev.UUCP writes:
>
> > From: uunet!arizona!decwrl.dec.com!world!ksr!tim (Tim Peters)
> > ...
> >     line ?:= { move(3) & move(4) }
> > vs
> >     line ?:= ( move(3) & move(4) )
> > vs
> >     line ?:= { move(3); move(4) }
> > ...
>It's most idiomatic (I think) to write:
>
>    procedure main()
>        line := "Now and then"
>        line ?:= ( move(3) , move(4) )
>        write(line)
>    end
>
>This is called mutual evaluation and while it's usually used for long
>sequences of compound conjunction operations it works just fine with
>only two expressions.

It was an artificial example so it's hard to tell what "the natural" approach would be, but I got the impression that mutual evaluation (whether spelled as "e1 & e2" or "(e1, e2)") was not a natural approach to the poster's real task. I.e., do you really want to suck a self-proclaimed "novice" into the mysteries of backtracking?

Anyway, I steered him toward the quite different (semantically as well as syntactically) "{ e1; e2; ... }" form believing that it's less confusing for an Icon newbie -- works pretty much the way sequential blocks in other languages work.
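A small Icon sketch of how the two forms differ, using the same subject string as the thread. The particular scanning expressions are chosen only for illustration and do not come from the original posts.

```icon
procedure main()
   s := "Now and then"

   # Compound expression: each expression before a semicolon is evaluated
   # on its own. move(20) fails, the failure is discarded, move(3) still runs.
   write(s ? { move(20); move(3) })            # writes "Now"

   # Mutual evaluation: every expression must succeed, so the failure of
   # move(20) makes the whole scanning expression fail.
   if not (s ? (move(20), move(3))) then
      write("mutual evaluation fails")

   # Backtracking: the first 'n' (position 6) is not the last character, so
   # pos(0) fails; upto() is then resumed and supplies the 'n' at position 12.
   if s ? (tab(upto('n')), move(1), pos(0)) then
      write("mutual evaluation retried and matched the final n")

   # The compound form never goes back for a second 'n', so it just fails.
   if not (s ? { tab(upto('n')); move(1); pos(0) }) then
      write("compound expression does not retry")
end
```

The last two cases are where the "mysteries of backtracking" show up: the same three operations succeed or fail depending only on whether commas or semicolons separate them.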
Whatever, I'm glad you drew explicit attention to mutual evaluation, Jerry; could well be what he's really looking for. consulting-via-telepathy-has-its-limitations-ly y'rs - tim Tim Peters Kendall Square Research Corp tim@ksr.com, ksr!tim@uunet.uu.net From alfred@adt.uni-paderborn.de Mon Dec 23 05:22:38 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 23 Dec 91 05:22:38 MST Received: from pbinfo.uni-paderborn.de ([131.234.2.3]) by optima.cs.arizona.edu (4.1/15) id AA19179; Mon, 23 Dec 91 05:21:07 MST Received: from alpha.uni-paderborn.de ([131.234.128.175]) by pbinfo.uni-paderborn.de with SMTP (5.65c8/PB-3.100+yp+master) id AA26793; Mon, 23 Dec 1991 13:19:19 +0100 Received: by alpha.uni-paderborn.de (5.61++/PB-3.41) id AA06719; Mon, 23 Dec 91 13:18:38 -0100 From: Alfred Schmidt Message-Id: <9112231418.AA06719@alpha.uni-paderborn.de> Subject: Xmas Greetings from Germany To: icon-group@cs.arizona.edu Date: Mon, 23 Dec 91 13:18:35 MEZ X-Mailer: ELM [version 2.3 PL10] MERRY XMAS FROM GERMANY and a happy new year. Thanks to the whole ICON community for the fruitful hints and the very special support. Icon User Group Paderborn/Germany +------------------------------------------------------------------------+ | Alfred Schmidt The University of Paderborn | | Systems Engineer Dept. 
EEE | | voice: +49 5251 60 3279 Section Software Engineering | | fax : +49 5251 60 3246 PO Box 1621 | | email: alfred@adt.uni-paderborn.de D-4790 Paderborn/Germany | +------------------------------------------------------------------------+ From TENAGLIA@mis.mcw.edu Mon Dec 23 11:24:05 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 23 Dec 91 11:24:05 MST Received: from MIS4.MIS.MCW.EDU by optima.cs.arizona.edu (4.1/15) id AA28098; Mon, 23 Dec 91 11:23:56 MST Received: from mis.mcw.edu by mis.mcw.edu (PMDF #12252) id <01GEGASK9M5S934S7K@mis.mcw.edu>; Mon, 23 Dec 1991 12:25 CST Date: Mon, 23 Dec 1991 12:25 CST From: Chris Tenaglia - 257-8765 Subject: Holiday Offering, Thanksgiving leftovers To: icon-group@cs.arizona.edu Message-Id: <01GEGASK9M5S934S7K@mis.mcw.edu> X-Organization: Medical College of Wisconsin (Milwaukee, WI) X-Vms-To: IN%"icon-group@cs.arizona.edu" The response on the brain-dead, cheating tic-tac-toe game offering of the Thanksgiving holiday has been underwhelming. Oh well, I promised a smarter version the next time. Here it is. It uses a strategy list. It's not very elegant, always starts first (and in the same spot), and always ends in a draw or computer winning. At least it plays fair and has a few more comments. I've tested it under VMS and Unix (ultrix) using VT compatible emulation. The next step would be to have it start in a random corner (maintain several more strategy lists). The next step after that would be to allow the player to start (a lot more work). I threw this together in a quiet evening and a lunch hour. Enjoy! Chris Tenaglia (System Manager) | Medical College of Wisconsin 8701 W. Watertown Plank Rd. 
| Milwaukee, WI 53226
(414)257-8765 | tenaglia@mis.mcw.edu, mcwmis!tenaglia

###############################################################
#                                                             #
#    file  : tt2.icn                                          #
#    updt  : 23-dec-1991                                      #
#    auth  : chris tenaglia                                   #
#    desc  : tictactoe icon implementation                    #
#    note  : This version plays fair, it uses a strategy      #
#            but isn't very creative.                         #
#                                                             #
###############################################################
global me,you,true,false,draw,pointer,wins,pass,taken,winner
global mark,row,routes,route

procedure main()
  init()
  play := true
  while play == true do
    {
    me      := set()          # computer is me
    you     := set()          # player is you
    victory := ""             # nobodys' won yet
    winner  := ""             # winner flag
    pass    := 0              # start flag
    taken   := table(false)   # taken position table (rather than set?)
    display()
    #
    # computer makes first move
    #
    insert(me,1)
    taken[1] := true
    display()
    #
    # player follows
    #
    insert(you,(tmp := integer(get_your_move())))
    taken[integer(tmp)] := true
    display()
    path  := routes[tmp]      # players' move determines strategy
    index := 2                # points at 2nd move just happened
    #
    # computers' next move determined from strategy list
    #
    insert(me,(tmp := integer(path[(index+:=1)])))
    taken[tmp] := true
    display()
    #
    # player follows
    #
    insert(you,(tmp := integer(get_your_move())))
    taken[integer(tmp)] := true
    your_last_move := tmp
    display()
    #
    # if didn't take position dictated, loss ensues
    #
    if your_last_move ~= (tmp := integer(path[(index+:=1)])) then
      {
      winner := "me"
      insert(me,tmp)
      taken[tmp] := true
      display()
      done_yet()
      write(at(1,22),chop(&host)," Wins, You Lose!")
      every square := !row do writes(pointer[square],mark)
      again := map(input(at(1,23) || "Another game? Y/N :"))[1]
      if again=="y" then next
      stop(at(1,23),"Game Over.",chop())
      }
    #
    # user made a good move, continue (computer plays now)
    #
    insert(me,(tmp := integer(path[(index+:=1)])))
    taken[tmp] := true
    display()
    #
    # player follows
    #
    insert(you,(tmp := integer(get_your_move())))
    taken[integer(tmp)] := true
    your_last_move := tmp
    display()
    #
    # if didn't take position dictated, loss ensues
    #
    if your_last_move ~= (tmp := integer(path[(index+:=1)])) then
      {
      winner := "me"
      insert(me,tmp)
      taken[tmp] := true
      display()
      done_yet()
      write(at(1,22),chop(&host)," Wins, You Lose!")
      every square := !row do writes(pointer[square],mark)
      again := map(input(at(1,23) || "Another game? Y/N :"))[1]
      if again=="y" then next
      stop(at(1,23),"Game Over.",chop())
      }
    #
    # if players first move wasn't 5, they lose now too
    #
    if integer(path[2]) ~= 5 then
      {
      tmp := integer(path[(index+:=1)])
      winner := "me"
      insert(me,tmp)
      taken[tmp] := true
      display()
      done_yet()
      write(at(1,22),chop(&host)," Wins, You Lose!")
      every square := !row do writes(pointer[square],mark)
      again := map(input(at(1,23) || "Another game? Y/N :"))[1]
      if again=="y" then next
      stop(at(1,23),"Game Over.",chop())
      }
    #
    # user made a good move, continue (computer plays now)
    #
    insert(me,(tmp := integer(path[(index+:=1)])))
    taken[tmp] := true
    display()
    write(at(1,22),chop(),"Game was a draw.")
    again := map(input(at(1,23) || "Another game? Y/N :"))[1]
    if again=="y" then next
    stop(at(1,23),"Game Over.",chop())
    }
  end
#
#
#
# procedure to display the current tictactoe grid and plays
#
procedure display()
  if (pass +:= 1) = 1 then
    {
    write(cls(),uhalf()," T I C - T A C - T O E")
    write(lhalf()," T I C - T A C - T O E")
    write(trim(center("Computer is 'O' and you are 'X'",80)))
    line := repl("q",60) ; line[21] := "n" ; line[41] := "n"
    every y := 5 to 20 do writes(at(30,y),graf("x"))
    every y := 5 to 20 do writes(at(50,y),graf("x"))
    writes(at(10,10),graf(line))
    writes(at(10,15),graf(line))
    every x := 1 to 9 do writes(pointer[x],dim(x))
    }
  every writes(pointer[!me],high("O"))
  every writes(pointer[!you],under("X"))
  end
#
# procedure to obtain a move choice from the player
#
procedure get_your_move()
  local yours,all_moves
  repeat
    {
    writes(at(5,22))
    yours := input("Enter block # (1-9) :")
    writes(at(5,23),chop())
    if not(integer(yours)) then
      {
      writes(at(5,23),beep(),"Invalid Input! Choose 1-9.")
      next
      }
    if (1 > yours) | (yours > 9) then
      {
      writes(at(5,23),beep(),"Value out of range! Choose 1-9.")
      next
      }
    if taken[integer(yours)] == true then
      {
      writes(at(5,23),beep(),"That position is already taken! Try again.")
      next
      }
    break
    }
  return integer(yours)
  end
#
# procedure to test if computer has won, or the game is a draw
#
procedure done_yet()
  every outcome := !wins do
    {
    test := 0
    every part := !outcome do
      if member(you,part) then test +:= 1
    if test = 3 then
      {
      winner := "you"
      row    := outcome
      mark   := high(blink("X"))
      return true
      }
    }
  every outcome := !wins do
    {
    test := 0
    every part := !outcome do
      if member(me,part) then test +:= 1
    if test = 3 then
      {
      winner := "me"
      row    := outcome
      mark   := high(blink("O"))
      return true
      }
    }
  if *me + *you > 8 then
    {
    winner := draw
    return draw
    }
  return "not done yet"
  end
#
#
#
# prompts for an input from the user
#
procedure input(prompt)
  writes(prompt)
  return read()
  end
#
#
#
# procedures to output ansi graphics and attributes
#
procedure at(x,y)
  return "\e[" || y || ";" || x || "f"
  end

procedure graf(str)
  return "\e(0" || str || "\e(B"
  end

procedure uhalf(str)
  /str := ""
  return "\e#3" || str
  end

procedure lhalf(str)
  /str := ""
  return "\e#4" || str
  end

procedure high(str)
  return "\e[1m" || str || "\e[0m"
  end

procedure normal(str)
  return "\e[0m" || str
  end

procedure dim(str)
  return "\e[2m" || str || "\e[0m"
  end

procedure under(str)
  return "\e[4m" || str || "\e[0m"
  end

procedure blink(str)
  return "\e[5m" || str || "\e[0m"
  end

procedure cls(str)
  /str := ""
  return "\e[2J\e[H" || str
  end

procedure chop(str)
  /str := ""
  return "\e[J" || str
  end

procedure beep()
  return "\7"
  end
#
#
#
# procedure to init useful global variables for later use
#
procedure init()
  true    := "y"
  false   := "n"
  draw    := "?"
  &random := map(&clock,":","0")
  routes  := ["-","1274958","1374958","1432956","1547328",
              "1632745","1732956","1874352","1974352"]
  wins    := [set([1,5,9]),set([3,5,7]),set([1,2,3]),set([4,5,6]),
              set([7,8,9]),set([1,4,7]),set([2,5,8]),set([3,6,9])]
  pointer := [at(17,7), at(37,7), at(57,7),
              at(17,12),at(37,12),at(57,12),
              at(17,17),at(37,17),at(57,17)]
  end

From icon-group-request Mon Dec 23 11:56:24 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 23 Dec 91 11:56:24 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA29106; Mon, 23 Dec 91 11:56:23 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA28850; Mon, 23 Dec 91 10:48:39 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 19 Dec 91 01:44:22 GMT
From: mips.mitek.com!utacfd.uta.edu!utagraph.uta.edu!utarlg.uta.edu!b912dieg@apple.com (DOUG WITMER)
Organization: The University of Texas at Arlington
Subject: Memory Mangement
Message-Id: <1991Dec19.015629.3966@utagraph.uta.edu>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

This question is directed at those having an in depth knowledge of ICON internals and in particular ICON memory management.

I know from my readings that the ICON system finds it necessary to perform frequent garbage collections to maintain an adequate supply of unfragmented memory. I also know that ICON is implemented in the C language. My question stems from a claim which I read today concerning the Mathematica language which is also implemented in C. Here is what they say about garbage collection:

Mathematica does memory management with reference counts, so that pieces of memory are freed as soon as they stop being used.
This means that Mathematica can make use of essentially all the memory that is available on a particular computer, without the need for operations such as garbage collection. (Mathematica by Wolfram 1988:xvi) Does anyone have an explanation of reference counts? Can the use of reference counts really eliminate garbage collection? Thanks for helping satisfy my curiosity. Doug Witmer b912dieg@utarlg.uta.edu From wgg@cs.ucsd.edu Wed Dec 25 00:38:24 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Wed, 25 Dec 91 00:38:24 MST Received: from relay2.UU.NET by optima.cs.arizona.edu (4.1/15) id AA21051; Wed, 25 Dec 91 00:38:34 MST Received: from ucsd.edu by relay2.UU.NET with SMTP (5.61/UUNET-internet-primary) id AA01643; Wed, 25 Dec 91 02:38:33 -0500 Received: from gremlin.ucsd.edu by ucsd.edu; id AA11078 sendmail 5.64/UCSD-2.2-sun via SMTP Tue, 24 Dec 91 23:38:31 -0800 for isidev!nowlin@uunet.uu.net Received: by gremlin.ucsd.edu (4.1/UCSDPSEUDO.4) id AA13508 for uunet!cs.arizona.edu!icon-group@uunet.uu.net; Tue, 24 Dec 91 23:38:29 PST Date: Tue, 24 Dec 91 23:38:29 PST From: wgg@cs.ucsd.edu (William Griswold) Message-Id: <9112250738.AA13508@gremlin.ucsd.edu> To: isidev!nowlin@uunet.uu.net, cs.arizona.edu!icon-group@uunet.uu.net Subject: Re: tic-tac-toe Several years ago, as a novice Icon programmer, I wrote a learning tic-tac-toe player that sounds similar to Jerry's. This player was so `dumb' that it didn't even know what a winning board was! Tic-tac-toe was the domain we chose to test a learning algorithm based on creating general knowledge from specific examples. The player discovered this general knowledge by `intersecting' boards to create fuzzy board patterns. A fuzzy pattern consisted of X's, O's, empty squares, and Don't Cares. When two boards were combined to make a pattern, squares that exactly matched retained their value in the new board, but others became don't care. 
Of course, when we intersected two boards, we also combined their win-loss-tie histories, which guided their use during play. To make things interesting, we limited the pattern space (``memory'') to be less than the number of possible boards, and then let the board patterns `compete' for the available space. Like Jerry's player, ours learned faster against good opponents than bad ones. Bill Griswold From icon-group-request Sun Dec 29 10:55:22 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 29 Dec 91 10:55:22 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA05432; Sun, 29 Dec 91 10:55:20 MST Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA14218; Sun, 29 Dec 91 09:41:46 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 10 Dec 91 21:31:19 GMT From: mailer.cc.fsu.edu!sun13!sun8.scri.fsu.edu@gatech.edu (John Nall) Organization: SCRI, Florida State University Subject: String Scanning Question (from novice) Message-Id: <6030@sun13.scri.fsu.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu I had a bug in a program, and finally found it. But although I can fix it easily, I still don't understand what I did wrong. The book, page 32, says that the form of a string-scanning expression is: expr1 ? expr2 It also says (page 38) that an augmented assignment of the form: s ?:= expr "can be used to scan s and assign a new value to it as a result. The value assigned is the value produced by expr". The book also says (page 18) that when the conjunction expr1 & expr2 is evaluated, it produces the value of expr2 (assuming both succeed). The following little nonsense program illustrates the problem. As I understand, the expression "line ?:= move(3) & move(4)" should result in line being assigned the value resulting from the "move(4)" part. But it does not. 
Instead it assigns the value resulting from the "move(3)" part instead. (My bug, by the way :-) )

procedure main()
   line := "Now and then"
   line ?:= move(3) & move(4)
   write(line)                 # writes only "Now"
end

BUT...(the fix) if I do it like this, it works ok:

procedure main()
   line := "Now and then"
   line ?:= { move(3) & move(4) }
   write(line)                 # writes " and" as it is supposed to
end

So is the book wrong? Or am I just misunderstanding what it says?

Thanks
--
John W. Nall           | Supercomputer Computations Research Institute
nall@sun8.scri.fsu.edu | Florida State University, Tallahassee, FL 32306
(904)-644-6008         | "Down with liberals. Down with conservatives."

From ralph Sun Dec 29 11:07:51 1991
Date: Sun, 29 Dec 91 11:07:51 MST
From: "Ralph Griswold"
Message-Id: <9112291807.AA03934@cheltenham.cs.arizona.edu>
Received: by cheltenham.cs.arizona.edu; Sun, 29 Dec 91 11:07:51 MST
To: icon-group@cs.arizona.edu, mailer.cc.fsu.edu!sun13!sun8.scri.fsu.edu@gatech.edu
Subject: Re: String Scanning Question (from novice)

Your problem is precedence. The conjunction operation, &, has the lowest precedence of all infix operations. Therefore,

     e1 ? e2 & e3

groups as

     (e1 ? e2) & e3

See page 40 of the second edition of the Icon book.
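A short Icon sketch that writes the grouping out with explicit parentheses, so the two readings can be compared side by side. It reuses the example string from the question; nothing else is assumed.

```icon
procedure main()
   # The grouping Icon actually uses for  line ?:= move(3) & move(4)
   line := "Now and then"
   (line ?:= move(3)) & move(4)     # move(4) runs outside the scan and fails
   write(line)                      # writes "Now"; the assignment already happened

   # Forcing the conjunction inside the scanning expression
   line := "Now and then"
   line ?:= (move(3) & move(4))     # both moves see the same subject
   write(line)                      # writes " and"
end
```

In the first case the failure of the trailing move(4) goes unnoticed because the expression stands alone as a statement and ?:= is not undone on failure.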
Ralph Griswold / Department of Computer Science The University of Arizona / Tucson, AZ 85721 ralph@cs.arizona.edu / uunet!arizona!ralph voice: 602-621-6609 / fax: 602-621-9618 From icon-group-request Sun Dec 29 15:43:24 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 29 Dec 91 15:43:24 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA17942; Sun, 29 Dec 91 15:43:22 MST Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA27485; Sun, 29 Dec 91 14:28:48 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 10 Dec 91 23:44:43 GMT From: fernwood!cronos!laguna!alex@uunet.uu.net (Bob Alexander) Organization: Metaphor Computer Systems, Mountain View, CA Subject: Re: expanding regions, Mach Message-Id: <1696@cronos.metaphor.com> References: <1991Dec7.170211.22874@midway.uchicago.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu In article <1991Dec7.170211.22874@midway.uchicago.edu> goer@midway.uchicago.edu writes: >Also, how viable would it be simply to have a routine that sat around >and waited for sbrk(> 0) requests, and then (when it encountered them), >just used vm_alloc (or whatever it is you use on a NeXT), obtained a >block of contiguous memory, then did a block copy of the static, block, >and string regions, realigned all the base pointers and what not, and >then freed the old contiguous block (or, if there's a vm_realloc func- >tion, used that)? If I follow your drift, the technique you suggest is pretty much what is done in the Macintosh-MPW version. I didn't put in any code to relocate anything, though, so the region has to be expanded in place using a realloc()-type call in the Mac O/S that promises not to move anything. 
The problem is, of course, that by the time region expansion is needed, it's likely that other non-relocatable blocks have been allocated by some system service in such a way that the Icon-block can't be expanded in place. I haven't done any tests to see just how often region expansion succeeds or fails, but in general I don't think you can really count on it, and the best bet is to allocate bigger regions beforehand. Obviously, this technique works MUCH better if the Icon memory can be relocated, as Richard suggested. I haven't looked into how hard this would be -- perhaps the garbage collector could do it quite easily. Of course, you still have the problem of needing + to perform the copy. -- Bob Alexander Metaphor Computer Systems (415) 961-3600 x751 alex@metaphor.com ====^=== Mountain View, CA ...{uunet}!{decwrl,apple}!metaphor!alex From icon-group-request Sun Dec 29 16:56:52 1991 Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 29 Dec 91 16:56:52 MST Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA19293; Sun, 29 Dec 91 16:56:49 MST Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA01147; Sun, 29 Dec 91 15:43:48 -0800 Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions) Date: 11 Dec 91 04:09:26 GMT From: agate!spool.mu.edu!uwm.edu!linac!uchinews!ellis!goer@ucbvax.berkeley.edu (Richard L. Goerwitz) Organization: University of Chicago Computing Organizations Subject: Re: String Scanning Question (from novice) Message-Id: <1991Dec11.040926.11559@midway.uchicago.edu> References: <6030@sun13.scri.fsu.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu In article <6030@sun13.scri.fsu.edu> nall@sun8.scri.fsu.edu (John Nall) writes: > >is evaluated, it produces the value of expr2 (assuming both succeed). > >The following little nonsense program illustrates the problem. 
>As I understand, the expression "line ?:= move(3) & move(4)" should
>result in line being assigned the value resulting from the
>"move(4)" part. But it does not. Instead it assigns the value
>resulting from the "move(3)" part. (My bug, by the way :-) )
>
>procedure main()
>   line := "Now and then"
>   line ?:= move(3) & move(4)
>   write(line)    # writes only "Now"
>end

Think of line ?:= move(3) & move(4) as line := line ? move(3) & move(4). The precedences work out like this: (line := (line ? move(3))) & move(4). The result is that line is evaluated, producing a variable; then the scanning expression (line ? move(3)) gets evaluated, producing the value of move(3) (if it succeeds); and that value is then assigned to the variable produced earlier on. If this whole business succeeds, then move(4) is evaluated. By now we're outside of the scanning expression, and &pos and &subject have whatever values they had before evaluation of this line began, so the results are irrelevant (and aren't used anyway).

I guess the bottom line is that expression1 op:= expression2 always works out as expression1 := expression1 op expression2.

>BUT...(the fix) if I do it like this, it works ok:
>
>procedure main()
>   line := "Now and then"
>   line ?:= { move(3) & move(4) }
>   write(line)    # writes " and" as it is supposed to
>end

Good fix. It's always good to group expressions manually if there could be any confusion about their order of evaluation. Basically what you're doing is forcing move(3) & move(4) into a single expression that gets evaluated within the scanning operation set up by the ?:= operator. You could also just write line ?:= (move(3), move(4)), which to me looks more idiomatic, but does the same thing. If you prefer the other way, that's fine, too.

>So is the book wrong? Or am I just misunderstanding what it says?

Nobody's wrong. Everybody's right. Merry Christmas! :-)

-- -Richard L.
Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer

From icon-group-request Sun Dec 29 18:12:34 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Sun, 29 Dec 91 18:12:34 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA20847; Sun, 29 Dec 91 18:12:33 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA05207; Sun, 29 Dec 91 17:08:19 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 11 Dec 91 06:10:08 GMT
From: ksr!tim@uunet.uu.net (Tim Peters)
Organization: Kendall Square Research Corp.
Subject: Re: String Scanning Question (from novice)
Message-Id: <7762@ksr.com>
References: <6030@sun13.scri.fsu.edu>
Sender: icon-group-request@cs.arizona.edu
To: icon-group@cs.arizona.edu

In article <6030@sun13.scri.fsu.edu> nall@sun8.scri.fsu.edu (John Nall) writes:
%I had a bug in a program, and finally found it.
% ... [this one is yielding unexpected behavior] ...
%procedure main()
%   line := "Now and then"
%   line ?:= move(3) & move(4)
%   write(line)    # writes only "Now"
%end
%
%BUT...(the fix) if I do it like this, it works ok:
%
%procedure main()
%   line := "Now and then"
%   line ?:= { move(3) & move(4) }
%   write(line)    # writes " and" as it is supposed to
%end
%...

Not to worry, John -- all will be clear! You need to look at the end of Appendix A to get the precedence of the operators straight. Note that in Icon (like as in C or Perl or ... but unlike as in Pascal or Fortran or ...), assignment is a binary operator, much like "+" and "*" and "&". Because it is just another binary operator, the precedence of assignment with respect to the other binary operators is important, and can occasionally lead to surprise. That's what's happening to you above.

The table at the end of Appendix A says that all forms of assignment in Icon (whether plain, or the "swap" flavor, or augmented) have the same precedence, and that it's higher (binds more tightly) than the precedence of the "&" operator. In fact, "&" has *the* lowest precedence, which you will appreciate some day (trust me). So in

   line ?:= move(3) & move(4)

the "?:=" binds more tightly than the "&", so Icon groups it this way:

   (line ?:= move(3)) & move(4)

The "move(4)" is off in outer space somewhere, & has nothing to do with scanning "line"; the attempt to move(4) will actually fail in your program, but failure doesn't generally cause an error msg so you probably didn't realize it. Try this line to see it a bit more clearly:

   line ?:= move(3) & (move(4) | write("I can't!"))

The good news is that Icon is working as documented & that it isn't hard to learn "the rules". The bad news is that experience with other languages will work against you at first (some of your expectations are, well, wrong).

hang-in-there-it's-worth-a-little-initial-discomfort-ly y'rs - tim

Tim Peters   Kendall Square Research Corp
tim@ksr.com, ksr!tim@uunet.uu.net

ps: It's probably a bit more idiomatic to write your

   line ?:= { move(3) & move(4) }

as

   line ?:= ( move(3) & move(4) )

although

   line ?:= { move(3); move(4) }

is a suggestive alternative to think about.

From icon-group-request Mon Dec 30 22:39:35 1991
Received: from optima.cs.arizona.edu by cheltenham.cs.arizona.edu; Mon, 30 Dec 91 22:39:35 MST
Received: from ucbvax.Berkeley.EDU by optima.cs.arizona.edu (4.1/15) id AA08699; Mon, 30 Dec 91 22:39:27 MST
Received: by ucbvax.Berkeley.EDU (5.63/1.43) id AA11821; Mon, 30 Dec 91 21:30:53 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews for icon-group@cs.arizona.edu (icon-group@cs.arizona.edu) (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 30 Dec 91 22:03:32 GMT
From: uchinews!ellis!goer@speedy.wisc.edu (Richard L.
Goerwitz) Organization: University of Chicago Computing Organizations Subject: bibleref-2.2 Message-Id: <1991Dec30.220332.4899@midway.uchicago.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu For anyone who wants it, a maintenance update of bibleref has been placed in ~ftp/icon/contrib/bibleref-2.2.tar.Z on cs.arizona.edu. Bibleref is a King James Bible text retrieval program geared for UNIX systems. It comes with a pre-indexed, compressed King James text, but other texts can be indexed as well (e.g. the CCAT RSV text with the Catholic Apocrypha, the Princeton Qur'an, and the Book of Mormon). I've not tried running the program under anything but UNIX. It can't run under DOS because of the 64k executable size-limit, but maybe it would run under VMS - ? Dunno. Anyone wanting additional information, please drop me a line. -- -Richard L. Goerwitz goer%sophist@uchicago.bitnet goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer