############################################################################
#
#     File:     charpatt.icn
#
#     Subject:  Program for manipulating character patterns
#
#     Author:   Ralph E. Griswold
#
#     Date:     September 20, 1998
#
############################################################################
#
#   This file is in the public domain.
#
############################################################################
#
#  IMPORTANT NOTICE:  This program will be superseded by another that
#  treats pattern forms in a more general manner.  The new notation is
#  not compatible with the one used here.  There are no plans for
#  updating this program; all improvements will be made in the new
#  one.
#
#  This program allows the user to convert strings into pattern forms and
#  grammars.  Several pattern forms are supported:
#
#     [s,i]           s repeated (concatenated) i times
#     <s>             reversal of s
#     {s1,s2,...}     collation of s1, s2, ...
#
#  There are several ways to find pattern forms.  Constant strings also
#  can be specified in several ways.
#
#  Two text lists are provided:  a workspace in which the string associated
#  with a variable is displayed, and a grammar list.
#
#  For more information, see Icon Analyst 50.
#
#  Note:  This program requires UNIX because navitrix does.  This
#  restriction could be lifted by extending navitrix to other platforms
#  or by bypassing navitrix and using a simple open dialog.
#
############################################################################
#
#  This program is still under development.  The pattern-matching portions
#  are crude and do not yet offer generality.  Portions that are strictly ad
#  hoc are marked as such.
#
############################################################################
#
#  Things to do:
#
#     Fix known bugs:
#
#        Savings listed for constants sometimes are too low.
#
#        Either predicted or actual savings for n-grams are
#           not always correct.
#
#        @1 should order dialog by token symbol, not value.
#
#        On @A, symbols are exhausted and watch cursor is left
#           set. (?)
#
#        If saved grammar is reloaded, no check is made of
#           symbols in use, which may cause tokens to be
#           erroneously used as variables. :-<
#
#        There may be some range specifications that are not
#           handled correctly.
#
#     High-priority items:
#
#        Handle blank as last line in @C as end of file.
#
#        Have @R compute necessary width for result dialog.  Possibly
#           others, too.
#
#        Implement search interrupts at all appropriate places
#           and give option to stop, continue, or abort.  Also
#           provide specific event for interrupt.
#
#        Increase width of text-entry fields in dialog boxes.
#
#        Invert token_tbl.
#
#        Interpret blank attempts fields in dialogs as unlimited.
#
#     Lower-priority items:
#
#        Add histogram of definition lengths to grammar info. box.
#
#        For options, allow preselection of toggles to be "all",
#           "first", and, independently, "no split forms".
#
#        On @C, the goal symbol in the data *must* be replaced; else
#           there will be a generation loop.
#
#        Consider disabling reversals -- at least as an option.
#
#        Support compact grammars, at least as an option.
#
#        Provide a way for user to add reserved symbols.  (Needs to
#           be saved, if so ...).
#
#        @Y should show the tokens used, even if there are no definitions
#           for them.
#
#        Consider filtering n-grams so that, say, they must contain
#           specified characters.  Note this needs to be done in the
#           *search*.
#
#        Clean up @H dialog; get rid of unneeded fields and implement
#           those that aren't.
#
#        @G should allow cancellation before it messes things up.
#
#        Change from remove_symbol(c) to remove_symbols(s).
#
#        min > max should get meaningful notice dialog.
#
#        Add items to Options menu:
#
#           reset dialog defaults
#           mode of operation:  seek structure or compactness
#           control of reversals
#           handling of meta-characters in pattern matches (split
#              forms)
#           add symbols (high and control characters)
#           regular vs. compact grammars (affects fom computations)
#
#        Provide grammar depth limit.
#
#        Make size the real size for the file (?)
#
#        When making multiple definitions, make sure all do something.
#           (?)
#
#        Export workspace as (1) string and (2) character pattern (i.e.,
#           with pattern forms).
#
#        If characters have been tokenized in @C, prompt to save grammar
#           before quitting.
#
#        Token actions need work. (?)
#
#        Need to put counts back if not already done.
#
#        Consider implementing n-gram search within charpatt.
#
#        Should have range field for locate().
#
#        Get rid of reverse search and put option on find search.  Also
#           put option on location search.
#
#        Modify location search to give list of all positions and
#           deltas -- or at least to precompute them.  Show number of
#           occurrences before producing them.  Maybe histogram.
#
#        Need better handling of current file name.
#
#        Provide way of handling "high" characters on input.
#
#        Fix ad-hoc pattern-matching code.
#
#        Fix pattern expansion.  Figure out what to do with the result.
#
#        Hone bounds on searches.
#
#        Refine "floating string window" in current search code and use
#           it uniformly.
#
#        Collapse duplicate code.
#
#        Fix n-gram (?) bug that can lead to a vacuous production, as in
#
#              J->Ad
#
#           and then
#
#              J->K
#              K->Ad
#
#        Put expand() in charpatt.icn and change to handle <A>.
#
#        Be able to look for an entire definition. (?)
#
#        Allow workspace to be resized (downward only).  May only
#           be possible to shorten line in existing space.
#
#     Documentation:
#
#        Note that in palindroid AB<A>, if A = <A> (that is, A is
#           a palindrome), it comes out as ABA.  This cannot be avoided;
#           it's a feature.
#
#     Concepts:
#
#        Have pattern forms in some dialogs expanded to produce string
#           to search for.
#
#
#        Explore concept of subgrammars.
#
#        Consider transformations on character strings, editing.
#
#  See also the comments in the program file, especially those in uppercase
#  throughout.
#
############################################################################
#
#  A grammar is represented in a file as a pair of L-Systems, which can be
#  processed by lindsys.icn.  The first L-system is for the grammar
#  proper.  The second is a grammar for the tokens, if any.  The first
#  grammar has the same name as the file it is saved in.  The second
#  grammar has that name with ".tok" appended.
#
#  Note:  Lines can be arbitrarily long.
#
############################################################################
#
#  Requires:  Version 9 graphics, UNIX
#
############################################################################
#
#  Links:  expander, interact, io, lindgen, lists, navitrix,
#     strings, tables, vsetup
#
############################################################################
#
#  Requires:  Version 9 graphics
#
############################################################################
#
#  Links:  main, control, grammar, search, support
#
############################################################################

link main
link grammar
link search
link support
link control