############################################################################
#
#	File:     charpatt.icn
#
#	Subject:  Program for manipulating character patterns
#
#	Author:   Ralph E. Griswold
#
#	Date:     September 20, 1998
#
############################################################################
#
#  This file is in the public domain.
#
############################################################################
#
#  IMPORTANT NOTICE:  This program will be superseded by another that
#  treats pattern forms in a more general manner.  The new notation is
#  not compatible with the one used here.  There are no plans for
#  updating this program; all improvements will be made in the new
#  one.
#
#  This program allows the user to convert strings into pattern forms and
#  grammars.  Several pattern forms are supported:
#
#	[s,i]		s repeated (concatenated) i times
#	<s>		reversal of s
#	{s1,s2,...}	collation of s1, s2, ...
#
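#  As a minimal sketch (an illustration only, not code from this program),
#  the first two forms correspond to Icon's built-in repl() and reverse().
#  Collation is left out because its exact semantics are not stated here.
#
#	procedure main()
#	   write(repl("ab", 3))		# [ab,3] expands to ababab
#	   write(reverse("abc"))	# <abc> expands to cba
#	end
#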
#  There are several ways to find pattern forms.  Constant strings also
#  can be specified in several ways.
#
#  Two text lists are provided:  a workspace in which the string associated
#  with a variable is displayed, and a grammar list.
#
#  For more information, see Icon Analyst 50.
#
#  Note:  This program requires UNIX because navitrix does.  This
#  restriction could be lifted by extending navitrix to other platforms
#  or by bypassing navitrix and using a simple open dialog.
#  
############################################################################
#
#  This program is still under development.  The pattern-matching portions
#  are crude and do not yet offer generality.  Portions that are strictly ad
#  hoc are marked as such.
#
############################################################################
#
#  Things to do:
#
#  	Fix known bugs:
#
#  		Savings listed for constants sometimes are too low.
#
#		Either predicted or actual savings for n-grams are
#		not always correct.
#
#		@1 should order dialog by token symbol, not value.
#
#		On @A, symbols are exhausted and watch cursor is left
#		set. (?)
#
#		If saved grammar is reloaded, no check is made of
#		symbols in use, which may cause tokens to be
#		erroneously used as variables.  :-<
#
#		There may be some range specifications that are not
#		handled correctly.
#
#	High-priority items:
#
#		Handle blank as last line in @C as end of file.
#
#		Have @R compute necessary width for result dialog.  Possibly
#		others, too.
#
#		Implement search interrupts at all appropriate places
#		and give option to stop, continue, or abort.  Also
#		provide specific event for interrupt.
#
#		Increase width of text-entry fields in dialog boxes.
#
#		Invert token_tbl.
#
#		Interpret blank attempts fields in dialogs as unlimited.
#
#	Lower-priority items:
#
#		Add histogram of definition lengths to grammar info. box.
#
#		For options, allow preselection of toggles to be "all",
#		"first", and, independently, "no split forms".
#
#		On @C, the goal symbol in the data *must* be replaced; else
#		there will be a generation loop.
#
#		Consider disabling reversals -- at least as an option.
#
#		Support compact grammars, at least as an option.
#
#		Provide a way for user to add reserved symbols.  (Needs to
#		be saved, if so ...).
#
#		@Y should show the tokens used, even if there are no definitions
#		for them.
#
#		Consider filtering n-grams so that, say, they must contain
#		specified characters.  Note this needs to be done in the
#		*search*.
#
#		Clean up @H dialog; get rid of unneeded fields and implement
#		the needed ones that are not yet implemented.
#
#		@G should allow cancellation before it messes things up.
#
#		Change from remove_symbol(c) to remove_symbols(s).
#
#		min > max should get meaningful notice dialog.
#
#		Add items to Options menu:
#
#			reset dialog defaults
#			mode of operation:  seek structure or compactness
#       		control of reversals
#			handling of meta-characters in pattern matches (split
#			forms)
#			add symbols (high and control characters)
#			regular vs. compact grammars (affects fom computations)
#
#		Provide grammar depth limit.
#
#		Make size the real size for the file (?)
#
#		When making multiple definitions, make sure all do something.
#		(?)
#
#		Export workspace as (1) string and (2) character pattern (i.e.,
#		with pattern forms).
#
#		If characters have been tokenized in @C, prompt to save grammar
#		before quitting.
#
#		Token actions need work. (?)
#
#		Need to put counts back if not already done.
#
#		Consider implementing n-gram search within charpatt.
#
#	  	Should have range field for locate().
#  
#  		Get rid of reverse search and put option on find search.  Also
#  		put option on location search.
#  
#  		Modify location search to give list of all positions and
#		deltas -- or at least to precompute them.  Show number of
#		occurrences before producing them.  Maybe histogram.
#  
# 	 	Need better handling of current file name.
#  
#	  	Provide way of handling "high" characters on input.
#  
#  		Fix ad-hoc pattern-matching code.
#  
#  		Fix pattern expansion.  Figure out what to do with the result.
#  
#	  	Hone bounds on searches.
#  
#  		Refine "floating string window" in current search code and use
#		it uniformly.
#  
#  		Collapse duplicate code.
#  
#  		Fix n-gram (?) bug that can lead to a vacuous production, as in
#  
#  			J->Ad
#  
#  	   	and then
#  
#  			J->K
#  			K->Ad
#  
#  		Put expand() in charpatt.icn and change to handle <A>.
#
#		Be able to look for an entire definition. (?)
#
#		Allow workspace to be resized (downward only).  May only
#		be possible to shorten line in existing space.
#
#	Documentation:
#
#	  	Note that in palindroid AB<A>, if A = <A> (that is, A is
#  		a palindrome), it comes out as ABA.  This cannot be avoided;
#  		it's a feature.
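#
#		A minimal sketch of the case above, using hypothetical values
#		(A = "aba", B = "x") that do not come from this program; since
#		reverse(A) == A, the two forms expand to the same string:
#
#			procedure main()
#			   local A, B
#			   A := "aba"		# a palindrome, so <A> = A
#			   B := "x"
#			   if (A || B || reverse(A)) == (A || B || A) then
#			      write("AB<A> and ABA expand identically")
#			end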
#
#	Concepts:
#
#		Have pattern forms in some dialogs expanded to produce string
#		to search for.
#
#
#		Explore concept of subgrammars.
#
#		Consider transformations on character strings, editing.
#  
#  See also the comments in the program file, especially those in uppercase
#  throughout.
#
############################################################################
#
#  A grammar is represented in a file as a pair of L-Systems, which can be
#  processed by lindsys.icn.  The first L-system is for the grammar
#  proper.  The second is a grammar for the tokens, if any.  The first
#  grammar has the same name as the file it is saved in.  The second
#  grammar has that name with ".tok" appended.
#
#  Note:  Lines can be arbitrarily long.
#
############################################################################
#
#  Requires:  Version 9 graphics, UNIX
#
############################################################################
#
#  Links:  expander, interact, io, lindgen, lists, navitrix,
#	   strings, tables, vsetup
#
############################################################################

#
#  Requires:  Version 9 graphics
#
############################################################################
#
#  Links:  main, control, grammar, search, support
#
############################################################################

link main
link grammar
link search
link support
link control