senten1.icn: Procedure to generate sentences

link senten1
August 14, 1996; Peter A. Bigot
This file is in the public domain.

sentence(f) generates the English sentences encountered in a file.
____________________________________________________________

The following rules describe what a 'sentence' is.

* A sentence begins with a capital letter.

* A sentence ends with one or more of '.!?', subject to other
  constraints.

* If a period is immediately followed by:
  - a digit
  - a letter
  - one of ',;:'
  it is not a sentence end.

* If a period is followed (with intervening space) by a lower case
  letter, it is not a sentence end (assume it's part of an abbreviation).

* The sequence '...' does not end a sentence.  The sequence '....' does.

* If a sentence end character appears after more opening parens than
  closing parens in a given sequence, it is not the end of that
  particular sentence. (I.e., full sentences in a parenthetical remark
  in an enclosing sentence are considered part of the enclosing
  sentence.  Their grammaticality is in question, anyway.) (It also
  helps with attributions and abbreviations that would fail outside
  the parens.)

* No attempt is made to ensure balancing of double-quoted (") material.

* When scanning for a sentence start, material which does not conform is
  discarded.

* Corollary: Quotes or parentheses which enclose a sentence are not
  considered part of it.

* An end-of-line on input is replaced by a space unless the line ends
  with 'a-' (where 'a' is any letter), in which case the hyphen is
  deleted and no space is inserted.

* Leading and trailing space (tab, space, newline) chars are removed
  from each line of the input.

* If a blank line is encountered on input while scanning a sentence,
  the scan is aborted and the search for a new sentence begins
  (rationale: this ignores section and chapter headers separated from
  the text by blank lines).

* Most titles before names would fail the above constraints.  They are
  special-cased.

* This does NOT handle a person's middle initial.  To do so would rule
  out sentences such as 'It was I.'  Six of one, half-dozen of the
  other--I made my choice.

* Note that ':' does not end a sentence.  This is a stylistic choice,
  and can be modified by simply adding ':' to sentend in the source code.
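
The period and parenthesis rules above can be sketched as two small
checks.  This is a Python illustration only, not the library's Icon
source; the function names are hypothetical, and treating a run of
exactly three periods as the only ellipsis is an assumption (the rules
do not say what happens with two, or with five or more).

```python
def period_ends_sentence(text, i):
    """Return True if the run of periods starting at text[i] may end a sentence."""
    assert text[i] == "."
    # Measure the run of consecutive periods.
    j = i
    while j < len(text) and text[j] == ".":
        j += 1
    if j - i == 3:
        return False  # '...' does not end a sentence; '....' does
    nxt = text[j] if j < len(text) else ""
    # Immediately followed by a digit, a letter, or one of ',;:'.
    if nxt != "" and (nxt.isdigit() or nxt.isalpha() or nxt in ",;:"):
        return False
    # Followed, after intervening space, by a lower case letter
    # (assumed to be part of an abbreviation).
    k = j
    while k < len(text) and text[k] == " ":
        k += 1
    if k > j and k < len(text) and text[k].islower():
        return False
    return True


def ends_outside_parens(sentence_so_far):
    """Return True unless more '(' than ')' have been seen in this sentence."""
    return sentence_so_far.count("(") <= sentence_so_far.count(")")
```

A sentence end would be accepted only when both checks pass; '!' and
'?' would need only the parenthesis check.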
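
The input-line rules above (trimming, hyphen joining, and blank-line
breaks) can be illustrated in the same way.  Again this is a hedged
Python sketch with a hypothetical function name, not the Icon
implementation.  It splits the input into chunks at blank lines; a
sentence left unfinished at the end of a chunk would then simply never
find its end, which approximates "the scan is aborted".

```python
def join_lines(lines):
    """Join trimmed input lines into chunks of text, split at blank lines."""
    chunks = []
    current = ""
    for raw in lines:
        # Remove leading and trailing tab, space, and newline characters.
        line = raw.strip(" \t\n")
        if not line:
            # Blank line: close the current chunk and start a new one.
            if current:
                chunks.append(current.rstrip())
            current = ""
            continue
        if len(line) >= 2 and line[-1] == "-" and line[-2].isalpha():
            # Line ends in letter + hyphen: drop the hyphen, add no space.
            current += line[:-1]
        else:
            # Otherwise the end-of-line becomes a single space.
            current += line + " "
    if current:
        chunks.append(current.rstrip())
    return chunks
```

For example, the lines "The exam-" and "ple works." are rejoined into
"The example works." before any sentence scanning takes place.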
