link senten1
August 14, 1996; Peter A. Bigot
This file is in the public domain.
sentence(f) generates the English sentences encountered in a file.
____________________________________________________________

The following rules describe what a 'sentence' is.

* A sentence begins with a capital letter.
* A sentence ends with one or more of '.!?', subject to other
  constraints.
* If a period is immediately followed by:
  - a digit
  - a letter
  - one of ',;:'
  it is not a sentence end.
* If a period is followed (with intervening space) by a lower case
  letter, it is not a sentence end (assume it's part of an
  abbreviation).
* The sequence '...' does not end a sentence. The sequence '....' does.
* If a sentence end character appears after more opening parens than
  closing parens in a given sequence, it is not the end of that
  particular sentence. (I.e., full sentences in a parenthetical remark
  in an enclosing sentence are considered part of the enclosing
  sentence. Their grammaticality is in question, anyway.) (It also
  helps with attributions and abbreviations that would fail outside
  the parens.)
* No attempt is made to ensure balancing of double-quoted (") material.
* When scanning for a sentence start, material which does not conform
  is discarded.
* Corollary: Quotes or parentheses which enclose a sentence are not
  considered part of it.
* An end-of-line on input is replaced by a space unless the last
  character of the line is 'a-' (where 'a' is any letter), in which
  case the hyphen is deleted.
* Leading and trailing space (tab, space, newline) chars are removed
  from each line of the input.
* If a blank line is encountered on input while scanning a sentence,
  the scan is aborted and search for a new sentence begins (rationale:
  ignore section and chapter headers separated from text by newlines).
* Most titles before names would fail the above constraints. They are
  special-cased.
* This does NOT handle when a person uses their middle initial. To do
  so would rule out sentences such as 'It was I.', Six of one,
  half-dozen of the other--I made my choice.
* Note that ':' does not end a sentence. This is a stylistic choice, and can be modified by simply adding ':' to sentend below.
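
A few of the rules above can be illustrated with a short sketch. It is written in Python purely for illustration and is not the library's Icon code; the names join_lines, is_sentence_end and TERMINATORS are hypothetical. It covers only the line-joining rules and the period rules. Parenthesis counting, titles, blank-line aborts and the capital-letter start are left out. How a run such as '....' interacts with the lower-case rule is an assumption of the sketch.

```python
import re

# "One or more of '.!?'" as a terminator run (hypothetical name).
TERMINATORS = re.compile(r"[.!?]+")


def join_lines(lines):
    """Join input lines following the end-of-line rules.

    Each line is stripped of leading/trailing tab, space and newline.
    A line ending in letter + '-' loses the hyphen and gets no space;
    any other line has its end-of-line replaced by a single space.
    """
    out = []
    for raw in lines:
        line = raw.strip(" \t\n")
        if re.search(r"[A-Za-z]-$", line):
            out.append(line[:-1])    # delete the hyphen
        else:
            out.append(line + " ")   # end-of-line becomes a space
    return "".join(out)


def is_sentence_end(text, start):
    """Return True if the terminator run starting at text[start]
    could end a sentence under the period rules alone."""
    m = TERMINATORS.match(text, start)
    if m is None:
        return False
    run = m.group()
    if run == "...":
        return False                 # an ellipsis does not end a sentence
    if run[-1] == ".":
        # Assumption: these checks apply to any run ending in a period.
        nxt = text[m.end():m.end() + 1]
        if nxt and (nxt.isalnum() or nxt in ",;:"):
            return False             # e.g. "3.50", "e.g.,"
        if text[m.end():].lstrip(" ")[:1].islower():
            return False             # probably an abbreviation
    return True
```

For example, in "It costs 3.50 today." the first period is rejected because a digit follows it, while the final period is accepted. Adding ':' to the character class in TERMINATORS would mirror the stylistic change described in the last rule.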