7. Prescreeners: A Unified Framework for Clipping, Screening, and Tagging

The pipeline must always consist of an initial Input stage that imports fragment records from the file system, an Overlap stage that computes all overlaps between fragment sequences, and a final Assembly stage that melds the fragments into a reconstruction of the target. Between the Input and Overlap stages there may be any number and combination of Prescreener stages. In the design of FAKtory, we chose to develop a single, unifying framework in which one can formulate a wide range of criteria and recipes for clipping, screening, and tagging fragment sequences. This framework involves a set of five types of pattern recognizers and a small expression language for flexibly combining the results of these recognizers. We start by describing a general-purpose Prescreener stage whose configuration panel presents the user with the full power of the framework.

A Prescreener stage consists of several prescreeners, each of which can be programmed either to cut off a 5'- or 3'-end of a fragment's sequence, or to tag substrings of the sequence with a specifiable color and symbolic name. The interval(s) of a fragment's sequence that will be clipped or tagged by a prescreener are specified by an interval expression, which is essentially a pattern that matches a set of disjoint substrings, specified as intervals of character positions. We describe interval expressions and the intervals they match in a bottom-up fashion, starting with the simplest:
This simple interval expression language is sufficient to describe quite complex clipping or tagging criteria. For example, if one wants to clip at the clone insertion restriction site, or at the 50th base if such a site cannot be found because of poor signal quality in the initial part of the read, then one can express this with the interval expression [ 0 , Site(Intv) : Intv ], where Intv is the interval recognizer [0,50], and Site is a regular expression recognizer for the cut site with, say, one mismatch allowed and set to return the 5'-most instance.

In designing the general Prescreener-type stages above, we again came up against the problem that the desire for generality results in a mechanism requiring significant skill to use. Often, however, the power and concomitant complexity of the full framework is not needed. To alleviate this problem, we set about designing simpler, specialized interfaces called Clip, Screen, and Tag stages that are sub-classes of prescreeners directly suited to expressing common clipping, vector screening, and element tagging functions. We give a quick overview of each of these special panels:
We find that in practice these simple sub-classes suffice to express most of the preprocessing needed on fragment sequences before computing overlaps between them and assembling them into contigs. Only on occasion is the full power of interval expressions required.

As a final note, every clip, tag, and vector panel can be viewed as a general prescreener if desired. The prescreeners therein may then be modified using the more powerful console of the Prescreener panel. One may always flip back to the original sub-class, provided the specification has not changed. This permits users to learn about the Prescreener by seeing how Clip, Vector, and Tag specifications are codified as Prescreener specifications.
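To make the clipping example given earlier in this section concrete (clip at the clone insertion site, or at the 50th base if no site is found), the following Python sketch mimics the behavior of the interval expression [ 0 , Site(Intv) : Intv ]. It is an illustration only and not FAKtory's implementation. Several points are assumptions made here: intervals are half-open and 0-based, the clip extends to the right end of the matched site, the regular expression recognizer is simplified to a fixed string matched with substitutions only, and the site GAATTC is merely a placeholder. The names find_site, clip_interval, and clip_five_prime are hypothetical.

```python
from typing import Optional, Tuple

# Assumption: an interval is a half-open, 0-based (start, end) pair.
Interval = Tuple[int, int]


def find_site(seq: str, site: str, window: Interval,
              max_mismatches: int = 1) -> Optional[Interval]:
    """Return the 5'-most occurrence of `site` lying wholly inside `window`.

    Simplification: only substitutions are counted as mismatches, and the
    site is a fixed string rather than a regular expression.
    """
    lo, hi = window
    hi = min(hi, len(seq))
    # Scan left to right so the first hit is the 5'-most instance.
    for start in range(lo, hi - len(site) + 1):
        candidate = seq[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(candidate, site))
        if mismatches <= max_mismatches:
            return (start, start + len(site))
    return None


def clip_interval(seq: str, site: str,
                  window: Interval = (0, 50)) -> Interval:
    """Mimic [ 0 , Site(Intv) : Intv ]: use the site if found, else Intv."""
    match = find_site(seq, site, window)
    chosen = match if match is not None else window
    # Assumption: the clip runs from position 0 to the right end of the
    # chosen interval, capped at the read length.
    return (0, min(chosen[1], len(seq)))


def clip_five_prime(seq: str, site: str,
                    window: Interval = (0, 50)) -> str:
    """Return the read with its 5'-end clipped off."""
    _, end = clip_interval(seq, site, window)
    return seq[end:]


if __name__ == "__main__":
    site = "GAATTC"  # placeholder cut site, chosen for illustration
    with_site = "C" * 12 + "GAATTC" + "T" * 60
    without_site = "C" * 80
    print(clip_interval(with_site, site))
    print(clip_interval(without_site, site))
```

Under these assumptions, a read whose first 50 bases contain the site (exactly or with one substitution) is clipped just past that site, while a read with no recognizable site in that window is clipped at base 50, which is the fallback behavior the interval expression describes.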