Speaker: |
S.Cenk Sahinalp Case Western Reserve University Cleveland, Ohio |
---|---|
Topic: | Sequence Nearest Neighbors and Block Edit Operations |
Date: | Tuesday, December 5, 2000 |
Time: | 11:00 AM |
Place: | Gould-Simpson, Room 701 |
Let D be a database of n sequences; we would like to preprocess D so that given any on-line query sequence Q, we can quickly find a sequence S in D for which d(S,Q) < d(S,T) for any other sequence T in D. Here d(S,Q) denotes the "block edit distance" between sequences S and Q, which is defined to be the minimum number of "block edit" operations needed to transform S to T - all edit operations will be reversible so that d(S,T) = d(T,S). The block edit operations correspond to the notion of similarity between sequences in intended application and include character edits (inserts, replacements, deletes etc) as well as block edits (moves, copies, deletes, reversals) and numerical transformations (scaling by an additive or a multiplicative constant). We present the first known efficient algorithm for "approximate" nearest neighbor search for sequences under any set of edit operations. Our preprocessing time and space is polynomial in size of D and query time is near-linear in size of Q; the approximation factor we achieve is O(log k log*^2 k), where k is the size of the longest sequence in D.
The talk will summarize two papers. The first one was co-authored by G. Cormode, M. Paterson, U. Vishkin and appeared in SODA'00. The second one was coauthored by S. Muthukrishnan and appeared in STOC'00.