Software

Here is a collection of software we have developed that is freely available for research use. The software implements discrete algorithms in computational biology and combinatorial optimization.

Aligning alignments optimally
Sum-of-pairs multiple sequence alignment
Cardinality matchings in general graphs

Aligning alignments optimally

The software AlignAlign, developed by John Kececioglu, Dean Starrett, and Travis Wheeler, computes an optimal alignment of two multiple-sequence alignments under the sum-of-pairs scoring function with linear gap-costs. While computing such an alignment is NP-complete in general, AlignAlign is remarkably fast on biological alignments, empirically running in time quadratic in the number of columns in the input alignments.

AlignAlign is available at:

http://alignalign.cs.arizona.edu

The algorithm that AlignAlign implements is described in:

John Kececioglu and Dean Starrett.
"Aligning alignments exactly."
Proceedings of the 8th ACM Conference on Research in Computational Molecular Biology (RECOMB), 85-96, 2004.

Sum-of-pairs multiple sequence alignment

The program MSA, developed by Stephen Altschul, John Kececioglu and David Lipman, and modified by Sandeep Gupta and Alejandro Schäffer, is a tool for multiple sequence alignment under the sum-of-pairs scoring function. In principle, MSA can compute optimal sum-of-pairs alignments, though in practice it usually does not guarantee optimality of the alignment. (The branch-and-bound approach it implements takes exponential space in the worst-case, so the search is often restricted to a region that does not guarantee optimality.)

MSA is available at:

ftp://fastlink.nih.gov/pub/msa

The original paper on MSA is:

David Lipman, Stephen Altschul, and John Kececioglu.
"A tool for multiple sequence alignment."
Proceedings of the National Academy of Science USA 86, 4412-4415, 1989.

The algorithm and its implementation is described in:

Sandeep Gupta, John Kececioglu, and Alejandro Schäffer.
"Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment."
Journal of Computational Biology 2:3, 459-472, 1995.

Cardinality matchings in general graphs

A matching of a graph is a subset of the edges in which no two edges touch a common vertex. An implementation of Edmonds' algorithm for computing a maximum cardinality matching of a general graph is available here. Given a graph of n vertices and m edges, the implementation runs in O(m n a(m,n)) time, where a(m,n), an inverse of Ackermann's function, is the amortized time per operation of the disjoint set data structure.

Two implementations are available. The first contains all the heuristics described in the paper cited below (greedy initial matching, stopping test for early termination of the depth-first search, delayed shrinking of blossoms, and lazy expansion of blossoms). The second contains just two heuristics (greedy initial matching, and the stopping test), and across experiments with sixteen variants representing all possible combinations of the four heuristics, is the fastest variant. This second code is up to 4 to 350 times faster than the cardinality matchings codes from the DIMACS Implementation Challenge and LEDA, on graphs with up to 100,000 vertices and 500,000 edges.

The implementations are in the form of compressed shar files containing a makefile, all C source files, and a sample graph:

The implementations and heuristics are described in:

John Kececioglu and Justin Pecqueur.
"Computing maximum-cardinality matchings in sparse general graphs."
Proceedings of the 2nd Workshop on Algorithm Engineering, 121-132, 1998.

Research