Here is a collection of software we have developed that is freely available for research use. The software implements discrete algorithms in computational biology and combinatorial optimization.

- Aligning alignments optimally
- Sum-of-pairs multiple sequence alignment
- Cardinality matchings in general graphs

The software
`AlignAlign`,
developed by John Kececioglu, Dean Starrett, and Travis Wheeler,
computes an optimal alignment of two multiple-sequence alignments
under the sum-of-pairs scoring function with linear gap-costs.
While computing such an alignment is NP-complete in general,
`AlignAlign` is remarkably fast on
biological alignments, empirically running in time quadratic in the number
of columns in the input alignments.

`AlignAlign` is available at:

http://alignalign.cs.arizona.edu

The algorithm that `AlignAlign` implements is described in:

John Kececioglu and Dean Starrett.

"Aligning alignments exactly."

Proceedings of the 8th ACMConference on Research in Computational Molecular Biology(RECOMB), 85-96, 2004.

The program
`MSA`,
developed by Stephen Altschul, John Kececioglu
and David Lipman, and modified by Sandeep Gupta and Alejandro Schäffer,
is a tool for
multiple sequence alignment under the sum-of-pairs scoring function.
In principle, `MSA` can compute optimal sum-of-pairs alignments,
though in practice it usually does not guarantee optimality of
the alignment.
(The branch-and-bound approach it implements takes exponential space
in the worst-case, so the search is often restricted to a region that
does not guarantee optimality.)

`MSA` is available at:

ftp://fastlink.nih.gov/pub/msa

The original paper on `MSA` is:

David Lipman, Stephen Altschul, and John Kececioglu.The algorithm and its implementation is described in:

"A tool for multiple sequence alignment."

Proceedings of the National Academy of Science USA86, 4412-4415, 1989.

Sandeep Gupta, John Kececioglu, and Alejandro Schäffer.

"Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment."

Journal of Computational Biology2:3, 459-472, 1995.

A *matching* of a graph is a subset of the edges
in which no two edges touch a common vertex.
An implementation of Edmonds' algorithm
for computing a maximum cardinality matching of a general graph
is available here.
Given a graph of *n* vertices and *m* edges,
the implementation runs in *O(m n a(m,n))*
time, where *a(m,n)*, an inverse of Ackermann's function,
is the amortized time per operation of the disjoint set data structure.

Two implementations are available. The first contains all the heuristics described in the paper cited below (greedy initial matching, stopping test for early termination of the depth-first search, delayed shrinking of blossoms, and lazy expansion of blossoms). The second contains just two heuristics (greedy initial matching, and the stopping test), and across experiments with sixteen variants representing all possible combinations of the four heuristics, is the fastest variant. This second code is up to 4 to 350 times faster than the cardinality matchings codes from the DIMACS Implementation Challenge and LEDA, on graphs with up to 100,000 vertices and 500,000 edges.

The implementations are in the form of compressed `shar` files
containing a makefile, all C source files, and a sample graph:

The implementations and heuristics are described in:

John Kececioglu and Justin Pecqueur.

"Computing maximum-cardinality matchings in sparse general graphs."

Proceedings of the 2ndWorkshop on Algorithm Engineering, 121-132, 1998.

Research