Colloquium Speaker

Speaker:Ming-Yang Kao
Department of Computer Science
Yale University
Topic: Fast and Accurate Reconstruction of Evolutionary Trees: a Model-Based Study
Date:Tuesday, March 21, 2000
Time:11:00 AM
Place:Gould-Simpson, Room 701


Refreshments will be served in the 7th-floor lobby of Gould-Simpson at 10:45 AM


ABSTRACT


Evolutionary trees are useful for modeling the evolutionary history of species or genes. An ultimate goal of biology is to reconstruct the evolutionary history of all known species. The determination of the evolutionary relationship of all green plants alone will involve hundreds of millions of species. Tree reconstructions of such scales require multidisciplinary collaboration as well as enormous computational resources. In this talk, we will briefly describe some ongoing projects at Yale on developing necessary computational technologies, including algorithms and systems. The technical discussion will focus on our recent discovery of an algorithm for reconstructing evolutionary trees from biomolecular sequences. In the Jukes-Cantor model of evolution generalized for an arbitrary alphabet, the algorithm is mathematically proven to use sample sequences of length only polynomial in n to recover the correct tree topology with high probability, where n is the number of species involved. This accuracy guarantee is supported by preliminary simulated experiments on the algorithm and its variants. Given the pairwise distances between n input sequences, the algorithm runs optimally in O(n^2) time using O(n) additional space in the worst case. The algorithm also works for more general models of evolution and is readily parallelizable. Previously, the most popular distance-based method, namely, Neighbor Joining, requires O(n^3) time. The only other known polynomial-time algorithm with the same accuracy guarantee takes more than O(n^4) time. The information-theoretically optimal maximum likelihood algorithm appears to need exponential time.