The University of Arizona

Colloquium

Category: Lecture
Date: Tuesday, August 21, 2012
Time: 11:00 am
Concludes: 12:00 pm
Location: Gould-Simpson 906
Speaker: Titus Brown
Title: Assistant Professor
Affiliation: Michigan State University

Streaming Lossy Compression of Biological Sequence Data Using Probabilistic Data Structures

In recent years, next-generation DNA sequencing capacity has completely outstripped our ability to computationally digest the resulting volume of data. Driven by the need to actually analyze the data, our lab has developed a suite of novel data structures and algorithms for graph compression and data reduction; in addition to being darned efficient on their own, our approaches make use of probabilistic data structures that enable substantially lower memory usage than the best possible exact approach. Using these approaches we have been able to scale de novo data assembly approaches down to cloud computing infrastructure, and we have also completed some of the largest de novo assemblies of metagenomes ever done. Last but not least, these approaches show the way to essentially infinite de novo assembly of environmental microbial data.
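
As a rough illustration of the memory trade-off the abstract describes, here is a minimal Python sketch of one well-known probabilistic data structure, a Bloom filter, used to remember which k-mers (length-k substrings) of a read have been seen. The abstract does not say which structures the lab uses, so this is an assumption chosen for illustration and not the lab's implementation; the names KmerBloomFilter and kmers, the bit-array size, and the hash count are all hypothetical.

```python
import hashlib


class KmerBloomFilter:
    """Illustrative Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Fixed-size bit array; memory use does not grow with the number of k-mers.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive one bit position per hash function from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode("ascii"),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer)
        )


def kmers(sequence, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


# Hypothetical read and parameters, chosen only for the example.
seen = KmerBloomFilter(num_bits=8192, num_hashes=3)
read = "ACGTACGTTAGC"
for kmer in kmers(read, 5):
    seen.add(kmer)
```

An exact set would have to store every k-mer, while the filter above stays at 1 KiB no matter how many k-mers are added. The cost is that a membership test can occasionally return True for a k-mer that was never added, with the error rate rising as the filter fills.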

Biography

Trained by physicists, with a BA in pure math, a PhD in molecular developmental biology, and lots of open source code to my name, I am currently a biologist trapped in a computer science department. I work at the intersection of big sequence data, novel computer science data structures and algorithms, and biological hypothesis generation & validation.

Blog: http://ivory.idyll.org/blog/
Lab web site: http://ged.msu.edu/