Events & News
Colloquium
Category | Lecture |
Date | Thursday, April 30, 2009 |
Time | 9:00 am |
Location | GS 942 |
Details | Adviser: Kobus Barnard |
Speaker | David Perkins |
Title | MS Thesis Defense |
Affiliation | Computer Science Department |
Predicting secondary structure of proteins by linear and dynamic programming
Proteins are sequences of amino acids that fold into secondary and tertiary structure, which plays an important role in their function. As biologists have yet to discover the rules that govern how a protein folds in nature from its underlying sequence, this thesis tries a new approach to secondary structure prediction using dynamic programming on the input protein sequence. The sequence is broken into short words, where each word has a probability of folding into the three different types of secondary structure. By combining word probabilities with an abstraction called contexts, which model a run of the same secondary structure type up to a bounded length, the optimal prediction for an entire sequence can be computed via dynamic programming. The structure probabilities for words are learned from a training set of sequences with known secondary structure using linear programming. The combined approach to prediction using linear and dynamic programming achieves high accuracy on protein sequences whose words were observed in the training set, but is far less accurate on sequences with unobserved words not seen in the training set. The challenge for future work lies in interpolating probabilities for unobserved words to achieve improved generalization.