>NC_012532.1 Zika virus, complete genome
AGTTGTTGATCTGTGTGAGTCAGACTGCGACAGTTCGAGTCTGAAGCGAGAGCTAACAACAGTATCAACA
GGTTTAATTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCCAAAGAAGAAATCCGGAGGATCC
...
AGACTCCATGAGTTTCCACCACGCTGGCCGCCAGGCACAGATCGCCGAACTTCGGCGGCCGGTGTGGGGA
AATCCATGGTTTCT
def __str__(self):
    if self.is_leaf():
        return self.id()
    else:
        return "({}, {})".format(str(self.left()), str(self.right()))
Here the method is_leaf() returns True if and only if the tree node it is invoked on is a leaf node; and id(), left(), and right() return, respectively, the ID of the node and the left and right subtrees of the tree node. Some examples are shown here.
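To make the behavior of __str__() concrete, here is a minimal sketch of a tree class that supports it. The class name Tree, its constructor arguments, the private attribute names, and the node IDs "A", "B", and "C" are assumptions made for illustration only; the tree class required by the assignment may be organized differently.

```python
class Tree:
    """Hypothetical binary tree node; only the four accessors matter here."""

    def __init__(self, node_id=None, left=None, right=None):
        self._id = node_id      # ID string (meaningful for leaf nodes)
        self._left = left       # left subtree, or None
        self._right = right     # right subtree, or None

    def is_leaf(self):
        # A node with no subtrees is a leaf.
        return self._left is None and self._right is None

    def id(self):
        return self._id

    def left(self):
        return self._left

    def right(self):
        return self._right

    def __str__(self):
        if self.is_leaf():
            return self.id()
        else:
            return "({}, {})".format(str(self.left()), str(self.right()))


# A leaf "A" joined with the subtree formed by leaves "B" and "C".
t = Tree(left=Tree("A"), right=Tree(left=Tree("B"), right=Tree("C")))
tree_text = str(t)
print(tree_text)  # prints (A, (B, C))
```

Because __str__() calls str() on each subtree, the nesting of parentheses in the output mirrors the nesting of the tree itself.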
Comment: If you want, you can use the code described here to print out your tree in a more visually appealing manner. However, this code is not part of the problem specification and is provided purely for your entertainment. If you use this code, make sure you turn off the pretty-printing before you submit your code.
where |S| denotes the size of the set S. Note that different values of N for computing N-grams can produce different distance values for the same sequences, and thereby give rise to different phylogenetic trees. For the purposes of this assignment, you should implement your N-grams as Python sets. The reason is to simplify the computation of unions and intersections:
similarity(A, B) = |ngrams(A) ∩ ngrams(B)| / |ngrams(A) ∪ ngrams(B)|
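The similarity formula can be sketched with Python sets roughly as follows. This is an illustrative sketch, not the required interface: the function names, the explicit parameter n, and the choice to return 0.0 when both N-gram sets are empty are all assumptions.

```python
def ngrams(seq, n):
    """Return the set of all length-n substrings (N-grams) of seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def similarity(a, b, n):
    """Size of the intersection of the N-gram sets divided by the size of their union."""
    grams_a = ngrams(a, n)
    grams_b = ngrams(b, n)
    union = grams_a | grams_b      # set union
    if not union:
        # Assumption: treat two empty N-gram sets as having similarity 0.
        return 0.0
    common = grams_a & grams_b     # set intersection
    return len(common) / len(union)


# "ACGTAC" has 2-grams {AC, CG, GT, TA}; "ACGG" has {AC, CG, GG}.
# They share 2 of 5 distinct 2-grams.
print(similarity("ACGTAC", "ACGG", 2))  # prints 0.4
```

Storing the N-grams in sets means repeated N-grams (such as the second "AC" in "ACGTAC") are counted once, and the operators & and | give the intersection and union directly.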
from genome import *
from tree import *