>NC_012532.1 Zika virus, complete genome
AGTTGTTGATCTGTGTGAGTCAGACTGCGACAGTTCGAGTCTGAAGCGAGAGCTAACAACAGTATCAACA
GGTTTAATTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCCAAAGAAGAAATCCGGAGGATCC
...
AGACTCCATGAGTTTCCACCACGCTGGCCGCCAGGCACAGATCGCCGAACTTCGGCGGCCGGTGTGGGGA
AATCCATGGTTTCT
Here the method is_leaf() returns True if and only if the tree node it is invoked on is a leaf node, and id(), left(), and right() return, respectively, the ID of the node and its left and right subtrees. Some examples are shown here.

Note: If you want, you can use the code below to print your tree in a more visually appealing form. This code is not part of the problem specification and is provided purely for your entertainment. If you use it, make sure you turn off the pretty-printing before you submit your code.

    def __str__(self):
        # Leaves print their ID; internal nodes print "(left, right)" recursively.
        if self.is_leaf():
            # __str__ must return a string, so convert the ID in case it is not one.
            return str(self.id())
        else:
            return "({}, {})".format(str(self.left()), str(self.right()))
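To see how is_leaf(), id(), left(), right(), and the pretty-printing __str__ fit together, here is a minimal sketch of a binary tree node. The class name Tree and its constructor signature are hypothetical illustrations, not the interface of the assignment's tree module; only the four method names and the __str__ logic come from the text above.

```python
class Tree:
    """Hypothetical binary tree node; the assignment's own tree module may differ."""

    def __init__(self, node_id, left=None, right=None):
        # Hypothetical constructor: a leaf is built with no subtrees.
        self._id = node_id
        self._left = left
        self._right = right

    def is_leaf(self):
        # A node is a leaf exactly when it has no subtrees.
        return self._left is None and self._right is None

    def id(self):
        return self._id

    def left(self):
        return self._left

    def right(self):
        return self._right

    def __str__(self):
        # Leaves print their ID; internal nodes print "(left, right)" recursively.
        if self.is_leaf():
            return str(self.id())
        return "({}, {})".format(str(self.left()), str(self.right()))


if __name__ == "__main__":
    t = Tree("root", Tree("A"), Tree("inner", Tree("B"), Tree("C")))
    print(t)  # prints "(A, (B, C))"
```

Note that an internal node's own ID is never printed; only the leaf IDs appear, nested to show the tree's shape.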
where |S| denotes the size of the set S. Note that different values of N for computing N-grams can produce different distance values for the same sequences, and can thereby give rise to different phylogenetic trees. For the purposes of this assignment, you should implement your N-grams as Python sets. This simplifies computing the unions and intersections in the similarity measure:
$$\mathrm{similarity}(A, B) = \frac{\lvert \mathrm{ngrams}(A) \cap \mathrm{ngrams}(B) \rvert}{\lvert \mathrm{ngrams}(A) \cup \mathrm{ngrams}(B) \rvert}$$
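As an illustration of the set-based approach, the sketch below computes the N-grams of a sequence as a Python set and evaluates the similarity formula with the set operators & (intersection) and | (union). The function names and the explicit n parameter are assumptions for illustration; the assignment's genome module may expose a different interface. The choice to return 0.0 when both N-gram sets are empty is also a hypothetical convention, since the formula is undefined in that case.

```python
def ngrams(seq, n):
    """Return the set of all length-n substrings (N-grams) of seq; assumes n >= 1."""
    # A sequence shorter than n yields an empty range, hence an empty set.
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def similarity(a, b, n):
    """Size of the N-gram intersection divided by the size of the N-gram union."""
    grams_a = ngrams(a, n)
    grams_b = ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Hypothetical convention: the ratio is undefined here, so report 0.0.
        return 0.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    print(similarity("AGTTG", "AGTCG", 2))  # 2 shared of 6 distinct bigrams
    print(similarity("AGTTG", "AGTCG", 3))  # 1 shared of 5 distinct trigrams -> 0.2
```

With N = 2 the two sequences share {AG, GT} out of six distinct bigrams (similarity 1/3), while with N = 3 they share only AGT out of five distinct trigrams (similarity 0.2), which is the sensitivity to N described above.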
from genome import *
from tree import *