CSc 120: Sequence-set Similarity
Expected Behavior
Write a Python function seq_set_sim(seq_set1, seq_set2, k) that takes
as arguments two sets of strings, seq_set1 and seq_set2, and an
integer k, and returns a floating-point value between 0 and 1
(inclusive) giving the similarity between the two sets of strings.
Compute the similarity value as follows:
- Use the Jaccard index to compute the similarity between individual
  strings.
- Compute the similarity between the sets of strings seq_set1 and
  seq_set2 as the maximum similarity between any string in seq_set1
  and any string in seq_set2.
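In symbols, one reading of the two steps above is sketched here. It assumes (this page does not state it explicitly) that each string s is represented by the set G_k(s) of its length-k substrings, which is consistent with the return values in the examples. The names G_k, J, and sim are introduced only for illustration.

```latex
J(s, t) = \frac{|G_k(s) \cap G_k(t)|}{|G_k(s) \cup G_k(t)|},
\qquad
\mathrm{sim}(S_1, S_2) = \max_{s \in S_1,\; t \in S_2} J(s, t)
```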
You can use the code from the previous short problems as helper functions
for this problem.
You can assume that seq_set1 and seq_set2
are both non-empty and that the strings in these sets
all have length at least k.
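The following is a minimal sketch of one possible approach, not the official solution. It assumes that the Jaccard index is taken over each string's set of length-k substrings (k-grams), an interpretation that reproduces the return values listed under Examples. The helper names kgrams and jaccard are hypothetical stand-ins for the code from the previous short problems.

```python
def kgrams(s, k):
    """Return the set of all length-k substrings of s (hypothetical helper)."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard(set1, set2):
    """Jaccard index |A & B| / |A | B| of two sets (hypothetical helper).

    Assumes the union is non-empty, which holds here because every
    string has length at least k and so has at least one k-gram.
    """
    return len(set1 & set2) / len(set1 | set2)


def seq_set_sim(seq_set1, seq_set2, k):
    """Maximum k-gram Jaccard similarity over all cross-set string pairs."""
    return max(
        jaccard(kgrams(s1, k), kgrams(s2, k))
        for s1 in seq_set1
        for s2 in seq_set2
    )
```

Because both sets are assumed to be non-empty, max always receives at least one value. The generator expression avoids building a list of every pairwise similarity.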
Examples
- Call: seq_set_sim(set(['aaaa','aabb']), set(['aaab']), 3)
  Return value: 0.5
- Call: seq_set_sim(set(['aaabba','aabbcc']), set(['aaab','abbc']), 4)
  Return value: 0.3333333333333333
- Call: seq_set_sim(set(['aaabba','abbc']), set(['aaab','aabbcc']), 2)
  Return value: 0.6
- Call: seq_set_sim(set(['ababab','acacac']), set(['bababa','cacaca']), 3)
  Return value: 1.0
- Call: seq_set_sim(set(['abbbbba','bcccccb']), set(['aaaaab','aaaaac']), 3)
  Return value: 0.0