edit_distance.py
def edit_distance(source, target):
    """
    Edit distance is a string metric, i.e., a way of quantifying how
    dissimilar two strings are, measured by counting the minimum number of
    operations required to transform one string into the other.

    In genetic sequences consisting of A, T, G, and C nucleotides, this
    matching becomes essential in understanding the mutation in successive
    genes. Hence, this algorithm comes in handy when we are trying to
    quantify the mutations in successive generations.

    Args:
        source (string): The source string, the initial string with respect
            to which we are calculating the edit distance for the target.
        target (string): The target string, which is formed after n
            operations performed on the source string.

    Assumptions:
        The cost of each operation (insertion, deletion and substitution)
        is 1.
    """
    delta = {True: 0, False: 1}  # Substitution cost: 0 on a match, else 1
    # Base cases: turning an empty string into the other takes one
    # insertion (or deletion) per remaining character.
    if len(source) == 0:
        return len(target)
    elif len(target) == 0:
        return len(source)
    return min(
        edit_distance(source[:-1], target[:-1]) + delta[source[-1] == target[-1]],
        edit_distance(source, target[:-1]) + 1,  # Insertion
        edit_distance(source[:-1], target) + 1,  # Deletion
    )
print(edit_distance("ATCGCTG", "TAGCTAA"))
# Answer is 4
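
# The plain recursion above recomputes the same prefix pairs many times, so
# its running time grows exponentially with the string lengths. The sketch
# below is one possible bottom-up variant, not part of the original file.
# It assumes the same unit costs and fills the table row by row, keeping only
# the previous row. The name edit_distance_dp is hypothetical.

```python
def edit_distance_dp(source: str, target: str) -> int:
    """Bottom-up edit distance with unit costs, using O(len(target)) memory."""
    # previous[j] holds the distance between the current source prefix
    # (initially empty) and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Distance from the first i source characters to the empty string.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(edit_distance_dp("ATCGCTG", "TAGCTAA"))  # prints 4
```

# This variant runs in time proportional to len(source) * len(target), and
# it gives the same result as the recursive function on the example above.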