python - Count letter differences of two strings

Question

This is the behavior I want:

a: IGADKYFHARGNYDAA
c: KGADKYFHARGNYEAA
2 difference(s).

score 18 · Accepted Answer

def diff_letters(a, b):
    # Count the positions where the two strings differ (expects equal lengths).
    return sum(a[i] != b[i] for i in range(len(a)))
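
The function above indexes b with positions taken from a, so it raises an IndexError when b is the shorter string. A minimal sketch of a variant that pairs the characters with zip and rejects unequal lengths explicitly, written for Python 3 (the name diff_letters_checked is hypothetical, and raising ValueError is only one possible design choice):

```python
def diff_letters_checked(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption: unequal lengths are treated as an error here.
        raise ValueError("strings must have the same length")
    # Each comparison yields True (1) or False (0), so sum() counts mismatches.
    return sum(x != y for x, y in zip(a, b))


print("%d difference(s)." % diff_letters_checked("IGADKYFHARGNYDAA", "KGADKYFHARGNYEAA"))
```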

Answered 2012-09-01T10:15:38.780

score 13 · Accepted Answer

I think this example will work for your specific case without too much hassle and without interoperability issues with your Python version (please upgrade to 2.7):

a='IGADKYFHARGNYDAA'
b='KGADKYFHARGNYEAA'

u=zip(a,b)
d=dict(u)

x=[]
for i,j in d.items(): 
    if i==j:
        x.append('*') 
    else: 
        x.append(j)
        
print x

Outputs: ['*', 'E', '*', '*', 'K', '*', '*', '*', '*', '*']

With a few tweaks, you can get what you want. Tell me if it helps :-)
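
Because dict(u) keeps only one entry per distinct letter of a, the list above has 10 items rather than 16 and does not follow the order of the positions. A small sketch that builds the same kind of marker list directly from the zipped pairs, without the dictionary (Python 3 syntax; the helper name mark_differences is hypothetical):

```python
def mark_differences(a, b):
    """Return one item per position: '*' on a match, else the letter from b."""
    return ['*' if x == y else y for x, y in zip(a, b)]


a = 'IGADKYFHARGNYDAA'
b = 'KGADKYFHARGNYEAA'
marks = mark_differences(a, b)
print(marks)
# Assumption: the strings themselves never contain '*', so every item
# other than '*' marks a differing position.
print(sum(m != '*' for m in marks))
```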

Update

You can also use this:

a='IGADKYFHARGNYDAA'
b='KGADKYFHARGNYEAA'

u=zip(a,b)
for i,j in u:
    if i==j:
        print i,'--',j
    else: 
        print i,'  ',j

Outputs:

I    K
G -- G
A -- A
D -- D
K -- K
Y -- Y
F -- F
H -- H
A -- A
R -- R
G -- G
N -- N
Y -- Y
D    E
A -- A
A -- A

Update 2

You may modify the code like this:

a='IGADKYFHARGNYDAA'
b='KGADKYFHARGNYEAX'   # last letter changed to X, matching the output below
u=zip(a,b)

y=[]
for i,j in u:
    if i==j:
        print i,'--',j
    else: 
        y.append(j)
        print i,'  ',j
        
print '\n', y

print '\n Length = ',len(y)

Outputs:

I    K
G -- G
A -- A
D -- D
K -- K
Y -- Y
F -- F
H -- H
A -- A
R -- R
G -- G
N -- N
Y -- Y
D    E
A -- A
A    X

['K', 'E', 'X']

 Length =  3

score 12 · Accepted Answer

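Theory

Iterate over both strings simultaneously and compare the characters.
Store the result in a new string by appending a space or a | character to it, respectively. Also, increment an integer value, starting from zero, for each differing character.
Output the result.

Implementation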

You can use the built-in zip function or itertools.izip to iterate over both strings simultaneously; the latter is a little more efficient for huge inputs. If the strings are not the same size, iteration covers only the length of the shorter one. In that case, you can fill up the rest with the character that indicates no match.

import itertools

def compare(string1, string2, no_match_c=' ', match_c='|'):
    if len(string2) < len(string1):
        string1, string2 = string2, string1
    result = ''
    n_diff = 0
    for c1, c2 in itertools.izip(string1, string2):
        if c1 == c2:
            result += match_c
        else:
            result += no_match_c
            n_diff += 1
    delta = len(string2) - len(string1)
    result += delta * no_match_c
    n_diff += delta
    return (result, n_diff)
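
On Python 3, itertools.izip no longer exists, and the padding step can instead be folded into the loop with itertools.zip_longest, which fills the shorter input with a placeholder. A rough sketch under that assumption (the name compare_padded is hypothetical; None is used as the fill value on the assumption that it never equals a real character):

```python
from itertools import zip_longest


def compare_padded(string1, string2, no_match_c=' ', match_c='|'):
    """Return (marker string, number of differing positions)."""
    markers = []
    n_diff = 0
    # fillvalue=None pads the shorter string; None never equals a character,
    # so every padded position counts as a difference.
    for c1, c2 in zip_longest(string1, string2, fillvalue=None):
        if c1 == c2:
            markers.append(match_c)
        else:
            markers.append(no_match_c)
            n_diff += 1
    return ''.join(markers), n_diff


result, n_diff = compare_padded('IGADKYFHARGNYDAA', 'KGADKYFHARGNYEAAXXX', no_match_c='_')
print(result, n_diff)
```

Unlike the version above, this variant does not need to swap the arguments, because the padding applies to whichever string is shorter.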

Example

Here's a simple test, with slightly different options than in your example above. Note that I have used an underscore to indicate non-matching characters, to better demonstrate how the resulting string is expanded to the size of the longer string.

def main():
    string1 = 'IGADKYFHARGNYDAA AWOOH'
    string2 = 'KGADKYFHARGNYEAA  W'
    result, n_diff = compare(string1, string2, no_match_c='_')

    print "%d difference(s)." % n_diff  
    print string1
    print result
    print string2

main()

Output:

niklas@saphire:~/Desktop$ python foo.py 
6 difference(s).
IGADKYFHARGNYDAA AWOOH
_||||||||||||_|||_|___
KGADKYFHARGNYEAA  W

score 5 · Accepted Answer

Python has the excellent difflib, which should provide the needed functionality.

Here is example usage from the documentation:

import difflib  # Works for python >= 2.1

>>> s = difflib.SequenceMatcher(lambda x: x == " ",
...                     "private Thread currentThread;",
...                     "private volatile Thread currentThread;")
>>> for block in s.get_matching_blocks():
...     print "a[%d] and b[%d] match for %d elements" % block
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
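
Applied to the strings from the question, one possible way to turn this into a count is to sum the sizes of the non-equal spans reported by SequenceMatcher.get_opcodes(). This is only a sketch (Python 3; the function name count_changes is hypothetical), and it measures an alignment-based difference rather than a strict position-by-position comparison:

```python
import difflib


def count_changes(a, b):
    """Sum the sizes of the spans that SequenceMatcher reports as changed."""
    matcher = difflib.SequenceMatcher(None, a, b)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            # A changed span may differ in length on each side; take the longer.
            changed += max(i2 - i1, j2 - j1)
    return changed


print(count_changes('IGADKYFHARGNYDAA', 'KGADKYFHARGNYEAA'))
```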

score 2 · Accepted Answer

a = "IGADKYFHARGNYDAA"
b = "KGADKYFHARGNYEAAXXX"
zipped = zip(a, b)                                        # pairs of letters at each index
difference = sum(1 for e in zipped if e[0] != e[1])       # count pairs with non-matching elements
difference = difference + abs(len(a) - len(b))            # if the two strings differ in length, add the length difference

score 1 · Accepted Answer

I haven't seen anyone use the reduce function, so I'll include a piece of code I've been using:

reduce(lambda x, y: x + 1 if y[0] != y[1] else x, zip(source, target), 0)

which will give you the number of positions at which source and target differ.
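
In Python 3, reduce is no longer a built-in and has to be imported from functools. A self-contained sketch of the same one-liner, with the question's strings standing in for source and target (an assumption, since the answer does not define them):

```python
from functools import reduce

source = 'IGADKYFHARGNYDAA'
target = 'KGADKYFHARGNYEAA'

# x is the running count; y is one (source_char, target_char) pair.
n_diff = reduce(lambda x, y: x + 1 if y[0] != y[1] else x, zip(source, target), 0)
print(n_diff)
```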

score 0 · Accepted Answer

With difflib.ndiff you can do this in a one-liner that's still somewhat comprehensible:

>>> import difflib
>>> a = 'IGADKYFHARGNYDAA'
>>> c = 'KGADKYFHARGNYEAA'
>>> sum([i[0] != ' '  for i in difflib.ndiff(a, c)]) / 2
2

(sum works here because, well, kind of True == 1 and False == 0)

The following makes it clear what's happening and why the / 2 is needed:

>>> [i for i in difflib.ndiff(a,c)]
['- I',
 '+ K',
 '  G',
 '  A',
 '  D',
 '  K',
 '  Y',
 '  F',
 '  H',
 '  A',
 '  R',
 '  G',
 '  N',
 '  Y',
 '- D',
 '+ E',
 '  A',
 '  A']

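This also runs if the strings have a different length, but note that an unmatched extra character produces only a single '+' or '-' line, so halving counts it as half a difference.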
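
A sketch of a variant that does not rely on the division: it counts the '-' and '+' lines separately and takes the larger of the two (Python 3; the name ndiff_count is hypothetical, and this is one possible convention rather than the answer's own method):

```python
import difflib


def ndiff_count(a, b):
    """Count differences as the larger of the removed and added line counts."""
    removed = added = 0
    for line in difflib.ndiff(a, b):
        if line.startswith('- '):
            removed += 1
        elif line.startswith('+ '):
            added += 1
    return max(removed, added)


print(ndiff_count('IGADKYFHARGNYDAA', 'KGADKYFHARGNYEAA'))
print(ndiff_count('apple', 'apples'))
```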

score 0 · Accepted Answer

When looping through one string, make a counter object that identifies the letter you are on at each iteration. Then use this counter as an index to refer to the other sequence.

a = 'IGADKYFHARGNYDAA'
b = 'KGADKYFHARGNYEAA'

counter = 0
differences = 0
for i in a:
    if i != b[counter]:
        differences += 1
    counter += 1

Here, each time we come across a letter in sequence a that differs from the letter at the same position in sequence b, we add 1 to 'differences'. We then add 1 to the counter before we move onto the next letter.
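
The same loop can be written with the built-in enumerate, which supplies the running index so the counter does not have to be updated by hand. A minimal sketch in Python 3, assuming as above that both strings have the same length:

```python
a = 'IGADKYFHARGNYDAA'
b = 'KGADKYFHARGNYEAA'

differences = 0
# enumerate yields (index, letter) pairs for sequence a.
for index, letter in enumerate(a):
    if letter != b[index]:
        differences += 1

print(differences)
```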

score 0 · Accepted Answer

I like the answer from Niklas R, but it has an issue (depending on your expectations). Using the answer with the following two test cases:

print compare('berry','peach')
print compare('berry','cherry')

We may reasonably expect cherry to be more similar to berry than peach is. Yet we get a lower diff between berry and peach than between berry and cherry:

(' |   ', 4)  # berry, peach
('   |  ', 5) # berry, cherry

This occurs when strings are more similar backwards than forwards. To extend the answer from Niklas R, we can add a helper function that returns the minimum of the normal (forwards) diff and the diff of the reversed strings:

def fuzzy_compare(string1, string2):
    (fwd_result, fwd_diff) = compare(string1, string2)
    (rev_result, rev_diff) = compare(string1[::-1], string2[::-1])
    diff = min(fwd_diff, rev_diff)
    return diff

Use the following test cases again:

print fuzzy_compare('berry','peach')
print fuzzy_compare('berry','cherry')

...and we get

4 # berry, peach
2 # berry, cherry

As I said, this really just extends, rather than modifies the answer from Niklas R.

If you're just looking for a simple diff function (taking into consideration the aforementioned gotcha), the following will do:

def diff(a, b):
    delta = do_diff(a, b)
    delta_rev = do_diff(a[::-1], b[::-1])
    return min(delta, delta_rev)

def do_diff(a,b):
    delta = 0
    i = 0
    while i < len(a) and i < len(b):
        delta += a[i] != b[i]
        i += 1
    delta += len(a[i:]) + len(b[i:])
    return delta

Test cases:

print diff('berry','peach')
print diff('berry','cherry')

One final consideration is of the diff function itself when handling words of different lengths. There are two options:

Consider the difference in length as differing characters.
Ignore the difference in length and compare only up to the length of the shortest word.

For example:

apple and apples have a difference of 1 when considering all characters.
apple and apples have a difference of 0 when considering only the shortest word.

When considering only the shortest word we can use:

def do_diff_shortest(a,b):
    delta, i = 0, 0
    if len(a) > len(b):
        a, b = b, a
    for i in range(len(a)):
        delta += a[i] != b[i]
    return delta

...the number of iterations is dictated by the shortest word, everything else is ignored. Or we can take into consideration different lengths:

def do_diff_both(a, b):
    delta, i = 0, 0
    while i < len(a) and i < len(b):
        delta += a[i] != b[i]
        i += 1
    delta += len(a[i:]) + len(b[i:])
    return delta

In this example, any remaining characters are counted and added to the diff value. To test both functions:

print do_diff_shortest('apple','apples')
print do_diff_both('apple','apples')

Will output:

0 # Ignore extra characters belonging to longest word.
1 # Consider extra characters.
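
Both behaviours can also be folded into a single function with a switch. A sketch under that assumption (Python 3; the function name diff_count and the count_extra parameter are hypothetical and not part of the answer above):

```python
def diff_count(a, b, count_extra=True):
    """Count differing positions; optionally add the length difference."""
    # zip stops at the end of the shorter string.
    delta = sum(x != y for x, y in zip(a, b))
    if count_extra:
        # Every character beyond the shorter string counts as one difference.
        delta += abs(len(a) - len(b))
    return delta


print(diff_count('apple', 'apples', count_extra=False))
print(diff_count('apple', 'apples'))
```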

score 0 · Accepted Answer

Here is my solution to a similar problem comparing two strings based on the solution presented here: https://stackoverflow.com/a/12226960/3542145 .

Since itertools.izip did not work for me in Python 3, I found a solution that simply uses the built-in zip function instead: https://stackoverflow.com/a/32303142/3542145 .

The function to compare the two strings:

def compare(string1, string2, no_match_c=' ', match_c='|'):
    if len(string2) < len(string1):
        string1, string2 = string2, string1
    result = ''
    n_diff = 0
    for c1, c2 in zip(string1, string2):
        if c1 == c2:
            result += match_c
        else:
            result += no_match_c
            n_diff += 1
    delta = len(string2) - len(string1)
    result += delta * no_match_c
    n_diff += delta
    return (result, n_diff)

Set up the two strings for comparison and call the function:

def main():
    string1 = 'AAUAAA'
    string2 = 'AAUCAA'
    result, n_diff = compare(string1, string2, no_match_c='_')
    print("%d difference(s)." % n_diff)
    print(string1)
    print(result)
    print(string2)

main()

Which returns:

1 difference(s).
AAUAAA
|||_||
AAUCAA

score 0 · Accepted Answer

Here is my solution. It compares two strings, and it doesn't matter which one you put in a or b.

#Declare Variables
a='Here is my first string'
b='Here is my second string'
notTheSame=0
count=0

#Check which string is bigger and put the bigger string in C and smaller string in D
if len(a) >= len(b):
    c=a
    d=b
if len(b) > len(a):
    d=a
    c=b

#While the counter is less than the length of the longest string, compare each letter.
while count < len(c):
    if count == len(d):
        break
    if c[count] != d[count]:
        print(c[count] + " not equal to " + d[count])
        notTheSame = notTheSame + 1
    else:
        print(c[count] + " is equal to " + d[count])
    count=count+1

#the below output is a count of all the differences + the difference between the 2 strings
print("Number of Differences: " + str(len(c)-len(d)+notTheSame))

score 0 · Accepted Answer

diff = 0
for i, j in zip(a, b): 
    if i != j: diff += 1
print(diff)

Answered 2021-01-10T01:50:59.830
