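I have some data that contains spelling errors. I am correcting them and scoring how close each misspelling is to its correction with the following code: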
import pandas as pd
import difflib
Li_A = ["potato", "tomato", "squash", "apple", "pear"]
B = {'one' : pd.Series(["potat0", "toma3o", "s5uash", "ap8le", "pea7"], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series(["po1ato", "2omato", "squ0sh", "2pple", "p3ar"], index=['a', 'b', 'c', 'd', 'e'])}
df_B = pd.DataFrame(B)
# Define the function that corrects the spelling:
def Spelling(ask):
    return difflib.get_close_matches(ask, Li_A, n=3, cutoff=0.5)[0]
df_B['Correct one'] = df_B['one'].apply(Spelling)
# Define the function that scores the spelling:
def Spell_Score(row):
    return difflib.SequenceMatcher(None, row['one'], row['Correct one']).ratio()
df_B['Score'] = df_B.apply(Spell_Score, axis=1)
This outputs the corrected spelling and the score:
df_B
      one     two Correct one     Score
a  potat0  po1ato      potato  0.833333
b  toma3o  2omato      tomato  0.833333
c  s5uash  squ0sh      squash  0.833333
d   ap8le   2pple       apple  0.800000
e    pea7    p3ar        pear  0.750000
How can I add columns that give the second- and third-highest-scoring matches along with their scores?
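To make the desired result concrete, here is a minimal sketch of one possible approach, not a definitive answer. It assumes it is acceptable to score every candidate directly with `difflib.SequenceMatcher` and keep the top three, instead of going through `get_close_matches` with a cutoff. The helper name `top_matches` and the column names `Match N` / `Score N` are hypothetical, chosen only for illustration.

```python
import difflib
import pandas as pd

Li_A = ["potato", "tomato", "squash", "apple", "pear"]
df_B = pd.DataFrame(
    {"one": ["potat0", "toma3o", "s5uash", "ap8le", "pea7"]},
    index=["a", "b", "c", "d", "e"],
)

def top_matches(ask, candidates=Li_A, n=3):
    """Hypothetical helper: return the n best candidates for `ask` with scores."""
    # Score every candidate with the same argument order as Spell_Score,
    # then keep the n highest ratios (no cutoff is applied here).
    scored = sorted(
        ((difflib.SequenceMatcher(None, ask, c).ratio(), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )[:n]
    out = {}
    for rank, (score, match) in enumerate(scored, start=1):
        out[f"Match {rank}"] = match   # assumed column name
        out[f"Score {rank}"] = score   # assumed column name
    return out

# One dict per row becomes one row of the new columns.
ranked = pd.DataFrame([top_matches(x) for x in df_B["one"]], index=df_B.index)
df_B = df_B.join(ranked)
print(df_B)
```

Because no cutoff is used, every row gets three matches even when the second and third candidates are poor; filtering on the `Score 2` and `Score 3` columns afterwards would restore a threshold.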