python - Python FuzzyWuzzy scores for rows in a Pandas DataFrame

Question

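I want to iterate over a Pandas DataFrame and get the fuzz.ratio score only for the pair of values in each row (not for all combinations). My DataFrame looks like this: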

   Acct_Owner  Address                     Address2
0  Name1       NaN                         33 Liberty Street
1  Name2       330 N Wabash Ave Ste 39300  330 North Wabash Avenue Suite 39300

There are missing values, so I use "try:" to skip the rows that have them. Here is the current for loop:

for row in df_high_scores.index:
    k1 = df_high_scores.get_value(row, 'Address')
    k2 = df_high_scores.get_value(row, 'Address2')

    try:
        df_high_scores['Address_Score'] = fuzz.ratio(k1, k2)
    except:
        None

The result shows the same score for every row. I'd like to figure out why the loop isn't scoring each row individually. Thanks for reading...

score 1 · Accepted Answer

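The assignment needs to target the correct row and column. Assigning to df_high_scores['Address_Score'] without a row label overwrites the entire column on every iteration, so every row ends up with the score from the last row that was scored.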

df_high_scores.loc[row, 'Address_Score'] = fuzz.ratio(k1, k2)
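For context, here is a minimal sketch of the whole loop with that fix applied, using the two sample rows from the question. A few things in it are assumptions rather than part of the original answer: `.at` stands in for `get_value` (which was removed in pandas 1.0), an explicit `pd.isna` check replaces the bare `try`/`except`, and a `difflib`-based stand-in is used for `fuzz.ratio` only if fuzzywuzzy is not installed.

```python
import numpy as np
import pandas as pd

try:
    from fuzzywuzzy import fuzz
    ratio = fuzz.ratio
except ImportError:
    # Assumption: a rough stand-in so the sketch runs without fuzzywuzzy.
    from difflib import SequenceMatcher

    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

# Sample data rebuilt from the question.
df_high_scores = pd.DataFrame({
    "Acct_Owner": ["Name1", "Name2"],
    "Address": [np.nan, "330 N Wabash Ave Ste 39300"],
    "Address2": ["33 Liberty Street", "330 North Wabash Avenue Suite 39300"],
})

# Start with an empty score column so skipped rows stay NaN.
df_high_scores["Address_Score"] = np.nan

for row in df_high_scores.index:
    k1 = df_high_scores.at[row, "Address"]
    k2 = df_high_scores.at[row, "Address2"]
    # Skip rows where either address is missing.
    if pd.isna(k1) or pd.isna(k2):
        continue
    # Write the score to this row only, not to the whole column.
    df_high_scores.at[row, "Address_Score"] = ratio(k1, k2)

print(df_high_scores)
```

Row 0 keeps a missing score because its Address is NaN, while row 1 gets its own score.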

A better way to do this, rather than iterating over the rows, is:

df_high_scores['Address_Score'] = df_high_scores.apply(lambda x : fuzz.ratio(x.Address, x.Address2), axis=1)

Note that apply is actually slow for large arrays. Look into fuzz to see whether it can take a NumPy array or a pandas Series as input.
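As far as I know, fuzz.ratio compares two strings at a time rather than whole arrays, so one alternative is to pair the two columns with zip and score them in a list comprehension, which usually avoids much of the per-row overhead of apply with axis=1. The sketch below is an illustration under assumptions, not a benchmarked recommendation: `score_pair` is a hypothetical helper, the sample frame is rebuilt from the question, and the `difflib` stand-in is used only when fuzzywuzzy is unavailable.

```python
import numpy as np
import pandas as pd

try:
    from fuzzywuzzy import fuzz
    ratio = fuzz.ratio
except ImportError:
    # Assumption: a rough stand-in so the sketch runs without fuzzywuzzy.
    from difflib import SequenceMatcher

    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def score_pair(a, b):
    """Hypothetical helper: score two strings, or return NaN if either is missing."""
    if pd.isna(a) or pd.isna(b):
        return np.nan
    return ratio(a, b)


# Sample data rebuilt from the question.
df_high_scores = pd.DataFrame({
    "Acct_Owner": ["Name1", "Name2"],
    "Address": [np.nan, "330 N Wabash Ave Ste 39300"],
    "Address2": ["33 Liberty Street", "330 North Wabash Avenue Suite 39300"],
})

# Score each row's pair of addresses without DataFrame.apply.
df_high_scores["Address_Score"] = [
    score_pair(a, b)
    for a, b in zip(df_high_scores["Address"], df_high_scores["Address2"])
]

print(df_high_scores)
```

The same `score_pair` helper could also be passed to apply if handling missing values matters more than speed.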
