python - fuzzywuzzy ratio of 2 columns if one column satisfies 100 percent match the best one

Question

My data frame is

Matcher = df2['Account Name']

match = if df1['Billing Country'] == df2['Billing Country'] (process.extractOne(df1['Account Name'], Matcher))

The code above does not work. I want to fuzzy match the account name only when the country matches.

score 1 · Accepted Answer

Here's what I am suggesting. First, a full cartesian join on the two dfs:

df1.loc[:, 'MergeKey'] = 1 #create a mergekey
df2.loc[:, 'MergeKey'] = 1 #it is the same for both so that when you merge you get the cartesian product
#merge them to get the cartesian product (all possible combos)
merged = df1.merge(df2, on = 'MergeKey', suffixes = ['_1', '_2'])
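On pandas 1.2 or newer, the same cartesian product can probably be obtained without the helper MergeKey column by passing `how="cross"`. A minimal sketch, assuming small made-up frames (the sample values are illustrative and not from the question):

```python
import pandas as pd

# Illustrative sample data (assumed; not from the original question).
df1 = pd.DataFrame({
    "Account Name": ["Acme Corp", "Globex"],
    "Billing Country": ["US", "DE"],
})
df2 = pd.DataFrame({
    "Account Name": ["ACME Corporation", "Globex GmbH", "Initech"],
    "Billing Country": ["US", "DE", "US"],
})

# how="cross" (pandas >= 1.2) pairs every row of df1 with every row of df2,
# so no artificial MergeKey column is needed.
merged = df1.merge(df2, how="cross", suffixes=("_1", "_2"))

print(len(merged))           # number of pairs: len(df1) * len(df2)
print(list(merged.columns))  # overlapping column names get the suffixes
```

Note that the cross join grows as len(df1) * len(df2), so it may be impractical for large frames.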

Then, calculate the fuzz ratio for each combo:

from fuzzywuzzy import fuzz

def fuzzratio(row):
    try: #avoid errors, for example on NaNs
        return fuzz.ratio(row['Billing Country_1'], row['Billing Country_2'])
    except Exception:
        return 0 #you'll want to experiment w/o the try/except too
merged.loc[:, 'Ratio'] = merged.apply(fuzzratio, axis = 1) #create ratio column by applying function

Now you should have a df with the ratio between all possible combinations of df1['Billing Country'] and df2['Billing Country']. Once there, simply filter to get the ones where the ratio is 100%:

result = merged[merged.Ratio == 100] #fuzz.ratio returns an integer from 0 to 100
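The steps in this answer can be put together as a small runnable sketch. The sample frames are made up for illustration, and the `ratio` fallback built on `difflib` is only a stand-in (an assumption) so the sketch runs when fuzzywuzzy is not installed. `fuzz.ratio` returns an integer from 0 to 100, and identical strings score 100:

```python
import pandas as pd

try:
    from fuzzywuzzy import fuzz
    ratio = fuzz.ratio
except ImportError:
    # Stand-in for illustration only: difflib similarity scaled to 0-100.
    from difflib import SequenceMatcher

    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

# Illustrative sample data (assumed; not from the original question).
df1 = pd.DataFrame({
    "Account Name": ["Acme Corp", "Globex"],
    "Billing Country": ["US", "DE"],
})
df2 = pd.DataFrame({
    "Account Name": ["ACME Corporation", "Globex GmbH", "Initech"],
    "Billing Country": ["US", "DE", "US"],
})

# Same constant key on both sides, so the merge yields every combination.
df1["MergeKey"] = 1
df2["MergeKey"] = 1
merged = df1.merge(df2, on="MergeKey", suffixes=("_1", "_2"))

# Score the two country columns row by row.
merged["Ratio"] = merged.apply(
    lambda row: ratio(row["Billing Country_1"], row["Billing Country_2"]),
    axis=1,
)

# Keep only the pairs whose countries match completely (score of 100).
result = merged[merged["Ratio"] == 100]
print(result[["Account Name_1", "Account Name_2", "Billing Country_1"]])
```

For short strings such as country codes, a score of 100 amounts to an exact match, so a plain equality test on the two country columns should select the same rows.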

score 0 · Accepted Answer

I figured it out in a slightly different way.

First, I merged using

merged_file = pd.merge(df2, df1, on='Billing Country', how = 'left')

Once I had all the possible matches, I applied fuzzywuzzy's `process.extractOne`:

`Reference_data= df2['Account Name']`

`Result = process.extractOne(df1, Reference_data)`

The line above gave me the closest possible match for each value I wanted to look up. I then added one more line to calculate the ratio:

Result['ratio'] = Result.apply(lambda row: fuzz.ratio(row['Account Name_x'], row['Account Name_y']), axis=1)
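One way this merge-then-score approach might look end to end is sketched below. It is not the poster's exact code: it assumes made-up sample frames, uses an inner merge with df1 on the left rather than the left merge from df2 shown above, and falls back to a `difflib`-based `ratio` (a stand-in, labeled as such) when fuzzywuzzy is not installed. The `pairs` and `best` names are hypothetical.

```python
import pandas as pd

try:
    from fuzzywuzzy import fuzz
    ratio = fuzz.ratio
except ImportError:
    # Stand-in for illustration only: difflib similarity scaled to 0-100.
    from difflib import SequenceMatcher

    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

# Illustrative sample data (assumed; not from the original question).
df1 = pd.DataFrame({
    "Account Name": ["Acme Corp", "Globex"],
    "Billing Country": ["US", "DE"],
})
df2 = pd.DataFrame({
    "Account Name": ["ACME Corporation", "Globex GmbH", "Initech"],
    "Billing Country": ["US", "DE", "US"],
})

# Merging on the country keeps only pairs whose countries are equal.
pairs = df1.merge(df2, on="Billing Country", suffixes=("_x", "_y"))

# Score the two account names row by row (0-100).
pairs["ratio"] = [
    ratio(a, b)
    for a, b in zip(pairs["Account Name_x"], pairs["Account Name_y"])
]

# For each df1 account, keep the candidate with the highest score.
best = pairs.loc[pairs.groupby("Account Name_x")["ratio"].idxmax()]
print(best[["Account Name_x", "Account Name_y", "Billing Country", "ratio"]])
```

Because `fuzz.ratio` is case-sensitive, lowercasing both name columns before scoring may give more useful scores on real data.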
