
I have a list of strings that must be matched against a dataframe column.

The list looks like this:

list = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']

The column in the dataframe looks like this:

data = {'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']}

For each row, I want to find the list entry whose words all appear in that row, so that I end up with this dataframe:

COLUMN                                 | String
wcdma street view disconnected         | street view wcdma
gbts planned work street view          | street view gbts
lte atn golden village optical invalid | golden village lte
wcdma street view planned work         | street view wcdma

To find the matches, I tried giving each string from the list as a list of its words (e.g. ['street', 'view', 'wcdma']) and searching:

df.apply(lambda x: all(er in x.COLUMN for er in list), axis=1)

But it returns nothing, even though I know there must be at least one match. If I change all() to any(), it returns something, but not what I need.
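A likely reason the attempt above finds nothing: if `list` holds all the entries, `all(er in x.COLUMN for er in list)` asks whether every entry appears in the same row, which no row satisfies. The check has to be done per entry: all() over the words of one entry, then any() across entries. A minimal sketch under that assumption, using the sample data with the list renamed to `phrases` (a hypothetical name, chosen so it does not shadow the built-in `list`), that builds a boolean mask of rows matching at least one entry:

```python
import pandas as pd

# Sample data from the question; the list is renamed to `phrases`
# (hypothetical name) so it does not shadow Python's built-in `list`
phrases = ['golden village lte', 'pones wcdma', 'coral gbts',
           'street view gbts', 'street view wcdma']
df = pd.DataFrame({'COLUMN': ['wcdma street view disconnected',
                              'gbts planned work street view',
                              'lte atn golden village optical invalid',
                              'wcdma street view planned work']})


def matches_any_phrase(text):
    """True if every word of at least one phrase is a word of `text`."""
    words = set(text.split())
    # all() is evaluated per phrase; any() combines the per-phrase results
    return any(all(w in words for w in p.split()) for p in phrases)


mask = df['COLUMN'].apply(matches_any_phrase)
print(mask.tolist())  # [True, True, True, True]
```

Using `df[mask]` would then keep only the matching rows; the answers below go one step further and return which entry matched.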


2 Answers


You can try this:

df = pd.DataFrame({'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']})
df

                                   COLUMN
0          wcdma street view disconnected
1           gbts planned work street view
2  lte atn golden village optical invalid
3          wcdma street view planned work

Now, apply a function to the column with Series.apply:

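lst = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']
# For each row, take the first entry of lst whose words all appear as whole words in the row;
# next(..., None) gives None for a row with no match instead of raising IndexError like [].pop()
df['String'] = df.COLUMN.apply(lambda x: next((i for i in lst if all(j in x.split() for j in i.split())), None))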
df
                                   COLUMN              String
0          wcdma street view disconnected   street view wcdma
1           gbts planned work street view    street view gbts
2  lte atn golden village optical invalid  golden village lte
3          wcdma street view planned work   street view wcdma
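A variant worth considering, assuming you want every matching entry rather than a single one, and want rows that match nothing to stay in the frame with an empty string. The fifth row below is hypothetical, added only to show the no-match case; matching is on whole words, so `view` does not match `preview`.

```python
import pandas as pd

lst = ['golden village lte', 'pones wcdma', 'coral gbts',
       'street view gbts', 'street view wcdma']
df = pd.DataFrame({'COLUMN': ['wcdma street view disconnected',
                              'gbts planned work street view',
                              'lte atn golden village optical invalid',
                              'wcdma street view planned work',
                              # hypothetical extra row that matches nothing
                              'preview of the coral reef']})


def all_matches(text):
    """Return every entry of lst whose words all occur as whole words in text."""
    words = set(text.split())  # whole words, so 'view' does not match 'preview'
    return [p for p in lst if all(w in words for w in p.split())]


df['Matches'] = df['COLUMN'].apply(all_matches)
# Collapse each list into one string; rows with no match get ''
df['String'] = df['Matches'].apply(', '.join)
print(df[['COLUMN', 'String']])
```

Keeping the list column (`Matches`) makes it easy to spot rows that match more than one entry, which a single-value column would hide.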
answered 2020-05-12T16:57:10.180
import pandas as pd
list2 = ['golden village lte', 'pones wcdma', 'coral gbts', 'street view gbts', 'street view wcdma']
# Split each entry into its words, e.g. 'street view wcdma' -> ['street', 'view', 'wcdma']
list2 = [x.split(' ') for x in list2]
data = {'COLUMN': ['wcdma street view disconnected', 'gbts planned work street view', 'lte atn golden village optical invalid', 'wcdma street view planned work']}
data = pd.DataFrame(data)

def search(x):
    # Words of the current row
    row_words = x.split(' ')
    for y in list2:
        # Match only if every word of this entry appears in the row
        if all(item in row_words for item in y):
            return ' '.join(y)
    return None  # no entry matched this row

data['matched'] = data['COLUMN'].transform(search)

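Explanation: I split each string in the list on spaces into a list of words. Then, using transform() on 'COLUMN', I split each row the same way and use all() to check whether every word of 'y' appears among the row's words. If so, I return that string (rejoined with spaces); if no entry matches, the function returns None.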
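The same check can be phrased as a set subset test: an entry matches when its set of words is a subset of the row's set of words. A minimal sketch of that reformulation, assuming it is acceptable that sets ignore word order and repeated words (the word-by-word all() check above also ignores order). `phrase_sets` is a hypothetical helper name; each entry is split only once, up front.

```python
import pandas as pd

list2 = ['golden village lte', 'pones wcdma', 'coral gbts',
         'street view gbts', 'street view wcdma']
# Split each entry once, keeping the original string next to its word set
phrase_sets = [(p, frozenset(p.split())) for p in list2]


def search(text):
    """Return the first entry whose words are all words of text, else None."""
    row_words = set(text.split())
    # ws.issubset(row_words) is the set form of all(item in row_words for item in ws)
    return next((p for p, ws in phrase_sets if ws.issubset(row_words)), None)


data = pd.DataFrame({'COLUMN': ['wcdma street view disconnected',
                                'gbts planned work street view',
                                'lte atn golden village optical invalid',
                                'wcdma street view planned work']})
data['matched'] = data['COLUMN'].apply(search)
print(data)
```

Like the loop version, this returns the first matching entry in list order, so the order of list2 decides which entry wins when a row matches several.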

answered 2020-05-12T16:58:38.700