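I have 2 data frames: one contains a column of strings that I need to categorise (df = data), and the other contains the possible categories and their search terms (df = categories). I want to add a column to the "data" data frame that returns a category based on the search terms. For example: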
data:

| RepairName |
| --- |
| A/C is not cold |
| flat tyre is c |
| the tyre needs a repair on left side |
| the aircon is not cold |

categories:

| Category | SearchTerm |
| --- | --- |
| A/C | aircon |
| A/C | A/C |
| Tyre | repair |
| Tyre | flat |

Expected resulting data:

| RepairName | Category |
| --- | --- |
| A/C is not cold | A/C |
| flat tyre is c | Tyre |
| the tyre needs a repair on left side | Tyre |
| the aircon is not cold | A/C |

I have tried the following lambda function with apply. I'm not sure whether my column references are in the right place:
data['Category'] = data['RepairName'].apply(lambda x: categories['Category'] if categories['SearchTerm'] in x else "")
data['Category'] = [categories['Category'] if categories['SearchTerm'] in data['RepairName'] else 0]
But I keep getting the error message:
TypeError: 'in <string>' requires string as left operand, not Series
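The error most likely comes from `categories['SearchTerm'] in x`: the left operand is a whole Series, while `in` on a string expects a single string. A minimal sketch of one way around it, assuming the first matching search term should win and that matching should ignore case (both are assumptions, and `first_category` is a hypothetical helper name):

```python
import pandas as pd

data = pd.DataFrame({"RepairName": [
    "A/C is not cold",
    "flat tyre is c",
    "the tyre needs a repair on left side",
    "the aircon is not cold",
]})
categories = pd.DataFrame({
    "Category": ["A/C", "A/C", "Tyre", "Tyre"],
    "SearchTerm": ["aircon", "A/C", "repair", "flat"],
})

def first_category(text, lookup):
    """Return the category of the first search term found in text, else ""."""
    lowered = text.lower()
    # Compare one search term (a plain string) at a time, never the whole Series.
    for category, term in zip(lookup["Category"], lookup["SearchTerm"]):
        if term.lower() in lowered:
            return category
    return ""

data["Category"] = data["RepairName"].apply(lambda s: first_category(s, categories))
print(data)
```

This loops over every search term for every row, so it may be slow for large frames, but it keeps the comparison between two plain strings.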
This gives me True/False for whether a SearchTerm is present, but I can't return the category associated with the search term:
data['containName']=data['RepairName'].str.contains('|'.join(categories['SearchTerm']),case=False)
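The same joined pattern could be used to capture the matched term rather than only test for it. A sketch under stated assumptions: the terms are escaped with `re.escape` so they are matched literally, the match is case-insensitive, and the names `pattern`, `matched` and `term_to_category` are illustrative only:

```python
import re
import pandas as pd

data = pd.DataFrame({"RepairName": [
    "A/C is not cold",
    "flat tyre is c",
    "the tyre needs a repair on left side",
    "the aircon is not cold",
]})
categories = pd.DataFrame({
    "Category": ["A/C", "A/C", "Tyre", "Tyre"],
    "SearchTerm": ["aircon", "A/C", "repair", "flat"],
})

# One capture group containing every search term as a literal alternative.
pattern = "(" + "|".join(re.escape(t) for t in categories["SearchTerm"]) + ")"

# expand=False returns a Series holding the matched term (NaN when nothing matches).
matched = data["RepairName"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Map the lower-cased match back to its category.
term_to_category = dict(zip(categories["SearchTerm"].str.lower(), categories["Category"]))
data["Category"] = matched.str.lower().map(term_to_category)
print(data)
```

Only the first match in each string is captured, so a row containing terms from two categories gets whichever term appears first in the text.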
Both of the following work some of the time, but not consistently (perhaps because some of my search terms are more than one word?):
data['Category'] = [
next((c for c, k in categories.values if k in s), None) for s in data['RepairName']]
d = dict(zip(categories['SearchTerm'], categories['Category']))
data['CategoryCheck'] = [next((d[y] for y in x.split() if y in d), None) for x in data['RepairName']]
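Two likely reasons for the inconsistency: the substring test is case-sensitive, and `x.split()` only yields single words, so a multi-word search term can never equal a token. The sketch below illustrates this with hypothetical rows that are not from the original data (a two-word term and an upper-case repair name):

```python
import pandas as pd

# Hypothetical example rows, chosen only to trigger both failure modes.
categories = pd.DataFrame({
    "Category": ["Tyre", "A/C"],
    "SearchTerm": ["flat tyre", "aircon"],
})
data = pd.DataFrame({"RepairName": ["Flat tyre on left side", "AIRCON is not cold"]})

# Case-sensitive substring test: "flat tyre" is not found in "Flat tyre ...".
substring = [next((c for c, k in categories.values if k in s), None)
             for s in data["RepairName"]]

# Token lookup: no single word equals "flat tyre", and "AIRCON" != "aircon".
d = dict(zip(categories["SearchTerm"], categories["Category"]))
tokens = [next((d[y] for y in x.split() if y in d), None)
          for x in data["RepairName"]]

# Lower-casing both sides and keeping the substring test handles both cases.
folded = [next((c for c, k in categories.values if k.lower() in s.lower()), None)
          for s in data["RepairName"]]

print(substring, tokens, folded)
```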