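I have three dataframes (final_NN, ppt_code, herd_id). I want to add a new column, MapValue, to the final_NN dataframe, with values looked up from the other two dataframes according to the rules listed at the bottom, after the code.
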
import pandas as pd

final_NN = pd.DataFrame({
    "number": [123, 456, "Unknown", "Unknown", "Unknown", "Unknown", "Unknown", "Unknown", "Unknown", "Unknown"],
    "ID": ["", "", "", "", "", "", "", "", 799, 813],
    "code": ["", "", "AA", "AA", "BB", "BB", "BB", "CC", "", ""]
})

ppt_code = pd.DataFrame({
    "code": ["AA", "AA", "BB", "BB", "CC"],
    "number": [11, 11, 22, 22, 33]
})

herd_id = pd.DataFrame({
    "ID": [799, 813],
    "number": [678, 789]
})

# Explicit dtype avoids the warning pandas emits for an untyped empty Series.
new_column = pd.Series(dtype=object)

for i in range(len(final_NN)):
    # Rule 1: a known number is used as-is.
    if final_NN["number"][i] != "" and final_NN["number"][i] != "Unknown":
        new_column[i] = final_NN["number"][i]
    # Rule 2: otherwise look the code up in ppt_code.
    elif final_NN["code"][i] != "":
        for p in range(len(ppt_code)):
            if ppt_code["code"][p] == final_NN["code"][i]:
                new_column[i] = ppt_code["number"][p]
    # Rule 3: otherwise look the ID up in herd_id.
    elif final_NN["ID"][i] != "":
        for h in range(len(herd_id)):
            if herd_id["ID"][h] == final_NN["ID"][i]:
                new_column[i] = herd_id["number"][h]
    else:
        new_column[i] = ""

final_NN.insert(3, "MapValue", new_column)
print(final_NN)

final_NN:
    number   ID  code
0      123
1      456
2  Unknown         AA
3  Unknown         AA
4  Unknown         BB
5  Unknown         BB
6  Unknown         BB
7  Unknown         CC
8  Unknown  799
9  Unknown  813

ppt_code:
  code  number
0   AA      11
1   AA      11
2   BB      22
3   BB      22
4   CC      33

herd_id:
    ID  number
0  799     678
1  813     789

Expected output:
    number   ID  code  MapValue
0      123                  123
1      456                  456
2  Unknown         AA        11
3  Unknown         AA        11
4  Unknown         BB        22
5  Unknown         BB        22
6  Unknown         BB        22
7  Unknown         CC        33
8  Unknown  799             678
9  Unknown  813             789

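The rules are:
- If number in final_NN is not Unknown, then MapValue = number in final_NN;
- If number in final_NN is Unknown but code in final_NN is not null, search the ppt_code dataframe for that code and fill MapValue in final_NN with the corresponding number;
- If number and code in final_NN are Unknown and null respectively, but ID in final_NN is not null, search the herd_id dataframe for that ID and fill MapValue in the first dataframe with the corresponding number.

As shown above, I do this by looping over the dataframes, which is a slow way to achieve it. I know there is probably a faster way. Could someone help me with a quick and simple way to get the same result?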
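
For reference, the three rules above could be expressed without explicit loops. The following is only a minimal sketch, assuming that each code and each ID maps to a single number (duplicates in ppt_code carry the same value) and that the empty string stands for "null". The helper names code_to_number, id_to_number, has_number, has_code and looked_up are introduced here for illustration. Unlike the loop version, rows that match no rule are left as NaN rather than "".

```python
import pandas as pd

final_NN = pd.DataFrame({
    "number": [123, 456] + ["Unknown"] * 8,
    "ID": [""] * 8 + [799, 813],
    "code": ["", "", "AA", "AA", "BB", "BB", "BB", "CC", "", ""],
})
ppt_code = pd.DataFrame({"code": ["AA", "AA", "BB", "BB", "CC"],
                         "number": [11, 11, 22, 22, 33]})
herd_id = pd.DataFrame({"ID": [799, 813], "number": [678, 789]})

# Lookup Series indexed by key. Duplicates are dropped first because
# Series.map expects a unique index on the lookup Series.
code_to_number = ppt_code.drop_duplicates("code").set_index("code")["number"]
id_to_number = herd_id.drop_duplicates("ID").set_index("ID")["number"]

# Boolean masks for rules 1 and 2.
has_number = ~final_NN["number"].isin(["", "Unknown"])
has_code = final_NN["code"].ne("")

# Keys without a match become NaN, so the looked-up values are floats.
from_code = final_NN["code"].map(code_to_number)
from_id = final_NN["ID"].map(id_to_number)

# Rule 2 takes precedence over rule 3; rule 1 takes precedence over both.
looked_up = from_code.where(has_code, from_id)
final_NN["MapValue"] = final_NN["number"].where(has_number, looked_up)

print(final_NN)
```

Because missing lookups are represented as NaN, the mapped values may print as 11.0 rather than 11; a cast would be needed if integer display matters.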