python - I want to compare two DataFrames by their string columns. The string comparison should be done with split() so that special characters are handled

Translated from: https://stackoverflow.com/questions/66992763 2021-04-07T19:16:00.760

Viewed 61 times

The input DataFrame X contains a column named A:

A
Rheumatoid Arthritis
Breast Feeding, Exclusive
Dementia
Hip Fracture
HIV AIDS

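DataFrame Y contains a column named B:

B
Rheumatoid Arthritis
Prehypertension
Hepatocellular Carcinoma
HIV AIDS
Leukemia, Myeloid
Leukemia, Myeloid, Acute
Breast Feeding! Exclusive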

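Desired output: wherever a value in Y[B] matches a value in X[A], it should simply be replaced by the X[A] value, so that DataFrame Y becomes:

B
Rheumatoid Arthritis
HIV AIDS
Breast Feeding, Exclusive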

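The values in Y[B] that match nothing in X[A] should go into a new, initially empty DataFrame called non_match, which should look like this:

non
Prehypertension
Hepatocellular Carcinoma
Leukemia, Myeloid
Leukemia, Myeloid, Acute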
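The comparison described above (split each string on special characters, then compare the pieces) can be sketched as a small key function. This is only an illustration under stated assumptions: the name token_key is hypothetical, and treating the match as case-insensitive and word-order-insensitive is an assumption about the intent, not something the question spells out.

```python
import re

def token_key(text):
    """Build a comparison key from the words in `text` (hypothetical helper)."""
    # Split on runs of non-word characters, drop empty pieces,
    # lowercase each word (assumed: case should not matter), then sort.
    tokens = [t.lower() for t in re.split(r"\W+", str(text)) if t]
    return tuple(sorted(tokens))

# Differ only in punctuation and case, so the keys are equal.
same = token_key("Breast Feeding, Exclusive") == token_key("breast feeding! exclusive")

# One extra word, so the keys differ.
different = token_key("Leukemia, Myeloid") == token_key("Leukemia, Myeloid, Acute")

print(same, different)
```

Two strings are treated as a match exactly when their keys are equal, which avoids comparing raw strings that differ only in punctuation.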

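import re

regex = r"[a-zA-Z]"
regex1 = r'\W+'  # split on runs of non-word characters
# iteritems() yields (column name, Series) pairs, not individual cell values
for x in master_conditions.iteritems():
    for y in sample.iteritems():
        # the third positional argument of re.split() is maxsplit, so flags go by keyword
        if (sorted(re.split(regex1, str(x), flags=re.MULTILINE | re.IGNORECASE)) ==
                sorted(re.split(regex1, str(y), flags=re.MULTILINE | re.IGNORECASE))):
            sample["Conditions"] = sample["Conditions"].replace(master_conditions["TA_Conditions"])
        else:
            print("Hello")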



# Note: this pattern removes the letters themselves, not the punctuation
B["B"] = B["B"].str.replace(r"[A-Za-z]", "", regex=True)\
           .str.lower()\
           .str.strip()  # in case there are trailing spaces

B_matched = B.merge(A, how="inner", left_on="B", right_on="A")[["B"]]

B_non = B[~B["B"].isin(B_matched["B"])].rename(columns={"B": "non"})

Neither of these two snippets works. Could someone please suggest another approach?
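One possible approach, offered as a minimal sketch rather than a definitive answer: normalize both columns to a token-based key and look up each Y[B] value in a dictionary built from X[A]. The helper token_key and the variable names canonical, replaced, and matched are hypothetical. The lowercase spellings in Y are an assumption added here to show case-insensitive matching; the sample data in the question differs only in punctuation.

```python
import re
import pandas as pd

def token_key(text):
    # Hypothetical helper: lowercase words, split on non-word characters, sorted.
    tokens = [t.lower() for t in re.split(r"\W+", str(text)) if t]
    return tuple(sorted(tokens))

X = pd.DataFrame({"A": ["Rheumatoid Arthritis", "Breast Feeding, Exclusive",
                        "Dementia", "Hip Fracture", "HIV AIDS"]})
# Assumed variations in case and punctuation, for illustration only.
Y = pd.DataFrame({"B": ["rheumatoid arthritis", "Prehypertension",
                        "Hepatocellular Carcinoma", "hiv aids",
                        "Leukemia, Myeloid", "Leukemia, Myeloid, Acute",
                        "Breast Feeding! Exclusive"]})

# Map each normalized key to the canonical spelling taken from X["A"].
canonical = {token_key(a): a for a in X["A"]}

# Look up every Y["B"] value; rows without a match become missing values.
replaced = Y["B"].map(lambda b: canonical.get(token_key(b)))

# Matched rows, carrying the spelling from X["A"].
matched = replaced.dropna().to_frame("B").reset_index(drop=True)

# Unmatched rows keep their original text, in a column named "non".
non_match = (Y.loc[replaced.isna(), ["B"]]
               .rename(columns={"B": "non"})
               .reset_index(drop=True))

print(matched)
print(non_match)
```

The dictionary lookup replaces the nested loops of the first attempt: each value is normalized once, so the work grows with the combined length of the two columns rather than with their product.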
