python - pandas combine_first with specific index columns?

Question

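I am trying to join two dataframes in pandas with the following behavior: I want to join on a specified column, but without redundant columns being added to the dataframe. This is analogous to combine_first, except that combine_first does not seem to take an optional index-column argument. Example: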

# combine df1 and df2 based on "id" column
df1 = pandas.merge(df1, df2, how="outer", on=["id"])

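The problem with the above is that columns other than "id" that are common to df1 and df2 will be added to df1 twice (with _x and _y suffixes). What I would like to do is something like: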

# Do outer join from df2 to df1, matching items by "id" but not adding
# columns that are redundant (df1 takes precedence if the values disagree)
df1.combine_first(df2, on=["id"])

How can this be done?
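To make the problem concrete, here is a minimal sketch of the duplicated columns. The two frames and their column names ("value", "extra") are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data: both frames share "id" and "value".
df1 = pd.DataFrame({"id": [1, 2], "value": [10.0, 20.0]})
df2 = pd.DataFrame({"id": [2, 3], "value": [99.0, 30.0], "extra": ["b", "c"]})

# Outer join on "id": the shared non-key column "value" appears twice,
# once per input frame, distinguished by the default suffixes.
merged = pd.merge(df1, df2, how="outer", on=["id"])
print(list(merged.columns))  # ['id', 'value_x', 'value_y', 'extra']
```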

score 1 · Accepted Answer

If you are trying to merge columns from df2 into df1 while excluding any redundant columns, the following should work.

df1.set_index("id", inplace=True)
df2.set_index("id", inplace=True)
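# Bring in only the df2 columns that df1 lacks (.ix and Index subtraction no longer exist in current pandas)
df3 = df1.merge(df2[df2.columns.difference(df1.columns)], left_index=True, right_index=True, how="outer")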

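However, this obviously will not update any values in df1 with values from df2, since it only brings in the non-redundant columns. But since you said df1 takes precedence for any values that disagree, perhaps this will do the trick?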
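If you also want gaps in df1 filled from df2, one possible alternative (not part of the answer above) is to put "id" in the index on both frames and call combine_first, which aligns on the index. A minimal sketch, using made-up sample data and assuming "id" is unique within each frame:

```python
import pandas as pd

# Hypothetical sample data: both frames share "id" and "value".
df1 = pd.DataFrame({"id": [1, 2], "value": [10.0, 20.0]})
df2 = pd.DataFrame({"id": [2, 3], "value": [99.0, 30.0], "extra": ["b", "c"]})

# combine_first aligns on the index, so moving "id" into the index
# makes it behave like an outer join keyed on "id".
# Values from df1 win; df2 only fills rows, columns, or NaNs missing in df1.
combined = (
    df1.set_index("id")
       .combine_first(df2.set_index("id"))
       .reset_index()
)
print(combined)
```

Note that combine_first treats NaN in df1 as missing, so a NaN in df1 is replaced by the df2 value when one exists; the resulting column order may also differ from df1's.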
