python - How do I replace and add dataframe elements with another dataframe in Python Pandas?

Question

Suppose I have two dataframes, 'df_a' and 'df_b', with the same index structure and columns, but some of the data elements differ:

>>> df_a
           sales cogs
STK_ID QT           
000876 1   100  100
       2   100  100
       3   100  100
       4   100  100
       5   100  100
       6   100  100
       7   100  100

>>> df_b
           sales cogs
STK_ID QT           
000876 5    50   50
       6    50   50
       7    50   50
       8    50   50
       9    50   50
       10   50   50

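Now I want to replace the elements of df_a with the elements of df_b that have the same (index, column) coordinates, and append the elements of df_b whose (index, column) coordinates fall outside df_a. In other words, apply 'df_b' to 'df_a' as a patch: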

>>> df_c = patch(df_a,df_b)
           sales cogs
STK_ID QT           
000876 1   100  100
       2   100  100
       3   100  100
       4   100  100
       5    50   50
       6    50   50
       7    50   50
       8    50   50
       9    50   50
       10   50   50

How can I write the 'patch(df_a, df_b)' function?
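The two operations being asked for can be phrased as index set operations. The sketch below rebuilds the sample frames under an assumption (STK_ID stored as the string "000876" so the leading zeros survive, QT as integers) and lists which keys would be replaced and which would be appended.

```python
import pandas as pd

# Assumed reconstruction of the sample data: a (STK_ID, QT) MultiIndex.
idx_a = pd.MultiIndex.from_product([["000876"], range(1, 8)], names=["STK_ID", "QT"])
idx_b = pd.MultiIndex.from_product([["000876"], range(5, 11)], names=["STK_ID", "QT"])
df_a = pd.DataFrame({"sales": 100, "cogs": 100}, index=idx_a)
df_b = pd.DataFrame({"sales": 50, "cogs": 50}, index=idx_b)

# Keys present in both frames: df_b's values should replace df_a's.
replaced = df_a.index.intersection(df_b.index)
# Keys present only in df_b: these rows should be appended.
appended = df_b.index.difference(df_a.index)

print(list(replaced.get_level_values("QT")))  # QT 5, 6, 7
print(list(appended.get_level_values("QT")))  # QT 8, 9, 10
```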

score 2 · Accepted Answer

Try this:

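# extend df_a to the union of both indexes (the new rows are NaN)
df_c = df_a.reindex(df_a.index.union(df_b.index))
# overwrite the overlapping and new rows with df_b's values (.ix has been removed; use .loc)
df_c.loc[df_b.index] = df_b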

answered 2012-08-31T15:16:22.313
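A runnable sketch of the reindex-then-assign idea above, using the current spellings `Index.union` and `.loc` in place of `|` and the removed `.ix`. The sample data is an assumed reconstruction of the question's frames.

```python
import pandas as pd

# Assumed sample data matching the question (STK_ID as a string, QT as int).
idx_a = pd.MultiIndex.from_product([["000876"], range(1, 8)], names=["STK_ID", "QT"])
idx_b = pd.MultiIndex.from_product([["000876"], range(5, 11)], names=["STK_ID", "QT"])
df_a = pd.DataFrame({"sales": 100, "cogs": 100}, index=idx_a)
df_b = pd.DataFrame({"sales": 50, "cogs": 50}, index=idx_b)

# Step 1: extend df_a to every key found in either frame; new rows are NaN.
df_c = df_a.reindex(df_a.index.union(df_b.index))
# Step 2: overwrite the overlapping and newly added rows with df_b's values;
# assigning a DataFrame through .loc aligns it on the row labels.
df_c.loc[df_b.index] = df_b
# reindex introduced NaN, which made the columns float; cast back to int.
df_c = df_c.astype(int)
print(df_c)
```

The final cast is only safe because every row ends up filled; if some key could remain missing, keep the float dtype instead.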

score 2 · Accepted Answer

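To fill the gaps in one dataframe with values (or even whole rows) from another dataframe, take a look at the built-in df.combine_first() method. Values from the calling frame take priority, so calling it on df_b gives the patch: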

In [34]: df_b.combine_first(df_a)
Out[34]: 
           sales  cogs
STK_ID QT             
000876 1     100   100
       2     100   100
       3     100   100
       4     100   100
       5      50    50
       6      50    50
       7      50    50
       8      50    50
       9      50    50
       10     50    50
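A self-contained sketch of this approach, wrapped in the question's `patch` signature, on an assumed reconstruction of the sample data. Depending on the pandas version, the intermediate alignment may leave the result as float rather than int, so the check below compares values rather than dtypes.

```python
import pandas as pd

# Assumed sample data matching the question (STK_ID as a string, QT as int).
idx_a = pd.MultiIndex.from_product([["000876"], range(1, 8)], names=["STK_ID", "QT"])
idx_b = pd.MultiIndex.from_product([["000876"], range(5, 11)], names=["STK_ID", "QT"])
df_a = pd.DataFrame({"sales": 100, "cogs": 100}, index=idx_a)
df_b = pd.DataFrame({"sales": 50, "cogs": 50}, index=idx_b)

def patch(df_a, df_b):
    # df_b's values win; every (index, column) cell df_b lacks comes from df_a.
    return df_b.combine_first(df_a).sort_index()

df_c = patch(df_a, df_b)
print(df_c)
```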

score 1 · Accepted Answer

Similar to BrenBarn's answer, but more flexible:

import numpy as np

# reindex both to the union of their indexes
idx = df_a.index.union(df_b.index)
df_ar = df_a.reindex(idx)
df_br = df_b.reindex(idx)

# replacement criteria can be put in this lambda; here df_b wins wherever it has a value
combiner = lambda x, y: np.where(y.notna(), y, x)
df_c = df_ar.combine(df_br, combiner)
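A self-contained sketch on an assumed reconstruction of the question's data. `DataFrame.combine` aligns both frames on the union of their indexes itself, so the explicit reindex is optional; the combiner receives one aligned column (Series) from each frame. The two rules, `take_b` and `take_smaller`, are hypothetical names showing how the criterion can be swapped. A bare `y < x` test is False whenever either side is NaN, so rows that exist only in df_b need an explicit NaN check or they stay empty.

```python
import numpy as np
import pandas as pd

# Assumed sample data matching the question (STK_ID as a string, QT as int).
idx_a = pd.MultiIndex.from_product([["000876"], range(1, 8)], names=["STK_ID", "QT"])
idx_b = pd.MultiIndex.from_product([["000876"], range(5, 11)], names=["STK_ID", "QT"])
df_a = pd.DataFrame({"sales": 100, "cogs": 100}, index=idx_a)
df_b = pd.DataFrame({"sales": 50, "cogs": 50}, index=idx_b)

def take_b(x, y):
    # Patch rule: use df_b's value wherever it exists, otherwise df_a's.
    return np.where(y.notna(), y, x)

def take_smaller(x, y):
    # Alternative rule: keep the smaller value. A missing df_a value (NaN) is
    # replaced by df_b's; a missing df_b value leaves df_a's value in place.
    return np.where(x.isna() | (y < x), y, x)

# combine() aligns on the union of the indexes, so no manual reindex is needed.
patched = df_a.combine(df_b, take_b).sort_index()
smaller = df_a.combine(df_b, take_smaller).sort_index()
print(patched)
```

On this particular data both rules give the same result, because every df_b value is smaller than the df_a value it overlaps.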

score 0 · Accepted Answer

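I was struggling with the same problem, and the code in the previous answers didn't work on my dataframes. They have 2 index columns, and the reindexing operation produced NaN values in odd places (I'll post the dataframe contents if anyone is willing to debug it).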

I found an alternative solution. I'm reviving this thread in the hope that it will be useful to others:

import pandas as pd

# concatenate df_a and df_b (df_b last, so its rows win on duplicate keys)
df_c = pd.concat([df_a, df_b])

# clear the index (turn the index levels into regular dataframe columns)
df_c = df_c.reset_index()

# remove duplicate keys (the former index levels, e.g. STK_ID and QT), keeping the last occurrence (hence updating df_a with values from df_b)
df_c = df_c.drop_duplicates(subset=['STK_ID', 'QT'], keep='last')

Not a very elegant solution, but it seems to work.
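The same concatenate-then-deduplicate idea can also be sketched without leaving the MultiIndex, by dropping repeated index keys with `Index.duplicated(keep="last")`. Since nothing is reindexed, no NaN appears and the integer dtype is kept. The sample data is an assumed reconstruction of the question's frames.

```python
import pandas as pd

# Assumed sample data matching the question (STK_ID as a string, QT as int).
idx_a = pd.MultiIndex.from_product([["000876"], range(1, 8)], names=["STK_ID", "QT"])
idx_b = pd.MultiIndex.from_product([["000876"], range(5, 11)], names=["STK_ID", "QT"])
df_a = pd.DataFrame({"sales": 100, "cogs": 100}, index=idx_a)
df_b = pd.DataFrame({"sales": 50, "cogs": 50}, index=idx_b)

# Stack the frames with df_b last, so its rows are the later occurrences.
stacked = pd.concat([df_a, df_b])
# duplicated(keep="last") flags every earlier copy of a repeated key;
# keeping the unflagged rows leaves df_b's version of each overlapping key.
df_c = stacked[~stacked.index.duplicated(keep="last")].sort_index()
print(df_c)
```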

I hope df.update gets a join='outer' option soon...
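`DataFrame.update` only implements `join='left'`: it overwrites values in place but never adds rows. A hedged workaround is to extend the target frame to the union of both indexes first and then let `update` copy in df_b's non-NA values. The sample data is an assumed reconstruction of the question's frames.

```python
import pandas as pd

# Assumed sample data matching the question (STK_ID as a string, QT as int).
idx_a = pd.MultiIndex.from_product([["000876"], range(1, 8)], names=["STK_ID", "QT"])
idx_b = pd.MultiIndex.from_product([["000876"], range(5, 11)], names=["STK_ID", "QT"])
df_a = pd.DataFrame({"sales": 100, "cogs": 100}, index=idx_a)
df_b = pd.DataFrame({"sales": 50, "cogs": 50}, index=idx_b)

# update() never adds rows, so extend df_a to every key first (new rows are NaN).
df_c = df_a.reindex(df_a.index.union(df_b.index))
# Overwrite in place with df_b's non-NA values; update() itself returns None.
df_c.update(df_b)
print(df_c)
```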
