python - pandas: combining two columns in a DataFrame

Question

I have a pandas DataFrame with multiple columns:

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51
Data columns:
foo                   11516  non-null values
bar                   228381  non-null values
Time_UTC              239897  non-null values
dtstamp               239897  non-null values
dtypes: float64(4), object(1)

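where foo and bar are columns that contain the same data but under different names. Is there a way to move the rows that make up foo into bar, ideally while keeping the name bar?

In the end, the DataFrame should look like this: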

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51
Data columns:
bar                   239897  non-null values
Time_UTC              239897  non-null values
dtstamp               239897  non-null values
dtypes: float64(4), object(1)

That is, the NaN values in bar are replaced with the corresponding values from foo.

score 23 · Accepted Answer

You can use fillna directly and assign the result to the column 'bar':

df['bar'] = df['bar'].fillna(df['foo'])
del df['foo']
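To make the fillna approach concrete, here is a minimal sketch on a toy frame. The three-row data and its values are made up for illustration; only the column names foo, bar and Time_UTC come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each row has a value in exactly one of foo/bar.
df = pd.DataFrame({
    "foo": [1.0, np.nan, np.nan],
    "bar": [np.nan, 2.0, 3.0],
    "Time_UTC": [10.0, 20.0, 30.0],
})

# Fill the gaps in bar with the value at the same index label in foo,
# then assign the result back to the column.
df["bar"] = df["bar"].fillna(df["foo"])

# foo is redundant once its values live in bar.
df = df.drop(columns="foo")

print(df)
```

Assigning the result back, rather than calling fillna with inplace=True on the selected column, avoids depending on whether df['bar'] is a view or a copy of the underlying data.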

A general example:

import pandas as pd
# create a table with two missing values in column 'a'
df1 = pd.DataFrame({'a':[1,2],'b':[3,4]}, index = [1,2])
df2 = pd.DataFrame({'b':[5,6]}, index = [3,4])
dftot = pd.concat((df1, df2))
print(dftot)
# create the DataFrame used to fill the missing values;
# fillna aligns on index labels, so reuse dftot's index
filldf = pd.DataFrame({'a':[7,7,7,7]}, index=dftot.index)

# fill the missing values
print(dftot.fillna(filldf))

score 22 · Accepted Answer

Try this:

pandas.concat([df['foo'].dropna(), df['bar'].dropna()]).reindex_like(df)

If you want that data to become the new column bar, just assign the result to df['bar'].
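A small sketch of what the concat approach does, using made-up data with an integer index. It assumes the index labels are unique and that no row has a value in both foo and bar; if a row had both, the concatenated Series would contain duplicate labels and the reindex step would fail.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: foo and bar are non-null on disjoint rows.
df = pd.DataFrame({
    "foo": [1.0, np.nan, np.nan, 4.0],
    "bar": [np.nan, 2.0, 3.0, np.nan],
})

# Stack the non-null parts of both columns; the labels come out as 0, 3, 1, 2.
stacked = pd.concat([df["foo"].dropna(), df["bar"].dropna()])

# Realign to the original row order.
combined = stacked.reindex_like(df)

df["bar"] = combined
print(df["bar"])
```

The reindex step matters because concat simply appends one Series after the other, so the rows only return to their original order once the result is realigned to df's index.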

score 6 · Accepted Answer

More recent versions of pandas (since at least 0.12) have combine_first() and update() methods on DataFrame and Series objects. For example, if your DataFrame is called df, you would do:

df.bar.combine_first(df.foo)

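This only fills the NaN values of the bar column with the matching values from the foo column. Note that combine_first() returns a new Series rather than changing df in place, so assign the result back to df['bar'] if you want to keep it. To overwrite values in bar with the non-NaN values in foo, use the update() method, which modifies its caller in place.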
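The difference between the two methods is easiest to see side by side. This is a sketch on made-up data in which the last row has a value in both columns; the variable names filled and overwritten are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the last row has a value in both columns.
df = pd.DataFrame({
    "foo": [1.0, np.nan, 30.0],
    "bar": [np.nan, 2.0, 3.0],
})

# combine_first: keep bar where it has a value, fall back to foo elsewhere.
# It returns a new Series and leaves df untouched.
filled = df["bar"].combine_first(df["foo"])

# update: overwrite the caller with every non-NaN value from foo.
# It works in place, so operate on an explicit copy here.
overwritten = df["bar"].copy()
overwritten.update(df["foo"])

print(filled.tolist())       # bar wins on the last row
print(overwritten.tolist())  # foo wins on the last row
```

For the question as asked, where bar should keep its own values and only its gaps are filled, combine_first is the closer fit: df['bar'] = df['bar'].combine_first(df['foo']).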

score 5 · Accepted Answer

Another option is to use the .apply() method on the frame. You can reassign a column based on existing data...

import pandas as pd

# get your data into a dataframe

# replace content in "bar" with "foo" if "bar" is null
df["bar"] = df.apply(lambda row: row["foo"] if pd.isnull(row["bar"]) else row["bar"], axis=1)

# note: if your missing values use another marker, such as an empty string, test for that marker instead of pd.isnull
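One thing to watch with a row-wise test like this: NaN never compares equal to anything, including itself, so an equality check cannot detect a missing value, while pd.isnull can. A minimal sketch, using a made-up single row:

```python
import pandas as pd

nan = float("nan")

# NaN is not equal to itself, so an == test never matches a missing value.
equal_to_itself = (nan == nan)

# pd.isnull is the reliable way to test for a missing value.
detected_by_isnull = pd.isnull(nan)

# The same check applied to one hypothetical row, as the lambda would see it.
row = pd.Series({"foo": 1.0, "bar": nan})
picked = row["foo"] if pd.isnull(row["bar"]) else row["bar"]

print(equal_to_itself, detected_by_isnull, picked)
```

Because apply with axis=1 runs a Python function once per row, it is likely to be noticeably slower than the column-wise approaches on a frame of the size shown in the question.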

score 2 · Accepted Answer

You can also do this with numpy.

df['bar'] = np.where(pd.isnull(df['bar']),df['foo'],df['bar'])
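For completeness, a self-contained sketch of the np.where variant on made-up data. np.where picks element by element: where the condition is true it takes the value from foo, otherwise the value from bar.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "foo": [1.0, np.nan, 30.0],
    "bar": [np.nan, 2.0, 3.0],
})

# Take foo where bar is missing, otherwise keep bar.
df["bar"] = np.where(pd.isnull(df["bar"]), df["foo"], df["bar"])

print(df["bar"].tolist())
```

np.where returns a plain NumPy array and works by position rather than by index label. That is fine here because both columns come from the same frame, but it is worth keeping in mind when combining columns from differently indexed objects.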

answered 2016-12-01T03:51:41.613
