This is my DataFrame; it has a MultiIndex on ID and time (t).

               +------+-------+----------+-----------+
               | col1 | col2  | col3     | col4      |
+-------+------+------+-------+----------+-----------+
| ID    | t    |      |       |          |           |
+-------+------+------+-------+----------+-----------+
| id1   | t1   | 10   | nan   |    nan   |    1      |
| id1   | t2   | 10   | 110   |      1   |    nan    |
| id1   | t3   | 12   | nan   |    nan   |    nan    |
| id2   | t1   | 12   | 109   |     15   |    1      |
| id2   | t4   | 12   | 109   |    nan   |    1      |
| id2   | t7   | 20   | nan   |    nan   |    nan    |
+-------+------+------+-------+----------+-----------+

Is it possible to forward-fill col3 and col4 only, respecting the MultiIndex, to get the following?

               +------+-------+----------+-----------+
               | col1 | col2  | col3     | col4      |
+-------+------+------+-------+----------+-----------+
| ID    | t    |      |       |          |           |
+-------+------+------+-------+----------+-----------+
| id1   | t1   | 10   | nan   |    nan   |     1     |
| id1   | t2   | 10   | 110   |    1     |     1     |
| id1   | t3   | 12   | nan   |    1     |     1     |
| id2   | t1   | 12   | 109   |    15    |     1     |
| id2   | t4   | 12   | 109   |    15    |     1     |
| id2   | t7   | 20   | nan   |    15    |     1     |
+-------+------+------+-------+----------+-----------+

So far I have tried:

df[['col3','col4']].ffill()  #how to account for the multiindex?
df[['col3','col4']].fillna(df.groupby(['ID','t'])[['col3', 'col4']].ffill()) #did not work
df.reindex(['ID','t'], method='ffill') #this is probably incomplete, and I got 'expected Tuple, got str'

2 Answers

IIUC, use:

df.update(df.groupby(level=0)[['col3', 'col4']].ffill())
print (df)
        col1   col2  col3  col4
ID  t                          
id1 t1    10    NaN   NaN   1.0
    t2    10  110.0   1.0   1.0
    t3    12    NaN   1.0   1.0
id2 t1    12  109.0  15.0   1.0
    t4    12  109.0  15.0   1.0
    t7    20    NaN  15.0   1.0
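
For anyone who wants to reproduce this, here is a minimal, self-contained sketch. How the sample frame is constructed is an assumption based on the question's table; the `groupby(level=...)` plus `update` step is the one shown above.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table (the construction is assumed).
index = pd.MultiIndex.from_tuples(
    [("id1", "t1"), ("id1", "t2"), ("id1", "t3"),
     ("id2", "t1"), ("id2", "t4"), ("id2", "t7")],
    names=["ID", "t"],
)
df = pd.DataFrame(
    {
        "col1": [10, 10, 12, 12, 12, 20],
        "col2": [np.nan, 110, np.nan, 109, 109, np.nan],
        "col3": [np.nan, 1, np.nan, 15, np.nan, np.nan],
        "col4": [1, np.nan, np.nan, 1, 1, np.nan],
    },
    index=index,
)

cols = ["col3", "col4"]

# Forward-fill within each ID so values never leak from id1 into id2.
filled = df.groupby(level="ID")[cols].ffill()

# update() overwrites df in place with the non-NaN values from `filled`;
# col1 and col2 are left untouched.
df.update(filled)
print(df)
```

Note that the first row of col3 stays NaN, because there is no earlier value within id1 to fill from.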
answered 2021-09-10T05:36:02.730

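I think you were close with your second attempt. I assume you want to group only by the first level of the MultiIndex, ID, because you want the column values forward-filled regardless of the second level, t. Grouping by both ID and t puts every row in its own group, so there is nothing to fill from. We can use a groupby on the first level of the MultiIndex instead, like so: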

df1 = df.groupby(level=0)[['col3', 'col4']].ffill()

To join these results back into the original DataFrame:

df.drop(['col3', 'col4'], axis=1, inplace=True)
df = df.join(df1)
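
As an alternative to dropping and re-joining, the filled columns can probably be assigned back directly, since the groupby result keeps the original MultiIndex. The sketch below rebuilds the question's sample data (the construction is an assumption) and also checks that grouping by both levels, as in the question's second attempt, leaves every NaN in place.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table (the construction is assumed).
index = pd.MultiIndex.from_tuples(
    [("id1", "t1"), ("id1", "t2"), ("id1", "t3"),
     ("id2", "t1"), ("id2", "t4"), ("id2", "t7")],
    names=["ID", "t"],
)
df = pd.DataFrame(
    {
        "col1": [10, 10, 12, 12, 12, 20],
        "col2": [np.nan, 110, np.nan, 109, 109, np.nan],
        "col3": [np.nan, 1, np.nan, 15, np.nan, np.nan],
        "col4": [1, np.nan, np.nan, 1, 1, np.nan],
    },
    index=index,
)
cols = ["col3", "col4"]

# Grouping by both levels: each (ID, t) pair is a one-row group,
# so ffill has no earlier row to copy from and the NaN count is unchanged.
noop = df.groupby(level=["ID", "t"])[cols].ffill()
print(noop.isna().sum().sum() == df[cols].isna().sum().sum())

# Grouping by ID only, then assigning back instead of drop + join.
# The result shares df's index, so the assignment aligns row by row.
df[cols] = df.groupby(level="ID")[cols].ffill()
print(df)
```

Direct assignment also keeps the column order unchanged, whereas drop followed by join always appends the filled columns at the end (which happens to give the same order here).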
answered 2021-09-10T03:54:05.830