
I have a large DataFrame like this:

            count   mean  median    min    max   std
datet                                               
2001-05-16     17    NaN     NaN    NaN    NaN   NaN
2001-05-17     24   8.28    8.27   8.15   8.46  0.09
2001-05-18     24   8.41    8.31   8.18   8.85  0.19
2001-05-19     24  10.44   10.64   9.03  10.98  0.60
2001-05-20     24  10.53   10.56   9.98  10.92  0.28
2001-05-21     24  10.28   10.31   9.90  10.66  0.23
2001-05-22     24  10.40   10.42  10.17  10.67  0.17
2001-05-23     24  10.04   10.03   9.87  10.17  0.08
2001-05-24     24   9.63    9.66   9.41   9.88  0.15
2001-05-25     24   9.21    9.22   9.01   9.41  0.11

How can I split the DataFrame into two smaller parts, rows up to and including the date '2001-05-20' and rows after it, as shown below?

df1:
            count   mean  median    min    max   std
datet                                               
2001-05-16     17    NaN     NaN    NaN    NaN   NaN
2001-05-17     24   8.28    8.27   8.15   8.46  0.09
2001-05-18     24   8.41    8.31   8.18   8.85  0.19
2001-05-19     24  10.44   10.64   9.03  10.98  0.60
2001-05-20     24  10.53   10.56   9.98  10.92  0.28

df2:
            count   mean  median    min    max   std
datet                                               
2001-05-21     24  10.28   10.31   9.90  10.66  0.23
2001-05-22     24  10.40   10.42  10.17  10.67  0.17
2001-05-23     24  10.04   10.03   9.87  10.17  0.08
2001-05-24     24   9.63    9.66   9.41   9.88  0.15
2001-05-25     24   9.21    9.22   9.01   9.41  0.11

2 Answers


For a single before/after split, I think grouping by a boolean criterion is the most straightforward approach.

In [1]: import numpy as np; import pandas as pd
   ...: df = pd.DataFrame(np.random.randn(10),
   ...:                   index=pd.date_range('2001-05-16', '2001-05-25'))

In [2]: grouper = df.groupby(df.index < pd.Timestamp('2001-05-21'))

In [3]: before, after = grouper.get_group(True), grouper.get_group(False)

In [4]: before
Out[4]: 
               0
2001-05-16  2.560516
2001-05-17 -2.207314
2001-05-18  0.646882
2001-05-19  0.660611
2001-05-20  0.437303

and `after` comes out correctly as well. Can anyone improve on my In [3]?
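One possible simplification of In [3], sketched here under the assumption that only the two halves are needed: skip the `groupby` and index with the boolean mask and its negation directly. The frame below uses random values on the same daily index as above; `mask` is a name introduced for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame on the same daily index as above (values are random)
df = pd.DataFrame(np.random.randn(10),
                  index=pd.date_range('2001-05-16', '2001-05-25'))

# Build one boolean mask, then select with it and with its negation (~)
mask = df.index <= pd.Timestamp('2001-05-20')
df1, df2 = df[mask], df[~mask]

print(df1.index[-1].date())  # 2001-05-20
print(df2.index[0].date())   # 2001-05-21
```

Because the mask is computed once, every row lands in exactly one of the two halves, and the index does not need to be sorted.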

answered 2013-03-14T16:38:37.383

In 0.11-dev (`.ix` works equivalently here, though `.ix` was later deprecated and removed in pandas 1.0):

In [16]: df.loc[:'20010520']
Out[16]: 
                   0
2001-05-16  0.105445
2001-05-17  1.660771
2001-05-18  0.485668
2001-05-19 -0.102616
2001-05-20 -0.228228

In [17]: df.loc['20010521':]
Out[17]: 
                   0
2001-05-21 -0.024324
2001-05-22 -1.004362
2001-05-23  2.342225
2001-05-24  1.124695
2001-05-25 -0.291302
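A caveat worth checking with the `.loc` approach: label slices on a DatetimeIndex include both endpoints, which is why the answer starts the second slice at '20010521'. The sketch below, with random data and an illustrative `cut` variable, shows that reusing the same label on both sides duplicates a row, and one way to get a strict "after" half from the same cut label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10),
                  index=pd.date_range('2001-05-16', '2001-05-25'))

# .loc label slices include BOTH endpoints, so reusing one cut label
# on each side puts 2001-05-20 into both halves.
cut = '2001-05-20'
overlap = df.loc[:cut].index.intersection(df.loc[cut:].index)
print(len(overlap))  # 1

# Strict "after" half from the same cut label, without naming the next day
before = df.loc[:cut]
after = df[df.index > pd.Timestamp(cut)]
```

The comparison-based `after` also works when the day following the cut is not present in the index.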

Or, more explicitly, by position (`ix` would also work here):

In [27]: i = df.index.get_loc('20010520')

In [28]: df.iloc[:i+1]
Out[28]: 
                   0
2001-05-16  0.105445
2001-05-17  1.660771
2001-05-18  0.485668
2001-05-19 -0.102616
2001-05-20 -0.228228

In [29]: df.iloc[i+1:]
Out[29]: 
                   0
2001-05-21 -0.024324
2001-05-22 -1.004362
2001-05-23  2.342225
2001-05-24  1.124695
2001-05-25 -0.291302
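The positional variant assumes the cut date is actually present in the index; if it is missing, `get_loc` raises a `KeyError`. A sketch of a variant (not from the answer) uses `searchsorted`, which returns an insertion position instead. It assumes a sorted index; the gappy index below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sorted index with a gap: 2001-05-20 itself is missing
idx = pd.DatetimeIndex(['2001-05-16', '2001-05-17', '2001-05-19',
                        '2001-05-21', '2001-05-22'])
df = pd.DataFrame(np.arange(5), index=idx)

# searchsorted gives the position where the cut date would be inserted;
# side='right' keeps an exact match (if present) in the first half.
i = df.index.searchsorted(pd.Timestamp('2001-05-20'), side='right')
df1, df2 = df.iloc[:i], df.iloc[i:]
print(i)  # 3
```

When the cut date does exist, `side='right'` returns the same value as `get_loc(...) + 1`, so the two `iloc` slices match In [28] and In [29].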
answered 2013-03-14T17:01:10.393