With the following DataFrame...

                     line_date line_track  line_race  c1pos
 horse_name                                                
 Grand Cicero       2013-03-10         GP          9      9
 Clever Story       2013-09-13        BEL          7      7
 Distorted Dream    2013-10-04        BEL          4      2
 Distorted Dream    2013-09-13        BEL          7      5
 Distorted Dream    2013-04-27        BEL          6      2
 Distorted Dream    2012-10-24        BEL          4      2
 Distorted Dream    2012-09-12        BEL          2      3
 Distorted Dream    2012-06-30        BEL          8      4
 Distorted Dream    2012-06-09        BEL          2      4
 Mr. O'Leary        2013-10-13        BEL          5      5
 Mr. O'Leary        2013-08-29        SAR          7      6
 Mr. O'Leary        2013-05-27        BEL          6      5
 In the Dark        2013-10-13        BEL          5      7
 In the Dark        2013-09-22        BEL          5      7
 In the Dark        2013-08-03        SAR          2      7
 In the Dark        2012-11-24        AQU          3      7
 In the Dark        2012-10-18        BEL          6      6
 Bred to Boss       2013-10-26        PRX          3      5
 Bred to Boss       2013-10-06        PRX          6      3
 Bred to Boss       2012-08-18        SAR          4      1

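...with the index set to horse_name. I need to "trim" each of these down to a certain number of records. For example, "Distorted Dream" has seven records. I need every horse with more than three records cut down to three, so that the result is a DataFrame like the one below. Is there a quick and easy way to do this?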

                     line_date line_track  line_race  c1pos
 horse_name                                                
 Grand Cicero       2013-03-10         GP          9      9
 Clever Story       2013-09-13        BEL          7      7
 Distorted Dream    2013-10-04        BEL          4      2
 Distorted Dream    2013-09-13        BEL          7      5
 Distorted Dream    2013-04-27        BEL          6      2
 Mr. O'Leary        2013-10-13        BEL          5      5
 Mr. O'Leary        2013-08-29        SAR          7      6
 Mr. O'Leary        2013-05-27        BEL          6      5
 In the Dark        2013-10-13        BEL          5      7
 In the Dark        2013-09-22        BEL          5      7
 In the Dark        2013-08-03        SAR          2      7
 Bred to Boss       2013-10-26        PRX          3      5
 Bred to Boss       2013-10-06        PRX          6      3
 Bred to Boss       2012-08-18        SAR          4      1

1 Answer

As usual, groupby to the rescue! It's worth reading through the groupby docs, as there are lots of useful tricks to pick up.

>>> df.groupby(level=0, sort=False, as_index=False).head(3)
                  line_date line_track  line_race  c1pos
horse_name                                              
Grand Cicero     2013-03-10         GP          9      9
Clever Story     2013-09-13        BEL          7      7
Distorted Dream  2013-10-04        BEL          4      2
Distorted Dream  2013-09-13        BEL          7      5
Distorted Dream  2013-04-27        BEL          6      2
Mr. O'Leary      2013-10-13        BEL          5      5
Mr. O'Leary      2013-08-29        SAR          7      6
Mr. O'Leary      2013-05-27        BEL          6      5
In the Dark      2013-10-13        BEL          5      7
In the Dark      2013-09-22        BEL          5      7
In the Dark      2013-08-03        SAR          2      7
Bred to Boss     2013-10-26        PRX          3      5
Bred to Boss     2013-10-06        PRX          6      3
Bred to Boss     2012-08-18        SAR          4      1

Or, if you want the last 3 instead:

>>> df.groupby(level=0, sort=False, as_index=False).tail(3)

(The sort=False is only there to preserve the original order of the horses; if you don't care about that, you can drop it.)

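You could also sort on the line_date column (converting it to datetime first would be safer, but YYYY-MM-DD strings will sort correctly as-is) and use the same head/tail approach to pick the first or last three chronologically.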
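
A minimal sketch of that sort-then-head idea, using a handful of rows taken from the question but deliberately shuffled so the sort has something to do. The variable name latest is just illustrative, and this assumes a reasonably recent pandas where sort_values is available:

```python
import pandas as pd

# A few rows from the question, out of date order, indexed by horse_name.
df = pd.DataFrame(
    {
        "horse_name": ["Distorted Dream"] * 4 + ["Bred to Boss"] * 2,
        "line_date": ["2012-06-09", "2013-10-04", "2012-09-12", "2013-04-27",
                      "2013-10-06", "2012-08-18"],
        "line_race": [2, 4, 2, 6, 6, 4],
    }
).set_index("horse_name")

# Parse the strings so the sort is chronological rather than lexical.
df["line_date"] = pd.to_datetime(df["line_date"])

# Sort newest first, then keep at most three rows per horse.
latest = (
    df.sort_values("line_date", ascending=False)
      .groupby(level=0, sort=False)
      .head(3)
)
print(latest)
```

Because the frame is sorted by date before grouping, the rows of different horses can come out interleaved; sorting the result by the index afterwards would regroup them if that matters.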

answered 2013-11-10T00:07:10.330