Given the following data:

               Sum  amount_net  amount_gross    symbol  Date_Time
ts                  
7/29/2013 2:17  -68 755,101 -755,101        A   7/29/2013 2:17
7/29/2013 2:17  -21 251,945 -251,945        B   7/29/2013 2:17
7/29/2013 2:16  -1  2,200   -2,200          C   7/29/2013 2:16
7/29/2013 2:17  -5  11,000  -11,000         C   7/29/2013 2:17
7/29/2013 2:08  -1  5,384   -5,384          D   7/29/2013 2:08
7/29/2013 2:09  -3  16,151  -16,151         D   7/29/2013 2:09
7/29/2013 2:13  1   5,384   5,384           D   7/29/2013 2:13
7/29/2013 2:02  20  70,000  70,000          F   7/29/2013 2:02
7/29/2013 2:03  22  77,000  77,000          F   7/29/2013 2:03
7/29/2013 2:04  18  63,000  63,000          F   7/29/2013 2:04
7/29/2013 2:05  15  52,500  52,500          F   7/29/2013 2:05
7/29/2013 2:08  15  52,500  52,500          F   7/29/2013 2:08
7/29/2013 2:09  8   28,000  28,000          F   7/29/2013 2:09
7/29/2013 2:10  22  77,000  77,000          F   7/29/2013 2:10
7/29/2013 2:11  22  77,000  77,000          F   7/29/2013 2:11
7/29/2013 2:12  12  42,000  42,000          F   7/29/2013 2:12
7/29/2013 2:13  5   17,500  17,500          F   7/29/2013 2:13
7/29/2013 2:14  30  105,000 105,000         F   7/29/2013 2:14
7/29/2013 2:15  35  122,500 122,500         F   7/29/2013 2:15
7/29/2013 2:16  35  122,500 122,500         F   7/29/2013 2:16

For each symbol, I want to return the Sum, amount_net and amount_gross at that symbol's maximum time. That is, I want to get:

symbol  Time           Sum  amount_net  amount_gross
A   7/29/2013 2:17  -68 755,101        -755,101
B   7/29/2013 2:17  -21 251,945        -251,945
C   7/29/2013 2:17  -5  11,000          -11,000
D   7/29/2013 2:13  1   5,384             5,384
F   7/29/2013 2:16  35  122,500         122,500

2 Answers

Sort by time, group by symbol, then take the last (i.e. "max time") element from each group.

In [28]: df.sort_values('Date_Time').groupby('symbol').last()
Out[28]: 
                 Date_Time  Sum  amount_net  amount_gross
symbol                                                   
A      2013-07-29 02:17:00  -68      755101       -755101
B      2013-07-29 02:17:00  -21      251945       -251945
C      2013-07-29 02:17:00   -5       11000        -11000
D      2013-07-29 02:13:00    1        5384          5384
F      2013-07-29 02:16:00   35      122500        122500
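
For anyone who wants to try the sort / group / last approach above without the original file, here is a minimal sketch. It assumes a recent pandas, rebuilds only the C and D rows from the question by hand, and assumes the amounts are already integers. The names `latest` and `latest_by_idxmax` are illustrative, not from the original post.

```python
import pandas as pd

# Hand-built subset of the question's rows (symbols C and D only),
# with the amounts already stored as integers.
df = pd.DataFrame(
    {
        "symbol": ["C", "C", "D", "D", "D"],
        "Date_Time": pd.to_datetime(
            [
                "2013-07-29 02:16",
                "2013-07-29 02:17",
                "2013-07-29 02:08",
                "2013-07-29 02:09",
                "2013-07-29 02:13",
            ]
        ),
        "Sum": [-1, -5, -1, -3, 1],
        "amount_net": [2200, 11000, 5384, 16151, 5384],
        "amount_gross": [-2200, -11000, -5384, -16151, 5384],
    }
)

# Sort by time, then keep the last row of each symbol group.
latest = df.sort_values("Date_Time").groupby("symbol").last()

# Alternative: find the row label of each symbol's maximum time
# and select those whole rows.
latest_by_idxmax = df.loc[df.groupby("symbol")["Date_Time"].idxmax()].set_index("symbol")

print(latest)
```

One design note: `last()` takes the last non-null value of each column separately, so if the data can contain missing values the `idxmax` variant may be the safer choice, since it always returns whole rows.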

See @Andy's comment about parsing the numbers as integers.

answered 2013-09-13T14:12:48.593

Just group by symbol and sum:

In [11]: df1.groupby('symbol').sum()
Out[11]:
        Sum  amount_net  amount_gross
symbol
A       -68      755101       -755101
B       -21      251945       -251945
C        -6       13200        -13200
D        -3       26919        -16151
F       259      906500        906500

Note: at the moment it looks like amount_net and amount_gross are not being parsed as integers but are strings. You can convert them with:

df1[['amount_net', 'amount_gross']] = df1[['amount_net', 'amount_gross']].applymap(lambda x: int(x.replace(',', '')))
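
As a hedged sketch of the same conversion on a recent pandas: if the data comes from a delimited text file, the commas can be handled at load time with the `thousands` option of `read_csv`; otherwise the string columns can be cleaned column by column. The semicolon-separated text below is a made-up stand-in for the real input, and `as_text` / `converted` are illustrative names.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated input standing in for the real file.
raw = io.StringIO(
    "symbol;Sum;amount_net;amount_gross\n"
    "C;-1;2,200;-2,200\n"
    "C;-5;11,000;-11,000\n"
)

# Option 1: let the parser strip thousands separators while reading.
df1 = pd.read_csv(raw, sep=";", thousands=",")

# Option 2: the frame is already loaded and the amounts are strings.
cols = ["amount_net", "amount_gross"]
as_text = pd.DataFrame(
    {"amount_net": ["2,200", "11,000"], "amount_gross": ["-2,200", "-11,000"]}
)
# Remove the commas as literal text, then cast each column to int.
converted = as_text[cols].apply(
    lambda s: s.str.replace(",", "", regex=False).astype(int)
)
```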
answered 2013-09-13T13:16:51.600