When I sort my DataFrame by several columns (['Symbol','Year','Month','Day']), the resulting DataFrame is sorted by Symbol > Year > Month, but not by Day, as shown below:

In [1]: df = pd.DataFrame({'Symbol': {79: 'F', 81: 'F', 82: 'F', 83: 'F', 84: 'F', 85: 'F', 86: 'F', 87: 'F', 89: 'F'}, 'Shares': {79: 100, 81: 100, 82: 100, 83: 100, 84: 100, 85: 100, 86: 100, 87: 100, 89: 100}, 'Month': {79: '08', 81: '08', 82: '08', 83: '08', 84: '08', 85: '08', 86: '08', 87: '08', 89: '09'}, 'Year': {79: '2008', 81: '2008', 82: '2008', 83: '2008', 84: '2008', 85: '2008', 86: '2008', 87: '2008', 89: '2008'}, 'Action': {79: 'Sell', 81: 'Sell', 82: 'Buy', 83: 'Sell', 84: 'Buy', 85: 'Sell', 86: 'Buy', 87: 'Sell', 89: 'Sell'}, 'Day': {79: 2L, 81: 4L, 82: '06', 83: 11L, 84: '13', 85: 18L, 86: '18', 87: 23L, 89: 22L}})

In [2]: df
Out[2]:
   Action Day Month  Shares Symbol  Year
79   Sell   2    08     100      F  2008
81   Sell   4    08     100      F  2008
82    Buy  06    08     100      F  2008
83   Sell  11    08     100      F  2008
84    Buy  13    08     100      F  2008
85   Sell  18    08     100      F  2008
86    Buy  18    08     100      F  2008
87   Sell  23    08     100      F  2008
89   Sell  22    09     100      F  2008

In [3]: df.sort(['Symbol','Year','Month','Day'])
Out[3]:
   Action Day Month  Shares Symbol  Year
79   Sell   2    08     100      F  2008
81   Sell   4    08     100      F  2008
83   Sell  11    08     100      F  2008
85   Sell  18    08     100      F  2008
87   Sell  23    08     100      F  2008
82    Buy  06    08     100      F  2008
84    Buy  13    08     100      F  2008
86    Buy  18    08     100      F  2008
89   Sell  22    09     100      F  2008

Why doesn't sort work as expected?

1 Answer

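It doesn't work as you expect because Day is stored with mixed types (strings and long integers), and in Python 2 a string always compares as "greater than" a number. All the integer days therefore sort ahead of all the string days, which looks like unexpected behaviour.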
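One way to confirm this diagnosis is to look at the Python type of each value in the column. The snippet below is a minimal sketch on a hypothetical five-row miniature of the question's frame (the values are illustrative, and plain ints stand in for the Python 2 longs):

```python
import pandas as pd

# Hypothetical miniature of the question's frame: Day mixes ints and strings.
df = pd.DataFrame({"Day": [2, 4, "06", 11, "13"]})

# A column holding more than one Python type is stored with dtype object.
print(df["Day"].dtype)

# Collect the distinct Python types found in the column.
kinds = set(type(value) for value in df["Day"])
print(kinds)
```

If more than one type shows up, the column needs cleaning before it can be sorted meaningfully. Note that on Python 3 comparing a str with an int raises a TypeError instead of silently ordering them.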

You can convert this column to integers by apply-ing int:

df['Day'] = df['Day'].apply(int)

I would also consider doing this for Month and Year, since in your DataFrame these are strings (they probably make more sense as ints):

df['Mo.'] = df['Mo.'].apply(int)
df['Year'] = df['Year'].apply(int)
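The same conversion can be written with astype, which is closely equivalent to apply(int) here. This is a sketch on a small hypothetical frame that uses the question's column names (Year, Month, Day) rather than the abbreviated ones above; the three rows are made up for illustration:

```python
import pandas as pd

# Hypothetical three-row frame using the question's column names.
df = pd.DataFrame({
    "Year": ["2008", "2008", "2008"],
    "Month": ["08", "08", "09"],
    "Day": [2, "06", 22],  # mixed ints and zero-padded strings
})

# Convert each date-part column to integers, one column at a time.
for col in ["Year", "Month", "Day"]:
    df[col] = df[col].astype(int)

print(df.dtypes)
```

After the conversion every value in Day is an integer, so "06" becomes 6 and compares numerically with the other days.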

Then you can sort by Day:

In [11]: df.sort(['Day'])
Out[11]:
   Indx  Year  Mo.  Day Sym Action  Shares
0    79  2008    8    2   F   Sell     100
1    81  2008    8    4   F   Sell     100
5    82  2008    8    6   F    Buy     100
2    83  2008    8   11   F   Sell     100
6    84  2008    8   13   F    Buy     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
8    89  2008    9   22   F   Sell     100
4    87  2008    8   23   F   Sell     100

Or sort by multiple columns:

In [12]: df.sort(['Mo.', 'Day'])
Out[12]:
   Indx  Year  Mo.  Day Sym Action  Shares
0    79  2008    8    2   F   Sell     100
1    81  2008    8    4   F   Sell     100
5    82  2008    8    6   F    Buy     100
2    83  2008    8   11   F   Sell     100
6    84  2008    8   13   F    Buy     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
4    87  2008    8   23   F   Sell     100
8    89  2008    9   22   F   Sell     100

In [13]: df.sort(['Day', 'Mo.'])
Out[13]:
   Indx  Year  Mo.  Day Sym Action  Shares
0    79  2008    8    2   F   Sell     100
1    81  2008    8    4   F   Sell     100
5    82  2008    8    6   F    Buy     100
2    83  2008    8   11   F   Sell     100
6    84  2008    8   13   F    Buy     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
8    89  2008    9   22   F   Sell     100
4    87  2008    8   23   F   Sell     100

And with the ascending argument:

In [14]: df.sort(['Mo.', 'Day'], ascending=[True, False])
Out[14]:
   Indx  Year  Mo.  Day Sym Action  Shares
4    87  2008    8   23   F   Sell     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
6    84  2008    8   13   F    Buy     100
2    83  2008    8   11   F   Sell     100
5    82  2008    8    6   F    Buy     100
1    81  2008    8    4   F   Sell     100
0    79  2008    8    2   F   Sell     100
8    89  2008    9   22   F   Sell     100

...will work as expected.
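For readers on a recent pandas release: DataFrame.sort was later deprecated and removed, and DataFrame.sort_values takes the same column list and ascending argument. A minimal sketch, assuming the columns have already been converted to integers and using a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical four-row frame with integer date parts (already converted).
df = pd.DataFrame({
    "Symbol": ["F", "F", "F", "F"],
    "Year": [2008, 2008, 2008, 2008],
    "Month": [8, 8, 9, 8],
    "Day": [18, 2, 22, 6],
})

# Sort by all four keys, as in the question.
ordered = df.sort_values(["Symbol", "Year", "Month", "Day"])

# Month ascending, Day descending within each month.
mixed = df.sort_values(["Month", "Day"], ascending=[True, False])

print(ordered)
print(mixed)
```

sort_values returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep the sorted order.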

Answered 2013-04-12T16:50:27.303