python - Python Pandas groupby forloop & Idxmax

Question

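I have a DataFrame that needs to be grouped by three levels, and then the highest value returned. Each unique combination has a return for every day, and I want to find the highest return along with its details.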

data.groupby(['Company','Product','Industry'])['ROI'].idxmax()

The result should show that these were the highest:

Target   - Dish Soap - House       had a 5% ROI on 9/17
Best Buy - CDs       - Electronics had a 3% ROI on 9/3

Here is some sample data:

+----------+-----------+-------------+---------+-----+
| Company  | Product   | Industry    | Date    | ROI |
+----------+-----------+-------------+---------+-----+
| Target   | Dish Soap | House       | 9/17/13 | 5%  |
| Target   | Dish Soap | House       | 9/16/13 | 2%  |
| BestBuy  | CDs       | Electronics | 9/1/13  | 1%  |
| BestBuy  | CDs       | Electronics | 9/3/13  | 3%  |
| ...

I'm not sure whether this calls for a for loop or for .ix.
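A minimal sketch of the for-loop approach, assuming the sample table is loaded into a DataFrame called data and that the ROI strings such as "5%" are converted to floats first (idxmax needs a numeric column). The helper name best_rows_loop is hypothetical. Each group contributes the single row holding its largest ROI:

```python
import pandas as pd

# Sample rows from the table above; ROI strings are turned into floats (5% -> 0.05)
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})
data["ROI"] = data["ROI"].str.rstrip("%").astype(float) / 100


def best_rows_loop(df, keys, col):
    """Hypothetical helper: for each group in `keys`, keep the row with the largest `col`."""
    pieces = []
    for _, group in df.groupby(keys):
        # idxmax gives the index label of this group's maximum; the double
        # brackets keep the selection as a one-row DataFrame
        pieces.append(group.loc[[group[col].idxmax()]])
    # Stitch the one-row frames back together (groups come out in sorted key order)
    return pd.concat(pieces)


result = best_rows_loop(data, ["Company", "Product", "Industry"], "ROI")
print(result)
```

The loop works, but it calls idxmax once per group in Python; a single vectorized groupby call does the same job with less code.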

score 6 · Accepted Answer

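If I understand you correctly, you can use groupby with idxmax() to collect the index labels of the top rows in a Series, and then select those rows from the DataFrame with loc: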

idx = data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
data.loc[idx]
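A self-contained version of these two lines, as a sketch assuming the sample rows from the question (with "Electronics" spelled consistently) and ROI converted from strings like "5%" to floats, since idxmax needs numeric values:

```python
import pandas as pd

data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})
# "5%" -> 0.05 so that idxmax compares numbers rather than strings
data["ROI"] = data["ROI"].str.rstrip("%").astype(float) / 100

# One index label per (Company, Product, Industry) group: the row with the top ROI
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

# loc uses the values of idx as row labels and keeps every column, Date included
best = data.loc[idx]
print(best)
```

idx itself is indexed by the three group keys, while its values are the original row labels; that is why passing it to loc recovers the full rows.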

Another option is to use reindex:

data.reindex(idx)
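A short sketch (same assumed sample data) comparing the two selections; with a unique index and labels that all exist, they contain the same rows in the same order:

```python
import pandas as pd

data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})
data["ROI"] = data["ROI"].str.rstrip("%").astype(float) / 100

idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

via_loc = data.loc[idx]          # label-based selection
via_reindex = data.reindex(idx)  # conform the row index to the labels in idx

print(via_loc)
print(via_reindex)
```

One behavioural difference: if a label is not in the index, reindex inserts a row of NaN for it, whereas loc with a list of labels raises a KeyError in current pandas. With labels produced by idxmax on the same DataFrame, every label exists, so the two agree.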

On a (different) DataFrame I happened to have, reindex seemed to be the slightly faster option:

In [39]: %timeit df.reindex(idx)
10000 loops, best of 3: 121 us per loop

In [40]: %timeit df.loc[idx]
10000 loops, best of 3: 147 us per loop
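To repeat a comparison like this outside IPython, here is a rough sketch using the standard-library timeit module on the small sample data assumed above. Absolute numbers, and even which method comes out ahead, will depend on the data size, the machine and the pandas version:

```python
import timeit

import pandas as pd

data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})
data["ROI"] = data["ROI"].str.rstrip("%").astype(float) / 100
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

n = 200  # calls per measurement; raise it for steadier numbers
t_reindex = timeit.timeit(lambda: data.reindex(idx), number=n)
t_loc = timeit.timeit(lambda: data.loc[idx], number=n)

# timeit returns total seconds; convert to microseconds per call, the unit %timeit shows
print(f"reindex: {t_reindex / n * 1e6:.1f} us per call")
print(f"loc:     {t_loc / n * 1e6:.1f} us per call")
```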
