python - What is the Python/pandas equivalent of this SQL for finding the max value and the column name?

Question

What is the Python/pandas equivalent of the MAX(variable) statement:

SELECT ID, Name FROM Table5 WHERE 
Friend_count = (SELECT MAX(friend_count) FROM Table5);

(I'm trying to learn how to do things in Python that I would normally do in SQL. I think I can do this in pandas, but I haven't found a way.)

score 2 · Accepted Answer

How about using the idxmax() method on your DataFrame?

import numpy as np
import pandas as pd
from ggplot import meat

Here I'm using the meat dataset from ggplot.

In [18]: meat
Out[18]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 827 entries, 0 to 826
Data columns (total 8 columns):
date               827  non-null values
beef               827  non-null values
veal               827  non-null values
pork               827  non-null values
lamb_and_mutton    827  non-null values
broilers           635  non-null values
other_chicken      143  non-null values
turkey             635  non-null values
dtypes: datetime64[ns](1), float64(7)

Suppose you want to find the row or rows with the highest beef production.

In [36]: meat.beef.max()
Out[36]: 2512.0

In SQL, you might do this:

SELECT 
    * 
FROM 
    meat 
WHERE
    beef = (SELECT max(beef) FROM meat) ;

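With pandas, you can do this using idxmax, as follows (note that idxmax returns only the first matching label when several rows tie for the maximum):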

In [35]: meat.loc[meat.beef.idxmax()]  # .loc replaces .ix, which was removed in pandas 1.0
Out[35]:
date               2002-10-01 00:00:00
beef                              2512
veal                              18.7
pork                              1831
lamb_and_mutton                   19.7
broilers                        2953.3
other_chicken                     50.7
turkey                           525.9
Name: 705, dtype: object
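
One difference from the SQL version is worth illustrating: the SQL query returns every row that ties for the maximum, while idxmax returns a single label. The following is only a minimal sketch on a made-up frame (the name `toy` and its values are assumptions for illustration, not part of the meat dataset):

```python
import pandas as pd

# Hypothetical toy data: two rows tie for the highest beef value.
toy = pd.DataFrame({"beef": [10.0, 25.0, 25.0], "pork": [1.0, 2.0, 3.0]})

first_label = toy["beef"].idxmax()   # label of the first maximum only
first_row = toy.loc[first_label]     # that single row, returned as a Series

# Boolean mask: keeps every row whose beef value equals the maximum.
all_rows = toy[toy["beef"] == toy["beef"].max()]

print(first_label)
print(all_rows)
```

If ties cannot occur in your data, the idxmax form is enough; otherwise the mask form is closer to what the SQL query does.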

idxmax is great, and it should also work if your data is indexed by date or time.

In [42]: ts = meat.set_index(['date'])

In [43]: ts.beef.max()
Out[43]: 2512.0

In [44]: ts.beef.idxmax()
Out[44]: Timestamp('2002-10-01 00:00:00', tz=None)

In [45]: ts.loc[ts.beef.idxmax()]  # .loc replaces .ix, which was removed in pandas 1.0
Out[45]:
beef               2512.0
veal                 18.7
pork               1831.0
lamb_and_mutton      19.7
broilers           2953.3
other_chicken        50.7
turkey              525.9
Name: 2002-10-01 00:00:00, dtype: float64

score 1 · Accepted Answer

Suppose you have a Person class with a friend_count attribute. Here is an example that finds the person with the most friends:

import operator

class Person(object):
    def __init__(self, friend_count):
        self.friend_count = friend_count

people = [Person(x) for x in [0, 1, 5, 10, 3]]
popular_person = max(people, key=operator.attrgetter('friend_count'))
print(popular_person.friend_count)  # prints 10

score 1 · Accepted Answer

A pandas Series (that is, a column) has a max method:

In [1]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [2]: df
Out[2]: 
   A  B
0  1  2
1  3  4

Select a column:

In [3]: s = df.A  # same as df['A']

And take the max:

In [4]: s.max()
Out[4]: 3

You can also take the max over a DataFrame:

In [5]: df.max() # max of each column
Out[5]: 
A    3
B    4
dtype: int64

In [6]: df.max(axis=1) # max of each row
Out[6]: 
0    2
1    4
dtype: int64

To return all rows that have the maximum value, you should use a mask:

In [7]: df.A == df.A.max()
Out[7]: 
0    False
1     True
Name: A, dtype: bool

In [8]: df[df.A == df.A.max()]
Out[8]: 
   A  B
1  3  4
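
Applied to the query in the question, the same mask idea might look like the sketch below. The frame `table5` and its rows are invented stand-ins for Table5, so treat the data as an assumption; only the column names ID, Name and Friend_count come from the question.

```python
import pandas as pd

# Hypothetical stand-in for Table5; the rows are made up for illustration.
table5 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Friend_count": [12, 40, 7, 40],
})

# WHERE Friend_count = (SELECT MAX(Friend_count) FROM Table5)
is_max = table5["Friend_count"] == table5["Friend_count"].max()

# SELECT ID, Name
result = table5.loc[is_max, ["ID", "Name"]]
print(result)
```

Passing the mask and the column list to .loc together covers the row filter and the column projection in one step, and rows that tie for the maximum are all kept, as in SQL.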

score 0 · Accepted Answer

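To get the maximum value from a list in Python, just use the max function. The same goes for min. See the Python documentation for both. If you want to base it on an attribute of the objects, you can use a generator expression, e.g. max(person.age for person in people).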

If you want the people with the greatest age, you can use a list comprehension, e.g.

oldest_age = max(person.age for person in people)
people_with_max_age = [person for person in people if person.age == oldest_age]

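Unlike in SQL, you rarely want to collect just n attributes of your objects; it is more useful to keep them attached to the object and collect the objects you want. If you want to achieve this, see @FogleBird's answer.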
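
For completeness, here is a self-contained version of the two-step approach that can be run as-is. The Person class, the names and the ages are assumptions made up for this sketch:

```python
class Person:
    """Hypothetical record type with a name and an age."""

    def __init__(self, name, age):
        self.name = name
        self.age = age


people = [Person("Ann", 31), Person("Bob", 45), Person("Cid", 45)]

# Step 1: find the maximum age with a generator expression.
oldest_age = max(person.age for person in people)

# Step 2: keep every person whose age equals that maximum (ties included).
people_with_max_age = [person for person in people if person.age == oldest_age]

names = [person.name for person in people_with_max_age]
print(names)
```

Because the list holds the Person objects themselves, any other attribute can be read from them afterwards without a second lookup.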
