python - What is the difference between the bins when using groupby apply vs. resample apply?

Question

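This is a somewhat broad topic, but I will try to pare it down to a few specific questions.

I have noticed a difference between resample and groupby that I would like to understand. Here is some hourly time series data: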

In[]:
import pandas as pd

dr = pd.date_range('01-01-2020 8:00', periods=10, freq='H')
df = pd.DataFrame({'A':range(10),
                   'B':range(10,20),
                   'C':range(20,30)}, index=dr)
df

Out[]:
                     A   B   C
2020-01-01 08:00:00  0  10  20
2020-01-01 09:00:00  1  11  21
2020-01-01 10:00:00  2  12  22
2020-01-01 11:00:00  3  13  23
2020-01-01 12:00:00  4  14  24
2020-01-01 13:00:00  5  15  25
2020-01-01 14:00:00  6  16  26
2020-01-01 15:00:00  7  17  27
2020-01-01 16:00:00  8  18  28
2020-01-01 17:00:00  9  19  29

I can downsample the data using either groupby (with freq in a pandas.Grouper) or resample (which seems to be the more typical approach):

g = df.groupby(pd.Grouper(freq='2H'))
r = df.resample(rule='2H')
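
For a plain aggregation, the two objects can be compared directly. The following is a minimal sketch, not part of the original question: it rebuilds the same frame, uses the lowercase 'h' frequency alias (recent pandas releases deprecate the uppercase 'H'), and the names by_grouper and by_resample are chosen only for illustration.

```python
import pandas as pd

# Same hourly data as in the question, rebuilt so the snippet is self-contained
dr = pd.date_range('2020-01-01 08:00', periods=10, freq='h')
df = pd.DataFrame({'A': range(10),
                   'B': range(10, 20),
                   'C': range(20, 30)}, index=dr)

# Downsample to 2-hour bins both ways and aggregate with sum()
by_grouper = df.groupby(pd.Grouper(freq='2h')).sum()
by_resample = df.resample('2h').sum()

# Compare the bin labels and the aggregated values
same_labels = list(by_grouper.index) == list(by_resample.index)
same_values = (by_grouper.values == by_resample.values).all()
print(same_labels, same_values)
```

With a built-in reduction like sum(), both routes are expected to produce the same five bins; the differences described below only show up with apply.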

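My impression was that these two are essentially the same thing (correct me if I am wrong, but isn't resample a rebranded groupby?). However, I have found that when using the apply method of each grouping object, you can index specific columns in the "DataFrameGroupBy" object g but not in the "Resampler" object r: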

def foo(d):
    return(d['A'] - d['B'] + 2*d['C'])

In[]:
g.apply(foo)

Out[]:
2020-01-01 08:00:00  2020-01-01 08:00:00    30
                     2020-01-01 09:00:00    32
2020-01-01 10:00:00  2020-01-01 10:00:00    34
                     2020-01-01 11:00:00    36
2020-01-01 12:00:00  2020-01-01 12:00:00    38
                     2020-01-01 13:00:00    40
2020-01-01 14:00:00  2020-01-01 14:00:00    42
                     2020-01-01 15:00:00    44
2020-01-01 16:00:00  2020-01-01 16:00:00    46
                     2020-01-01 17:00:00    48
dtype: int64

In[]:
r.apply(foo)

Out[]:
#long multi-Exception error stack ending in:
KeyError: 'A'

It looks like the data d that apply "sees" is different in each case, as shown here:

def bar(d):
    print(d)

In[]:
g.apply(bar)

Out[]:
                     A   B   C
2020-01-01 08:00:00  0  10  20
2020-01-01 09:00:00  1  11  21
... #more DataFrames corresponding to each bin

In[]:
r.apply(bar)

Out[]:
2020-01-01 08:00:00    0
2020-01-01 09:00:00    1
Name: A, dtype: int64
2020-01-01 10:00:00    2
2020-01-01 11:00:00    3
Name: A, dtype: int64
... #more Series, first the bins for column "A", then "B", then "C"

However, if you simply iterate over the Resampler object, you get the bins as DataFrames, which looks similar to groupby:

In[]:
for i, d in r:
    print(d)

Out[]:
                     A   B   C
2020-01-01 08:00:00  0  10  20
2020-01-01 09:00:00  1  11  21
                     A   B   C
2020-01-01 10:00:00  2  12  22
2020-01-01 11:00:00  3  13  23
                     A   B   C
2020-01-01 12:00:00  4  14  24
2020-01-01 13:00:00  5  15  25
                     A   B   C
2020-01-01 14:00:00  6  16  26
2020-01-01 15:00:00  7  17  27
                     A   B   C
2020-01-01 16:00:00  8  18  28
2020-01-01 17:00:00  9  19  29

The printout is the same when iterating over the DataFrameGroupBy object.
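
Since iteration hands back whole DataFrames, one way to run a cross-column function per bin without relying on Resampler.apply is to loop over the bins and concatenate the pieces. This is only a sketch under that assumption; the names per_bin and result are hypothetical, and the lowercase 'h' alias is used because newer pandas versions deprecate 'H'.

```python
import pandas as pd

# Same hourly data as in the question
dr = pd.date_range('2020-01-01 08:00', periods=10, freq='h')
df = pd.DataFrame({'A': range(10),
                   'B': range(10, 20),
                   'C': range(20, 30)}, index=dr)

def foo(d):
    # d is one 2-hour bin as a DataFrame, so all three columns are available
    return d['A'] - d['B'] + 2 * d['C']

# Iterating the Resampler yields (bin_label, DataFrame) pairs
per_bin = {label: foo(d) for label, d in df.resample('2h')}

# Stitch the per-bin Series together; the dict keys become the outer index level
result = pd.concat(per_bin)
print(result)
```

The result carries a two-level index (bin label, original timestamp), which mirrors the shape of the g.apply(foo) output shown earlier.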

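My questions, based on the above:

Can you access specific columns using resample and apply? I thought I had code that did this, but now I think I was mistaken.
Why does resample apply work on a Series for each column of each bin, rather than on a DataFrame for each bin?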
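
For reference, per-column access on a Resampler does work when each column is reduced on its own, either by selecting a column first or by passing a column-to-function mapping to agg. The sketch below illustrates only that narrower case; it does not give a function the whole bin as a DataFrame, and the names a_sum and mixed are illustrative.

```python
import pandas as pd

# Same hourly data as in the question (lowercase 'h' avoids the deprecated 'H' alias)
dr = pd.date_range('2020-01-01 08:00', periods=10, freq='h')
df = pd.DataFrame({'A': range(10),
                   'B': range(10, 20),
                   'C': range(20, 30)}, index=dr)

r = df.resample('2h')

# Select one column from the Resampler, then aggregate it per bin
a_sum = r['A'].sum()

# Or aggregate several columns, each with its own reduction
mixed = r.agg({'A': 'sum', 'B': 'mean'})

print(a_sum)
print(mixed)
```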

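Any general comments about what is going on here, or whether this pattern should be encouraged or discouraged, would also be much appreciated. Thanks!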
