python - Counting daily events on a Pandas time series

Question

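Hi, I have a time series and want to count how many events I have per day (i.e. the number of rows in the table that fall on each day). The command I would like to use is: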

ts.resample('D', how='count')

But I guess "count" is not a valid aggregation function for a time series.

Just to clarify, here is a sample of the data:

0   2008-02-22 03:43:00
1   2008-02-22 03:43:00
2   2010-08-05 06:48:00
3   2006-02-07 06:40:00
4   2005-06-06 05:04:00
5   2008-04-17 02:11:00
6   2012-05-12 06:46:00
7   2004-05-17 08:42:00
8   2004-08-02 05:02:00
9   2008-03-26 03:53:00
Name: Data_Hora, dtype: datetime64[ns]

And this is the error I get:

ts.resample('D').count()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-86643e21ce18> in <module>()
----> 1 ts.resample('D').count()

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
    255     def resample(self, rule, how=None, axis=0, fill_method=None,
    256                  closed=None, label=None, convention='start',
--> 257                  kind=None, loffset=None, limit=None, base=0):
    258         """
    259         Convenience method for frequency conversion and resampling of regular

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in resample(self, obj)
     98             return obj
     99         else:  # pragma: no cover
--> 100             raise TypeError('Only valid with DatetimeIndex or PeriodIndex')
    101 
    102         rs_axis = rs._get_axis(self.axis)

TypeError: Only valid with DatetimeIndex or PeriodIndex

This can be fixed by using set_index to turn the datetime column into the index. However, after doing that I still get the following error:

DataError: No numeric types to aggregate

Because my DataFrame has no numeric columns.

But I just want to count the rows! A simple "select count(*) group by ..." as in SQL.

score 6 · Accepted Answer

To get this working, after dropping the rows whose index is NaT:

# NaT never compares equal to anything (itself included), so filter with a null check
df2 = df[pd.notnull(df.index)].copy()

I had to add a column:

df2['n'] = 1

And then aggregate only that column:

df2.n.resample('D', how="sum")

After that I could visualize the data with:

plot(df2.n.resample('D', how="sum"))
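
For readers on a recent pandas release, where resample no longer takes a how= argument, the same idea can probably be written without the helper column. The following is only a sketch under assumptions: the DataFrame, its "label" column, and the sample timestamps are made up for illustration, and only the Data_Hora name comes from the question.

```python
import pandas as pd

# Hypothetical data: a datetime column with one missing value and
# no numeric columns at all, as in the question.
df = pd.DataFrame({
    "Data_Hora": pd.to_datetime([
        "2008-02-22 03:43:00",
        "2008-02-22 03:43:00",
        "2008-02-23 06:48:00",
        None,
    ]),
    "label": ["a", "b", "c", "d"],
})

# Drop rows with a missing timestamp, then move the timestamps into the index.
df2 = df.dropna(subset=["Data_Hora"]).set_index("Data_Hora")

# size() counts the rows in each daily bin regardless of column dtypes.
daily = df2.resample("D").size()
print(daily)
```

Since size() counts rows rather than aggregating values, it should not run into the "No numeric types to aggregate" error.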

score 1 · Accepted Answer

In [104]: df = DataFrame(1,index=date_range('20130101 9:01',freq='h',periods=1000),columns=['A'])

In [105]: df
Out[105]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1000 entries, 2013-01-01 09:01:00 to 2013-02-12 00:01:00
Freq: H
Data columns (total 1 columns):
A    1000  non-null values
dtypes: int64(1)

In [106]: df.resample('D').count()
Out[106]: 
A    43
dtype: int64
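
On a recent pandas version I would expect the same call to return one count per day rather than a single number, because resample now returns a deferred object and count() is applied per bin. A sketch that rebuilds the frame above with explicit imports (the variable name daily is just for illustration):

```python
import pandas as pd

# Same construction as in the session above: 1000 hourly rows starting at 09:01.
df = pd.DataFrame(
    1,
    index=pd.date_range("2013-01-01 09:01", freq="h", periods=1000),
    columns=["A"],
)

# One row per calendar day, holding the number of non-null entries that day.
daily = df.resample("D").count()
print(daily.head())
```

The 1000 hourly rows span 43 calendar days, so the result should have 43 rows whose counts add up to 1000.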

score 1 · Accepted Answer

You can do this with a one-liner, using value_counts and resample.

Assuming your DataFrame is named df:

df.index.value_counts().resample('D', how='sum')

This approach also works if the datetime is not your index:

df.any_datetime_series.value_counts().resample('D', how='sum')
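
With a pandas version that no longer accepts how=, the one-liner would presumably become a method chain. A minimal sketch, assuming a made-up Series of timestamps (only the Data_Hora name is taken from the question):

```python
import pandas as pd

# Hypothetical timestamps: three events on one day, none the next, one after that.
ts = pd.Series(pd.to_datetime([
    "2008-02-22 03:43:00",
    "2008-02-22 03:43:00",
    "2008-02-22 10:00:00",
    "2008-02-24 06:48:00",
]), name="Data_Hora")

# value_counts() gives a count per distinct timestamp (indexed by timestamp);
# sorting the index and summing per day turns that into events per day.
daily = ts.value_counts().sort_index().resample("D").sum()
print(daily)
```

Days without any events show up with a count of 0, which is handy when plotting.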
