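I am trying to use the pandas groupby function to build groups from my data in a DataFrame that has a DatetimeIndex. I want to group by day using pd.TimeGrouper.
When I define the DataFrame as shown below, the call n.groupby(pd.TimeGrouper("d")) fails.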
n = pd.DataFrame(
{"value": [5462,5462,3185]},
index=[pd.to_datetime("2013-10-13 19:03:54"),
pd.to_datetime("2013-10-12 19:03:54"),
pd.to_datetime("2013-10-11 13:19:23")])
Error:
n.groupby(pd.TimeGrouper("d"))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-248-120eaa65b064> in <module>()
----> 1 n.groupby(pd.TimeGrouper("d"))
\lib\site-packages\pandas\core\generic.pyc in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze)
184 return groupby(self, by, axis=axis, level=level, as_index=as_index,
185 sort=sort, group_keys=group_keys,
--> 186 squeeze=squeeze)
187
188 def asfreq(self, freq, method=None, how=None, normalize=False):
\lib\site-packages\pandas\core\groupby.pyc in groupby(obj, by, **kwds)
531 raise TypeError('invalid type: %s' % type(obj))
532
--> 533 return klass(obj, by, **kwds)
534
535
\lib\site-packages\pandas\core\groupby.pyc in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze)
195 if grouper is None:
196 grouper, exclusions = _get_grouper(obj, keys, axis=axis,
--> 197 level=level, sort=sort)
198
199 self.grouper = grouper
\lib\site-packages\pandas\core\groupby.pyc in _get_grouper(obj, key, axis, level, sort)
1268
1269 if isinstance(key, CustomGrouper):
-> 1270 gpr = key.get_grouper(obj)
1271 return gpr, []
1272 elif isinstance(key, Grouper):
\lib\site-packages\pandas\tseries\resample.pyc in get_grouper(self, obj)
106 def get_grouper(self, obj):
107 # Only return grouper
--> 108 return self._get_time_grouper(obj)[1]
109
110 def _get_time_grouper(self, obj):
\lib\site-packages\pandas\tseries\resample.pyc in _get_time_grouper(self, obj)
112
113 if self.kind is None or self.kind == 'timestamp':
--> 114 binner, bins, binlabels = self._get_time_bins(axis)
115 else:
116 binner, bins, binlabels = self._get_time_period_bins(axis)
\lib\site-packages\pandas\tseries\resample.pyc in _get_time_bins(self, axis)
146
147 # general version, knowing nothing about relative frequencies
--> 148 bins = lib.generate_bins_dt64(ax_values, bin_edges, self.closed)
149
150 if self.closed == 'right':
\lib\site-packages\pandas\lib.pyd in pandas.lib.generate_bins_dt64 (pandas\lib.c:16139)()
ValueError: Invalid length for values or for binner
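Surprisingly, when I define the DataFrame as shown below, it works fine. Note that I changed the last day to 2013-10-12 instead of 2013-10-11 (the second timestamp is now also on 2013-10-13, so the data spans two days instead of three).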
n = pd.DataFrame(
{"value": [5462,5462,3185]},
index=[pd.to_datetime("2013-10-13 19:03:54"),
pd.to_datetime("2013-10-13 19:03:54"),
pd.to_datetime("2013-10-12 13:19:23")])
In this case I get a proper groupby object:
n.groupby(pd.TimeGrouper("d"))
<pandas.core.groupby.DataFrameGroupBy object at 0x000000000A3D84E0>
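I have already looked through some of the pandas core source code, but I am not sure whether this is a bug or whether I simply don't know how to use this function correctly.
Also note that aggregating by month works fine.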
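For reference, here is a minimal sketch of the daily and monthly grouping I am after. It assumes a newer pandas in which pd.Grouper(freq=...) stands in for pd.TimeGrouper, and it sorts the index first on the unconfirmed assumption that the descending timestamp order might play a role. The names n_sorted, daily, and monthly are made up for this sketch.

```python
import pandas as pd

# Same data as the failing example: timestamps in descending order.
n = pd.DataFrame(
    {"value": [5462, 5462, 3185]},
    index=pd.to_datetime([
        "2013-10-13 19:03:54",
        "2013-10-12 19:03:54",
        "2013-10-11 13:19:23",
    ]),
)

# Assumption: index order may matter, so sort ascending before grouping.
n_sorted = n.sort_index()

# Group by calendar day and by month (labelled by month start), then sum.
daily = n_sorted.groupby(pd.Grouper(freq="D"))["value"].sum()
monthly = n_sorted.groupby(pd.Grouper(freq="MS"))["value"].sum()

print(daily)
print(monthly)
```

If grouping succeeds on the sorted frame but not on the original one, that would at least point toward index order rather than the data itself.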
Thanks for your help.