I have a list of np.datetime64 dates in Python:

['2016-12-01T02:00:00.000000000', '2016-12-01T04:00:00.000000000',
 '2016-12-01T06:00:00.000000000', '2016-12-01T08:00:00.000000000',
 '2016-12-01T10:00:00.000000000', '2016-12-01T12:00:00.000000000', 
 '2016-12-01T14:00:00.000000000', '2016-12-01T16:00:00.000000000', 
 '2016-12-01T18:00:00.000000000', '2016-12-01T20:00:00.000000000', 
 '2016-12-01T22:00:00.000000000', '2016-12-02T00:00:00.000000000', 
 '2016-12-02T02:00:00.000000000', '2016-12-02T04:00:00.000000000', 
 '2016-12-02T06:00:00.000000000', '2016-12-02T08:00:00.000000000', 
 '2016-12-02T10:00:00.000000000', '2016-12-02T12:00:00.000000000', 
 '2016-12-02T14:00:00.000000000', '2016-12-02T16:00:00.000000000', 
 '2016-12-02T18:00:00.000000000', '2016-12-02T20:00:00.000000000', 
 '2016-12-02T22:00:00.000000000', '2016-12-03T00:00:00.000000000', 
 '2016-12-03T02:00:00.000000000', '2016-12-03T04:00:00.000000000',
 '2016-12-03T06:00:00.000000000', '2016-12-03T08:00:00.000000000', 
 '2016-12-03T10:00:00.000000000', '2016-12-03T12:00:00.000000000', 
 '2016-12-03T14:00:00.000000000', '2016-12-03T16:00:00.000000000', 
 '2016-12-03T18:00:00.000000000', '2016-12-03T20:00:00.000000000', 
 '2016-12-03T22:00:00.000000000']

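I want to loop over each calendar day in the list. I tried extracting each unique date from the list (i.e. finding the minimum and maximum dates and creating a list of dates between them), but that is not ideal for what I want to do.

The result I want is code that lets me loop over each date/calendar day found in the list and get the datetimes that correspond to that date: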

for each_date in date_list:
    ***get all datetimes corresponding to each_date***

(loop would occur 3 times in this example)

Notes:

1) Solutions that iterate over each [n:n+24] slice, or anything of the sort, won't work, because not every day will have the same number of timesteps.

2 Answers

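If the timestamps are ordered, we can use the itertools.groupby function to group the elements of the array by their corresponding date.

The date can be obtained with np.datetime64.astype(..., dtype='datetime64[D]'), so we can write: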

from numpy import datetime64
from functools import partial
from itertools import groupby

for day, timestamps in groupby(data_array,
                               partial(datetime64.astype, dtype='datetime64[D]')):
    # process day and timestamps
    pass

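day is a datetime64[D] numpy object (it contains only the date), and timestamps is an iterable of the corresponding timestamps (not a list, but we can convert it to one). data_array is the array that contains the initial data.

For example: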
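For anyone who wants something to paste and run, here is a minimal self-contained sketch of the same idea. The short sample array and the helper name to_day are made up for illustration (they are not from the question); to_day does the same job as the partial(...) key above.

```python
from itertools import groupby

import numpy as np

# Hypothetical sample standing in for data_array: sorted, uneven days.
data_array = np.array(
    ["2016-12-01T20:00", "2016-12-01T22:00", "2016-12-02T00:00",
     "2016-12-02T02:00", "2016-12-03T00:00"],
    dtype="datetime64[ns]",
)


def to_day(ts):
    """Truncate a numpy.datetime64 scalar to day precision."""
    return ts.astype("datetime64[D]")


# Materialize each group right away so it can be reused later.
groups = [(day, list(stamps)) for day, stamps in groupby(data_array, key=to_day)]

for day, stamps in groups:
    print(day, len(stamps))
```

With this sample the loop body runs three times, once per calendar day, and the groups hold 2, 2 and 1 timestamps.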

>>> for day, timestamps in groupby(data_array,
...                                partial(datetime64.astype, dtype='datetime64[D]')):
...     print((day, list(timestamps)))
... 
(numpy.datetime64('2016-12-01'), [numpy.datetime64('2016-12-01T02:00:00.000000000'), numpy.datetime64('2016-12-01T04:00:00.000000000'), numpy.datetime64('2016-12-01T06:00:00.000000000'), numpy.datetime64('2016-12-01T08:00:00.000000000'), numpy.datetime64('2016-12-01T10:00:00.000000000'), numpy.datetime64('2016-12-01T12:00:00.000000000'), numpy.datetime64('2016-12-01T14:00:00.000000000'), numpy.datetime64('2016-12-01T16:00:00.000000000'), numpy.datetime64('2016-12-01T18:00:00.000000000'), numpy.datetime64('2016-12-01T20:00:00.000000000'), numpy.datetime64('2016-12-01T22:00:00.000000000')])
(numpy.datetime64('2016-12-02'), [numpy.datetime64('2016-12-02T00:00:00.000000000'), numpy.datetime64('2016-12-02T02:00:00.000000000'), numpy.datetime64('2016-12-02T04:00:00.000000000'), numpy.datetime64('2016-12-02T06:00:00.000000000'), numpy.datetime64('2016-12-02T08:00:00.000000000'), numpy.datetime64('2016-12-02T10:00:00.000000000'), numpy.datetime64('2016-12-02T12:00:00.000000000'), numpy.datetime64('2016-12-02T14:00:00.000000000'), numpy.datetime64('2016-12-02T16:00:00.000000000'), numpy.datetime64('2016-12-02T18:00:00.000000000'), numpy.datetime64('2016-12-02T20:00:00.000000000'), numpy.datetime64('2016-12-02T22:00:00.000000000')])
(numpy.datetime64('2016-12-03'), [numpy.datetime64('2016-12-03T00:00:00.000000000'), numpy.datetime64('2016-12-03T02:00:00.000000000'), numpy.datetime64('2016-12-03T04:00:00.000000000'), numpy.datetime64('2016-12-03T06:00:00.000000000'), numpy.datetime64('2016-12-03T08:00:00.000000000'), numpy.datetime64('2016-12-03T10:00:00.000000000'), numpy.datetime64('2016-12-03T12:00:00.000000000'), numpy.datetime64('2016-12-03T14:00:00.000000000'), numpy.datetime64('2016-12-03T16:00:00.000000000'), numpy.datetime64('2016-12-03T18:00:00.000000000'), numpy.datetime64('2016-12-03T20:00:00.000000000'), numpy.datetime64('2016-12-03T22:00:00.000000000')])

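So here, for each day, we chose to print the list of corresponding timestamps, but that is of course only one of the options. As the example shows, not all slices have the same length (the last two have one extra element).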

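Note that timestamps is an iterator, so it can be consumed only once. It also shares its underlying iterator with groupby, so once the loop moves on to the next day the previous group is no longer available; convert it to a list if you need to keep it.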
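To make the exhaustion point concrete, here is a small sketch that compares keeping the raw group iterators with converting them to lists inside the loop. The sample array and the names to_day, lazy and eager are assumptions for illustration only.

```python
from itertools import groupby

import numpy as np

# Hypothetical sample: two timestamps on one day, one on the next.
data_array = np.array(
    ["2016-12-01T20:00", "2016-12-01T22:00", "2016-12-02T00:00"],
    dtype="datetime64[ns]",
)


def to_day(ts):
    """Truncate a numpy.datetime64 scalar to day precision."""
    return ts.astype("datetime64[D]")


# Keeping the group iterators and reading them after the loop loses data,
# because groupby has already advanced past them.
lazy = [(day, stamps) for day, stamps in groupby(data_array, key=to_day)]
lazy_counts = [len(list(stamps)) for _, stamps in lazy]

# Converting each group to a list while it is current keeps everything.
eager = [(day, list(stamps)) for day, stamps in groupby(data_array, key=to_day)]
eager_counts = [len(stamps) for _, stamps in eager]

print(lazy_counts, eager_counts)
```

The stored iterators in lazy come back empty or incomplete, while eager holds all three timestamps.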

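groupby works in linear time, since for each element it only checks whether the "group key" is the same as that of the previous element; but, as said before, the constraint is that the data must be sorted.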
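If the data might not be sorted, one option is to sort it before grouping. A minimal sketch, assuming a small made-up array named unsorted (not from the question); note that the sort makes the overall cost O(n log n) rather than linear.

```python
from itertools import groupby

import numpy as np

# Hypothetical sample with the two days interleaved.
unsorted = np.array(
    ["2016-12-02T00:00", "2016-12-01T22:00",
     "2016-12-02T02:00", "2016-12-01T20:00"],
    dtype="datetime64[ns]",
)


def to_day(ts):
    """Truncate a numpy.datetime64 scalar to day precision."""
    return ts.astype("datetime64[D]")


# Without sorting, groupby starts a new group every time the key changes,
# so the same day shows up more than once.
raw_days = [str(day) for day, _ in groupby(unsorted, key=to_day)]

# After np.sort, each day appears exactly once.
sorted_days = [str(day) for day, _ in groupby(np.sort(unsorted), key=to_day)]

print(raw_days)
print(sorted_days)
```

On this sample the unsorted input produces four groups and the sorted input produces two.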

answered 2018-06-15T08:40:52.093

You can use collections.defaultdict for an O(n) solution. Here Pandas is used to normalize the datetime objects, although this should also be possible with NumPy alone.

import pandas as pd
from collections import defaultdict

d = defaultdict(list)

for item in L:
    day = pd.to_datetime(item).normalize().to_datetime64()
    d[day].append(item)

print(d)

defaultdict(list,
            {numpy.datetime64('2016-12-01T00:00:00.000000000'):
                 [numpy.datetime64('2016-12-01T02:00:00.000000000'),
                  ...
                  numpy.datetime64('2016-12-01T22:00:00.000000000')],
             numpy.datetime64('2016-12-02T00:00:00.000000000'):
                 [numpy.datetime64('2016-12-02T00:00:00.000000000'),
                  ...
                  numpy.datetime64('2016-12-02T22:00:00.000000000')],
             numpy.datetime64('2016-12-03T00:00:00.000000000'):
                 [numpy.datetime64('2016-12-03T00:00:00.000000000'),
                  ...
                  numpy.datetime64('2016-12-03T22:00:00.000000000')]})
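For reference, a NumPy-only variant of the same dictionary approach might look like the sketch below. It is an assumption on my part rather than something tested against the original data: the sample array is made up, and the keys are truncated with astype('datetime64[D]') instead of being normalized through Pandas, so they are day-precision values rather than midnight timestamps.

```python
from collections import defaultdict

import numpy as np

# Hypothetical sample; the order does not matter for this approach.
L = np.array(
    ["2016-12-02T00:00", "2016-12-01T22:00",
     "2016-12-02T02:00", "2016-12-01T20:00"],
    dtype="datetime64[ns]",
)

d = defaultdict(list)

for item in L:
    # Truncate to day precision; this becomes the dictionary key.
    day = item.astype("datetime64[D]")
    d[day].append(item)

for day, stamps in d.items():
    print(day, len(stamps))
```

Because each item is just appended to its day's list, the input does not need to be sorted first.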
answered 2018-06-15T08:48:02.840