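I have this list of timestamps, and I would like to search for all elements that fall within a given time span, based on user input (hours <= 24, days counted from midnight, or neither). Each timestamp has corresponding information in a second list.

Example (this is only a sample list; the solution should also work for very large lists):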

list = ['2002-03-31 19:30:41', '2002-03-31 19:30:41', '2002-03-31 19:43:49', '2002-03-31 19:43:50', '2002-03-31 19:50:05', '2002-03-31 19:50:06', '2002-03-31 19:50:06', '2002-03-31 19:50:44', '2002-03-31 19:50:45', '2002-03-31 19:50:45', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:53:05', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:17', '2002-03-31 19:54:14', '2002-03-31 19:54:14']
activity = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w']

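I will use the last element, list[-1], as the reference point. If the user wants to see the activity of the past three hours, the positions of the timestamps from 2002-03-31 16:54:14 to 2002-03-31 19:54:14 will be used to fetch the activities from the other list. I first thought of converting each timestamp into something usable so the elements are easier to compare, but there must be a simpler solution.

This module looks useful, but I don't know how to use it.

Best regards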

4 Answers

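As a workflow:

  • Use the datetime module to convert the strings to datetime objects with the strptime method: you get a list of datetime objects.
  • Compute timedeltas by subtracting each entry of this list from the last item.
  • Use the timedelta's total_seconds() method to find the number of seconds between an item and the reference point (the seconds attribute alone ignores whole days): compare it with 3*3600 (3 h) to find which items fall within the desired period.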
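
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the function name activities_within, the FMT constant and the three-item sample data are made up for the example, and it assumes the last timestamp is the reference point, as in the question.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # layout of the timestamps in the question


def activities_within(timestamps, activities, hours):
    """Return the activities logged at most `hours` before the last timestamp."""
    # Step 1: convert every string to a datetime object
    parsed = [datetime.strptime(t, FMT) for t in timestamps]
    reference = parsed[-1]  # last element is the reference point
    limit = hours * 3600    # time span in seconds
    # Steps 2 and 3: subtract from the reference and compare the total seconds
    return [
        act
        for moment, act in zip(parsed, activities)
        if 0 <= (reference - moment).total_seconds() <= limit
    ]


stamps = ["2002-03-31 15:00:00", "2002-03-31 19:30:41", "2002-03-31 19:54:14"]
acts = ["a", "b", "c"]
print(activities_within(stamps, acts, 3))  # ['b', 'c']
```

The first sample entry lies almost five hours before the reference point, so it is left out.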
answered 2012-09-25T17:21:19.983

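You are quite lucky: because your timestamps are written in the simplest sortable form (fixed width, zero padded, year first), their alphabetical order is also their chronological order. You can skip the whole "convert to time values" step and simply compare strings: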

times = ['2002-03-31 19:30:41', '2002-03-31 19:30:41', '2002-03-31 19:43:49', '2002-03-31 19:43:50', '2002-03-31 19:50:05', '2002-03-31 19:50:06', '2002-03-31 19:50:06', '2002-03-31 19:50:44', '2002-03-31 19:50:45', '2002-03-31 19:50:45', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:53:05', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:17', '2002-03-31 19:54:14', '2002-03-31 19:54:14']
activity = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w']

start = '2002-03-31 16:54:14'
end = '2002-03-31 19:54:14'

# String comparison only works if every timestamp has exactly the same layout
# (a single space between date and time, zero-padded fields).
for time, act in zip(times, activity):
    if start <= time <= end:
        print(time, act)
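
The start and end strings are hard-coded above. One possible way to derive them from the user's "last N hours" input is sketched below; the helper name window_bounds and the FMT constant are assumptions made for this example, and datetime is used only to compute the two bounds, not to convert the whole list.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # same layout as the timestamps being compared


def window_bounds(end_text, hours):
    """Return (start, end) strings covering the `hours` before `end_text`."""
    end = datetime.strptime(end_text, FMT)
    start = end - timedelta(hours=hours)
    # Format both bounds the same way so they compare correctly as strings
    return start.strftime(FMT), end.strftime(FMT)


start, end = window_bounds("2002-03-31 19:54:14", 3)
print(start)  # 2002-03-31 16:54:14
print(end)    # 2002-03-31 19:54:14
```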
answered 2012-09-25T18:00:06.520

Something like this should work:

ls =  ['2002-03-31 19:30:41', '2002-03-31 19:30:41', '2002-03-31 19:43:49', '2002-03-31     19:43:50', '2002-03-31 19:50:05', '2002-03-31 19:50:06', '2002-03-31 19:50:06', '2002-03-31 19:50:44', '2002-03-31 19:50:45', '2002-03-31 19:50:45', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:53:05', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:17', '2002-03-31 19:54:14', '2002-03-31 19:54:14']

from datetime import datetime

# target is one of the items in the list
target = datetime.strptime('2002-03-31 19:53:17', '%Y-%m-%d %H:%M:%S')
for l in ls:
    print(datetime.strptime(l, '%Y-%m-%d %H:%M:%S') - target)

prints

-1 day, 23:37:24
-1 day, 23:37:24
-1 day, 23:50:32
-1 day, 23:50:33
-1 day, 23:56:48
-1 day, 23:56:49
-1 day, 23:56:49
-1 day, 23:57:27
-1 day, 23:57:28
-1 day, 23:57:28
-1 day, 23:58:33
-1 day, 23:58:33
-1 day, 23:58:33
-1 day, 23:59:08
-1 day, 23:59:08
-1 day, 23:59:08
-1 day, 23:59:48
-1 day, 23:59:49
-1 day, 23:59:49
-1 day, 23:59:49
0:00:00
0:00:57
0:00:57

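datetime.strptime(l, '%Y-%m-%d %H:%M:%S') - target returns a timedelta object (docs). You can read the timedelta's days, seconds and microseconds attributes and compare them with the desired time span. Note that for a negative difference, days is negative and seconds counts up from there (hence "-1 day, 23:37:24" above), so seconds on its own is misleading; total_seconds() gives the signed length of the whole interval. For example, to get the indices of all events that happened less than an hour from some reference point:

less_than_an_hour = []
for i, l in enumerate(ls):
    delta = datetime.strptime(l, '%Y-%m-%d %H:%M:%S') - target
    # abs() so that events before and after the target both count
    if abs(delta.total_seconds()) < 3600:
        less_than_an_hour.append(i)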
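
To see how a timedelta stores a negative difference, here is a small sketch with a made-up value (one minute before a reference point). It only illustrates the attribute behaviour and is not part of the original answer.

```python
from datetime import timedelta

# A difference of minus one minute, e.g. an event one minute before the target
delta = timedelta(seconds=-60)

print(delta)                      # -1 day, 23:59:00
print(delta.days, delta.seconds)  # -1 86340
print(delta.total_seconds())      # -60.0
```

Only days carries the sign; seconds is always between 0 and 86399, which is why comparing seconds alone with 3600 can give surprising results.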
answered 2012-09-25T17:23:32.883

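I would:

  • Convert the list of timestamps to datetime objects:

    times = [datetime.datetime.strptime(t, '%Y-%m-%d %H:%M:%S') for t in times]

  • Use the bisect module to find the start time the user asked for. bisect does a binary search, which is much faster than a linear search on a large list; it needs the list to be sorted, and the user input converted to a datetime object as well:

    start = datetime.datetime(2002, 3, 31, 19, 53, 17)
    startindex = bisect.bisect_left(times, start)

  • Use itertools functions to combine the two lists into one iterable that yields only the entries in your range:

    end = datetime.datetime(2002, 4, 1, 7, 53, 17)

    merged = zip(times, activity)  # lazy in Python 3 (itertools.izip in Python 2)
    fromstart = itertools.islice(merged, startindex, None)  # skip everything before startindex
    untilend = itertools.takewhile(lambda e: e[0] <= end, fromstart)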
    

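The untilend iterable now yields the entries between start and end as (time, activity) tuples, without using any extra memory for copied lists. This lets you process large amounts of data efficiently.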
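
If the positions themselves are wanted (the question asks for the positions used to look up the other list), a variation is to bisect for both ends of the span and slice. The sketch below is an illustration under assumptions: span_indices and the five-item sample data are invented for the example, and the list of datetime objects must already be sorted.

```python
import bisect
from datetime import datetime, timedelta


def span_indices(times, start, end):
    """Return (lo, hi) so that times[lo:hi] lies within start..end inclusive."""
    lo = bisect.bisect_left(times, start)  # first index with times[i] >= start
    hi = bisect.bisect_right(times, end)   # first index with times[i] > end
    return lo, hi


# Small made-up sample: 19:30, 19:43, 19:50, 19:53 and 19:54 on 2002-03-31
times = [datetime(2002, 3, 31, 19, m) for m in (30, 43, 50, 53, 54)]
activity = ["a", "b", "c", "d", "e"]

end = times[-1]                     # last element as reference point
start = end - timedelta(minutes=5)  # "the last five minutes"
lo, hi = span_indices(times, start, end)
print(lo, hi)           # 2 5
print(activity[lo:hi])  # ['c', 'd', 'e']
```

Both lookups are binary searches, so nothing before the span is visited; the trade-off is that the slice copies the matching part of the list, whereas the iterator version copies nothing but steps over the skipped entries one by one.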

Demo:

>>> import itertools
>>> import datetime
>>> import bisect
>>> times = ['2002-03-31 19:30:41', '2002-03-31 19:30:41', '2002-03-31 19:43:49', '2002-03-31 19:43:50', '2002-03-31 19:50:05', '2002-03-31 19:50:06', '2002-03-31 19:50:06', '2002-03-31 19:50:44', '2002-03-31 19:50:45', '2002-03-31 19:50:45', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:51:50', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:52:25', '2002-03-31 19:53:05', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:06', '2002-03-31 19:53:17', '2002-03-31 19:54:14', '2002-03-31 19:54:14']
>>> activity = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w']
>>> times = [datetime.datetime.strptime(t, '%Y-%m-%d %H:%M:%S') for t in times]
>>> start = datetime.datetime(2002, 3, 31, 19, 53, 17)
>>> end = datetime.datetime(2002, 4, 1, 7, 53, 17)
>>> startindex = bisect.bisect_left(times, start)
>>> merged = zip(times, activity)
>>> fromstart = itertools.islice(merged, startindex, None)
>>> untilend = itertools.takewhile(lambda e: e[0] <= end, fromstart)
>>> for time, act in untilend:
...     print(time, act)
... 
2002-03-31 19:53:17 u
2002-03-31 19:54:14 v
2002-03-31 19:54:14 w
answered 2012-09-25T19:54:56.433