
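Suppose you have a sorted list of n timestamps (Python datetime objects). How would you produce a list of tuples of the form (t, count), where t is a datetime object and count is the number of elements in the list that are at most x minutes after t?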

For example, given the dates (strings, for brevity; in reality they are datetime objects):

timestamps = ["13:00", "13:01", "13:03", "13:04", "13:05", "13:06", "13:09"]

If x is two minutes, this would yield

[("13:00", 2), ("13:03", 3), ("13:06", 1), ("13:09", 1)]

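What I want to do is build a coarser list of hits on a resource. The only data I have is the access time of each hit (with millisecond granularity; I want it accurate to the minute, or to ten minutes).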

I would post my attempt, but I'm ashamed of it...

Edit: here is what I have so far... still testing whether it works...

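from datetime import timedelta

def group_timestamps(timestamps, chunksize=10):
    """Groups a list of timestamps in chunks of ``chunksize`` minutes"""
    cs = timedelta(minutes=chunksize)

    if not timestamps:
        return []

    t0 = timestamps[0]
    count = 0  # the loop below also visits timestamps[0], so start at zero
    chunks = []

    for ts in timestamps:
        if (ts - t0) <= cs:
            count += 1
        else:
            # ts falls outside the current chunk: store it and start a new one
            chunks.append((t0, count))
            t0 = ts
            count = 1
    chunks.append((t0, count))  # store the final chunk
    return chunks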

3 Answers


This should work:

from datetime import datetime, timedelta

current = timestamps[0]  # start of the current window
count = 0
res = []
for t in timestamps:
    if (t - current) <= timedelta(minutes=2):
        count = count + 1
    else:
        res.append((current, count))
        current = t
        count = 1
res.append((current, count))  # add last tuple

Following your example:

# datetime() requires a full date; the date used here is arbitrary, only the times matter
timestamps = [datetime(2012, 6, 4, 13, 0), datetime(2012, 6, 4, 13, 1), datetime(2012, 6, 4, 13, 3), datetime(2012, 6, 4, 13, 4), datetime(2012, 6, 4, 13, 5), datetime(2012, 6, 4, 13, 6), datetime(2012, 6, 4, 13, 9)]

res = [(datetime(2012, 6, 4, 13, 0), 2), (datetime(2012, 6, 4, 13, 3), 3), (datetime(2012, 6, 4, 13, 6), 1), (datetime(2012, 6, 4, 13, 9), 1)]
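
To try the same idea directly on the "HH:MM" strings from the question, the loop can be wrapped in a function. This is only a minimal sketch: the name group_by_window and its window parameter are made up for illustration, and parsing with only %H:%M leaves the date at strptime's default, which does not matter here because all the times fall on the same day.

```python
from datetime import datetime, timedelta

def group_by_window(timestamps, window):
    """Return (start, count) tuples for a sorted list of datetimes.

    A new group starts whenever a timestamp lies more than ``window``
    after the start of the current group.
    """
    groups = []
    start, count = None, 0
    for t in timestamps:
        if start is not None and t - start <= window:
            count += 1
        else:
            if start is not None:
                groups.append((start, count))
            start, count = t, 1
    if start is not None:
        groups.append((start, count))  # store the last group
    return groups

# The question's sample data, parsed into datetime objects
labels = ["13:00", "13:01", "13:03", "13:04", "13:05", "13:06", "13:09"]
timestamps = [datetime.strptime(s, "%H:%M") for s in labels]

result = group_by_window(timestamps, timedelta(minutes=2))
print([(t.strftime("%H:%M"), n) for t, n in result])
# [('13:00', 2), ('13:03', 3), ('13:06', 1), ('13:09', 1)]
```

An empty input simply returns an empty list, so no separate guard is needed before calling it.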
answered 2012-06-04T02:47:10.590

Here is my version of a solution:

from datetime import datetime, timezone

# SAMPLE TIMESTAMP DATA (Unix epoch seconds, interpreted as UTC)
epoch_seconds = [
    1338777480, 1338777580, 1338777610,
    1338777680, 1338777780, 1338777980,
    1338778180, 1338778230, 1338778480,
]
timestamps = [datetime.fromtimestamp(s, tz=timezone.utc) for s in epoch_seconds]

MIN_THRSH = 2  # Range in minutes within which to chunk data.

def chunk_time(timestamp_list):
    chunk_list = []
    current_chunk_idx = None
    for i, dt in enumerate(timestamp_list):
        # Compare whole elapsed minutes since the current chunk started;
        # floor division keeps the count integral, and total_seconds()
        # also accounts for gaps longer than a day.
        if (i == 0 or
            (dt - timestamp_list[current_chunk_idx]).total_seconds() // 60 > MIN_THRSH):
            chunk_list.append([dt.strftime('%H:%M'), 1])
            current_chunk_idx = i
        else:
            chunk_list[-1][1] += 1
    return chunk_list

if __name__ == "__main__":
    for t in timestamps:
        print(t.strftime('%H:%M'))
    print(chunk_time(timestamps))

Output:

02:38
02:39
02:40
02:41
02:43
02:46
02:49
02:50
02:54
[['02:38', 3], ['02:41', 2], ['02:46', 1], ['02:49', 2], ['02:54', 1]]
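
One detail to keep in mind when measuring the gap between two datetime objects, as this answer does: timedelta.seconds holds only the seconds part of the difference (0 to 86399) and leaves whole days in timedelta.days, whereas timedelta.total_seconds() covers the entire span. The short sketch below, with dates picked arbitrarily for illustration, shows where the two diverge.

```python
from datetime import datetime

a = datetime(2012, 6, 4, 23, 59)
b = datetime(2012, 6, 5, 0, 1)    # 2 minutes after a
c = datetime(2012, 6, 5, 23, 59)  # exactly 1 day after a

# Short gap: both attributes agree
print((b - a).seconds, (b - a).total_seconds())  # 120 120.0

# Gap of one full day: .seconds wraps to 0, total_seconds() does not
print((c - a).seconds, (c - a).total_seconds())  # 0 86400.0
```

For hit logs that span more than a day, total_seconds() is probably the safer basis for the comparison.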
answered 2012-06-04T02:58:48.487

If you only need the counts, you can simply use a histogram over the Unix timestamps, for example numpy.histogram.
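
A rough sketch of what that might look like follows. The sample hits, the two-minute bucket width, and the variable names are all assumptions made for illustration. Note that this counts hits in fixed, clock-aligned buckets, which is not quite the same as the windows anchored at the first hit described in the question, and that buckets with no hits show up with a count of zero.

```python
from datetime import datetime, timezone
import numpy as np

# Sample hits (arbitrary date), as timezone-aware datetimes
hits = [datetime(2012, 6, 4, 13, m, tzinfo=timezone.utc) for m in (0, 1, 3, 4, 5, 6, 9)]
stamps = np.array([h.timestamp() for h in hits])  # Unix timestamps in seconds

bin_seconds = 120  # bucket width in seconds (assumed for this sketch)

# Align the first edge to a bucket boundary and cover the last hit
start = stamps[0] - stamps[0] % bin_seconds
n_bins = int((stamps[-1] - start) // bin_seconds) + 1
edges = start + bin_seconds * np.arange(n_bins + 1)

counts, edges = np.histogram(stamps, bins=edges)

# Pair each bucket's left edge with its count
for left, n in zip(edges[:-1], counts):
    label = datetime.fromtimestamp(float(left), tz=timezone.utc).strftime("%H:%M")
    print(label, int(n))
```

With this sample data the buckets start at 13:00, 13:02, 13:04, 13:06 and 13:08. Changing bin_seconds to 60 or 600 gives per-minute or ten-minute counts.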

answered 2016-04-13T09:04:37.323