python - Python: How to find sequences of readings with the same ID and calculate the interval between the first and last element of each such sequence

Question

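I am using GPS data to discover places that are personally meaningful to someone from their tracks. Once you have clustered the data and assigned each point to a cluster, you get an output file that contains, among other columns, a timestamp column and a cluster ID column. To determine how long a person stayed in each cluster on each visit, you have to sort the data by timestamp and find all sequences of consecutive readings from the same cluster. Suppose I have the ID pattern 1,1,1,2,3,4,4,1,1,2,1,3,3,4,4,1,1,1,1,1, already sorted by timestamp; here you can see that the person visited cluster 1 four times. What I want to know is how to calculate how long the person stayed in cluster 1 on each visit.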

Sample data (time is epoch time in seconds):
time | cluster
1377997076 | 1
1378000582 | 1
1378000596 | 1
1378031297 | 2
1378031302 | 2
1378031303 | 1
1378031345 | 1
1378033452 | 2
1378034222 | 2

This can also be represented as a 2D list: mylist=[[1377997076,1],[1378000582,1],[1378000596,1],[1378031297,2],[1378031302,2],[1378031303,1],[1378031345,1],[1378033452,2],[1378034222,2]]

score 0 · Accepted Answer

Here is some code to get you started:

def chunk_sequences(it, n):
    """
    Yield all sequences of n from iterable.
    """
    chunk = []
    for x in it:
        if x == n:
            chunk.append(n)
        else:
            if len(chunk) > 0:
                yield chunk
                chunk = []
    if len(chunk) > 0:
        #needed in case the last sequence runs into the last element
        yield chunk

This is quick and dirty; if performance is critical, you may want to move to an itertools-based solution (possibly involving takewhile).
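As a rough illustration of what an itertools-based variant might look like, here is a minimal sketch. Note that it uses itertools.groupby rather than takewhile, since groupby splits an iterable into runs of equal consecutive values directly. The function name chunk_sequences_groupby is made up for this example, and pattern is the ID sequence from the question.

```python
from itertools import groupby

def chunk_sequences_groupby(it, n):
    """Yield each run of consecutive elements equal to n, as a list."""
    # groupby (with no key function) groups consecutive equal elements,
    # so every group is one uninterrupted run of a single value.
    for value, run in groupby(it):
        if value == n:
            yield list(run)

# ID pattern from the question, already sorted by timestamp
pattern = [1, 1, 1, 2, 3, 4, 4, 1, 1, 2, 1, 3, 3, 4, 4, 1, 1, 1, 1, 1]
runs = list(chunk_sequences_groupby(pattern, 1))
```

It should yield the same runs as chunk_sequences; whether it is actually faster would need to be measured on real data.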

So, with the above, you can do the following (here pattern is the list of cluster IDs from the question):

list(chunk_sequences(pattern,1))
Out[59]: [[1, 1, 1], [1, 1], [1], [1, 1, 1, 1, 1]]

This is easily turned into:

[len(x) for x in list(chunk_sequences(pattern,1))]
Out[60]: [3, 2, 1, 5]

...which is the length of each individual stay in cluster 1, counted in readings.
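The lengths above count readings, while the question asks for elapsed time. One possible way to get there is to keep the timestamps with each run and subtract the first from the last. The following is only a sketch under some assumptions: the function name visit_durations is hypothetical, the input is a list of [timestamp, cluster] pairs, and the sample values are copied from the table in the question.

```python
from itertools import groupby
from operator import itemgetter

def visit_durations(readings, cluster_id):
    """Return (first_ts, last_ts, duration) for every visit to cluster_id.

    readings: iterable of [timestamp, cluster] pairs (assumed layout).
    A visit is a run of consecutive readings, in time order, that all
    belong to cluster_id.
    """
    # Sort by timestamp so consecutive readings are adjacent in time.
    ordered = sorted(readings, key=itemgetter(0))
    visits = []
    # Group consecutive readings that share the same cluster ID.
    for cid, run in groupby(ordered, key=itemgetter(1)):
        if cid != cluster_id:
            continue
        timestamps = [ts for ts, _ in run]
        first, last = timestamps[0], timestamps[-1]
        visits.append((first, last, last - first))
    return visits

# Sample readings taken from the table in the question
mylist = [[1377997076, 1], [1378000582, 1], [1378000596, 1],
          [1378031297, 2], [1378031302, 2], [1378031303, 1],
          [1378031345, 1], [1378033452, 2], [1378034222, 2]]

visits = visit_durations(mylist, 1)
durations = [d for _, _, d in visits]  # seconds per visit to cluster 1
```

With this definition a visit consisting of a single reading has a duration of 0 seconds. If that is not what you want, one alternative is to measure up to the first reading of the next cluster instead.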
