python - Check whether matrix keys are in running order - python

Question

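Given a matrix file whose first column is used as the key of a Python dictionary (called docid), how should I read the file so that it stops as soon as a key is not in running order, i.e.

if docid-1 > previd, or
if docid < previd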

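I have been using the code below, but it seems a bit verbose. Is there another way to produce the same output? (Note: the solution needs to handle matrix files of up to 20 GB. For the sake of the snippet, I have given a small dataset.)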

text = '''0 1 1
0 2 1
1 3 1
1 7 1
2 5 4
2 4 6
2 9 8
3 5 7
3 9 8
3 10 9
9 2 9
9 8 3
3 9 4'''

from collections import defaultdict
docs = defaultdict(list)
previd = -1  # no docid seen yet
for line in text.split('\n'):
    docid, termid, val = map(int, line.split())
    # stop if the docid goes backwards or skips ahead by more than one
    if docid < previd or docid - 1 > previd:
        print(line)
        break
    previd = docid
    docs[docid].append((termid, val))

for i in docs:
    print(i, docs[i])

score 1 · Accepted Answer

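I don't see any way to simplify it, because the filter condition depends on the previous element (which complicates any filter-style iteration). I don't think your code is complicated, but you could define a dedicated generator: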

def read_text(text):
    previd = -1  # no docid seen yet
    for line in text.split('\n'):
        docid, termid, val = map(int, line.split())
        if docid < previd or docid - 1 > previd:
            print(line)  # I guess this is a debug feature
            return  # or raise Exception("line not in running order", line)
        previd = docid  # remember the key for the next comparison
        yield (docid, termid, val)

And in your main code:

for docid, termid, val in read_text(text):
    docs[docid].append((termid,val))

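Edit:

Instead of text.split('\n'), iterating over open('myfile','r') is probably more efficient, since the file is then read line by line rather than loaded into memory as a whole.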

for line in open('myfile','r'):
    do_something(line)
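
Putting the two ideas together, here is a minimal sketch of how the generator could take any iterable of lines (an open file, for instance) instead of a string. The names read_matrix and load_docs are hypothetical, and the sketch assumes docids start at 0 as in the question's sample. It raises ValueError rather than printing, which is only one of the two options mentioned above. An io.StringIO object stands in for the real file so the snippet runs on its own.

```python
import io
from collections import defaultdict


def read_matrix(lines):
    """Yield (docid, termid, val) tuples while docids stay in running order.

    `lines` can be any iterable of strings, e.g. an open file object,
    so only one line is held in memory at a time by this generator.
    """
    previd = -1  # assumption: docids start at 0, as in the sample data
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines such as a trailing newline
        docid, termid, val = map(int, line.split())
        # out of order: the docid went backwards or skipped ahead
        if docid < previd or docid - 1 > previd:
            raise ValueError(
                "line %d not in running order: %r" % (lineno, line.strip())
            )
        previd = docid
        yield docid, termid, val


def load_docs(lines):
    """Collect rows per docid, stopping at the first out-of-order line."""
    docs = defaultdict(list)
    try:
        for docid, termid, val in read_matrix(lines):
            docs[docid].append((termid, val))
    except ValueError as err:
        print(err)  # report the offending line, keep what was read so far
    return docs


# Stand-in for: with open('myfile', 'r') as f: docs = load_docs(f)
sample = io.StringIO("0 1 1\n0 2 1\n1 3 1\n2 5 4\n9 2 9\n3 9 4\n")
docs = load_docs(sample)
```

With a real file, pass the file object opened in a with block so it is closed afterwards. Note that the generator itself streams the input, but docs still accumulates every row, so whether a 20 GB matrix fits in memory is a separate question.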
