python - Python - Checking the order of lines in a file

Question

How can I check the order of lines in a file?

Example file:

a b c d e f
b c d e f g
1 2 3 4 5 0

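Requirements:

All lines beginning with a must precede lines beginning with b.
There is no limit on the number of lines beginning with a.
Lines beginning with a may or may not be present.
Lines containing integers must follow lines beginning with b.
Numeric lines must have at least two integers followed by a zero.
Failure to meet the conditions must raise an error.

I initially thought of a rather long-winded for loop, but that failed because I was unable to index lines beyond line[0]. Also, I do not know how to define the position of one line relative to the others. There is no limit on the length of these files, so memory may also be an issue.

Any suggestions are very welcome! Simple and readable is best for this confused novice!

Thanks, Seafood.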

score 4 · Accepted Answer

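A straightforward iterative approach. This defines a function that determines the line type, from 1 to 3. We then iterate over the lines in the file. An unknown line type, or a line type lower than any previous one, raises an exception.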

def linetype(line):
    if line.startswith("a"):
        return 1
    if line.startswith("b"):
        return 2
    try:
        parts = [int(x) for x in line.split()]
        if len(parts) >=3 and parts[-1] == 0:
            return 3
    except ValueError:  # int() failed, so this is not a numeric line
        pass
    raise Exception("Unknown Line Type")

maxtype = 0

with open("filename", "r") as f:   # the file is closed automatically
    for line in f:                 # iterate over each line in the file
        line = line.strip()        # strip any whitespace
        if line == "":             # if we're left with a blank line
            continue               # continue to the next iteration

        lt = linetype(line)        # get the line type of the line
                                   # or raise an exception if unknown type
        if lt >= maxtype:          # as long as our type is increasing
            maxtype = lt           # note the current type
        else:                      # otherwise line type decreased
            raise Exception("Out of Order")  # so raise exception

print("Validates")  # if we made it here, we validated
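To try this approach without a file on disk, the same logic can be wrapped in a function that accepts any iterable of lines. This is only a minimal sketch: the names `line_type` and `validate` are assumptions made for illustration, and the sample data is the example file from the question.

```python
def line_type(line):
    """Classify a line as 1 ('a' line), 2 ('b' line) or 3 (numeric line)."""
    if line.startswith("a"):
        return 1
    if line.startswith("b"):
        return 2
    try:
        parts = [int(x) for x in line.split()]
    except ValueError:
        parts = []  # not every token is an integer
    if len(parts) >= 3 and parts[-1] == 0:
        return 3
    raise ValueError("Unknown line type: %r" % line)


def validate(lines):
    """Raise ValueError if the line types ever decrease; return True otherwise."""
    max_type = 0
    for raw in lines:
        line = raw.strip()
        if not line:          # skip blank lines
            continue
        current = line_type(line)
        if current < max_type:
            raise ValueError("Out of order: %r" % line)
        max_type = current
    return True


sample = ["a b c d e f\n", "b c d e f g\n", "1 2 3 4 5 0\n"]
print(validate(sample))
```

Because an open file is itself an iterable of lines, passing the file object to `validate` reads one line at a time, so the length of the file should not be a memory concern.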

score 2 · Accepted Answer

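You can read all the lines into a list with lines = open(thefile).readlines() and then process that list however you need. It is not the most efficient approach, but it is the simplest.

Also simplest is to do multiple loops, one per condition (except 2, which is not a condition that could be violated, and 5, which isn't really a condition ;-). "All lines beginning with a must precede lines beginning with b" can be thought of as "the last line beginning with a, if any, must come before the first line beginning with b", so: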

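# default= (Python 3.4+) supplies the value used when no line matches
lastwitha = max((i for i, line in enumerate(lines)
                 if line.startswith('a')), default=-1)
firstwithb = next((i for i, line in enumerate(lines)
                   if line.startswith('b')), len(lines))
if lastwitha > firstwithb:
    raise ValueError("a line starting with 'a' follows a line starting with 'b'")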

Then, similarly, for "lines containing integers":

firstwithint = next((i for i, line in enumerate(lines)
                     if any(c in line for c in '0123456789')), len(lines))
if firstwithint < firstwithb:
    raise ValueError("a line containing integers precedes the first 'b' line")

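This should really be plenty of hints for your homework. Can you now do the last remaining bit, condition 4, by yourself?

Of course, you could take a different tack from the one I'm suggesting here (using next to get the index of the first line that satisfies a condition, which requires Python 2.6 by the way, and any and all to test whether any or all items in a sequence satisfy a condition), but I'm trying to match your request for maximum simplicity. If you find traditional for loops simpler than next, any and all, let us know, and we'll show how to recode these uses of the higher-abstraction forms into those lower-level concepts!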
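For readers who do prefer plain loops, the `next(...)` and `max(...)` calls could be rewritten roughly as follows. The helper names `first_index` and `last_index` are hypothetical, introduced only for this sketch, and the `lines` list stands in for the result of `readlines()`.

```python
def first_index(lines, predicate, default):
    """Index of the first line for which predicate(line) is true, else default."""
    for i, line in enumerate(lines):
        if predicate(line):
            return i
    return default


def last_index(lines, predicate, default):
    """Index of the last line for which predicate(line) is true, else default."""
    found = default
    for i, line in enumerate(lines):
        if predicate(line):
            found = i
    return found


# Stand-in for lines = open(thefile).readlines()
lines = ["a b c d e f\n", "b c d e f g\n", "1 2 3 4 5 0\n"]

lastwitha = last_index(lines, lambda l: l.startswith("a"), -1)
firstwithb = first_index(lines, lambda l: l.startswith("b"), len(lines))
if lastwitha > firstwithb:
    raise ValueError("a line starting with 'a' follows a line starting with 'b'")
```

The defaults (-1 and `len(lines)`) play the same role as in the generator versions: they keep the comparison harmless when no line matches.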

score 0 · Accepted Answer

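You don't need to index the lines. For each line you can check or set some conditions, and raise an error if a condition is not met. For example, rule 1: initially set a variable was_b to False. On each iteration (besides the other checks and assignments), also check whether the line starts with "b". If it does, set was_b = True. Another check: if the line starts with "a" and was_b is True, raise an error. Yet another check: if the line contains integers and was_b is False, raise an error. And so on.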
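A minimal sketch of this flag-based idea might look like the following. The function name `check_order` is hypothetical, and treating a line as "containing integers" when its first token is a number is an assumption made only for this example.

```python
def check_order(lines):
    """Single pass with one flag; raises ValueError on the first violation."""
    was_b = False
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:                      # ignore blank lines
            continue
        if line.startswith("b"):
            was_b = True
        elif line.startswith("a"):
            if was_b:
                raise ValueError("line %d: 'a' line after a 'b' line" % number)
        elif line.split()[0].isdigit():
            # Assumption: a numeric line is one whose first token is a number.
            if not was_b:
                raise ValueError("line %d: numeric line before any 'b' line" % number)
    return True


print(check_order(["a b c d e f\n", "b c d e f g\n", "1 2 3 4 5 0\n"]))
```

Only the flag and the current line are kept in memory, so this works on files of any length.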

score 0 · Accepted Answer

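Restrictions on lines:

I. There must be no lines starting with 'a' after we have encountered a line starting with 'b'.

II. If we encounter a numeric line, then the previous line must start with 'b'. (Or your 4th condition allows another interpretation: each 'b' line must be followed by a numeric line.)

Restriction on a numeric line (as a regex): /\d+\s+\d+\s+0\s*$/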

#!/usr/bin/env python
import re

filename = "filename"  # placeholder path (assumption); point this at the file to check

# a line made up only of digits and whitespace
is_numeric = lambda line: re.match(r'^\s*\d+(?:\s|\d)*$', line)
# at least two integers followed by a final zero
valid_numeric = lambda line: re.search(r'(?:\d+\s+){2}0\s*$', line)

def error(msg):
    # relies on the module-level i and line set by the loop below
    raise SyntaxError('%s at %s:%s: "%s"' % (msg, filename, i+1, line.rstrip()))

seen_b, last_is_b = False, False
with open(filename) as f:
    for i, line in enumerate(f):
        if not seen_b:
            seen_b = line.startswith('b')

        if seen_b and line.startswith('a'):
            error('failed I.')
        if not last_is_b and is_numeric(line):
            error('failed II.')
        if is_numeric(line) and not valid_numeric(line):
            error('not a valid numeric line')

        last_is_b = line.startswith('b')
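To see what the two patterns accept, they can be tried on a few sample strings. This is just an illustrative sketch; the helper name `classify` and the sample strings are assumptions, not part of the answer.

```python
import re

# Same patterns as in the script above, precompiled for reuse.
IS_NUMERIC = re.compile(r'^\s*\d+(?:\s|\d)*$')
VALID_NUMERIC = re.compile(r'(?:\d+\s+){2}0\s*$')


def classify(text):
    """Return (looks numeric, is a valid numeric line) for one line of text."""
    return bool(IS_NUMERIC.match(text)), bool(VALID_NUMERIC.search(text))


for text in ["1 2 3 4 5 0", "1 2 0", "7 0", "1 2 3", "b c d"]:
    print(repr(text), classify(text))
```

A line such as "7 0" counts as numeric but not as valid, because the pattern requires at least two integers before the final zero.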
