python - Python loop exception

Question

Hey, I'm new to Python and I need some help. I wrote the following code:

try:
  it = iter(cmLines)
  line = it.next()
  while (line):
    if ("INFERNAL1/a" in line) or ("HMMER3/f" in line):
      title = line
      line = it.next()
      if word2(line) in namesList:  # if the second word in the line is in the list
        output.write(title)
        output.write(line)
        line = it.next()
        while ("//" not in line):
          output.write(line)
          line = it.next()
        output.write(line)
    line = it.next()
except Exception as e:
  print "Loop exited because:"
  print type(e)
  print "at " + line
finally:
  output.close()

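When the loop finishes, it always throws an exception reporting that the loop stopped, even when it did not terminate early. How can I prevent that?
Is there a better way to write my code? Something more elegant. I have a large file with a lot of information, and I'm trying to capture only the information I need. Each record has the following format: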
```
Infernal1/a ...
Name someSpecificName
...
...
...
...
// 
```

Thanks

score 2 · Accepted Answer

RocketDonkey's answer is spot on. Because of the way you're iterating, there's no simple way to do this with a for loop, so you need to handle StopIteration explicitly.

However, if you rethink the problem a bit, there are other ways to solve it. For example, a trivial state machine:

try:
    state = 0
    for line in cmLines:
        if state == 0:
            if "INFERNAL1/a" in line or "HMMER3/f" in line:
                title = line
                state = 1
        elif state == 1:
            if word2(line) in namesList:
                output.write(title)
                output.write(line)
                state = 2
            else:
                state = 0
        elif state == 2:
            output.write(line)
            if '//' in line:
                state = 0
except Exception as e:
    print "Loop exited because:"
    print type(e)
    print "at " + line
finally:
    output.close()

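Alternatively, you could write a generator function that delegates to sub-generators (via yield from foo() if you're on 3.3, or via for x in foo(): yield x if not), or try various other possibilities, especially if you rethink your problem at a higher level.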
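A rough sketch of that generator idea, assuming Python 3.3+ for yield from. The names matching_records, read_record and second_word are made up for illustration; second_word is only a guess at what the question's word2 helper does, since its definition isn't shown.

```python
HEADERS = ("INFERNAL1/a", "HMMER3/f")


def second_word(line):
    # Hypothetical stand-in for the question's word2(); "" for short lines.
    parts = line.split()
    return parts[1] if len(parts) > 1 else ""


def read_record(it):
    # Sub-generator: yield lines up to and including the "//" terminator.
    for line in it:
        yield line
        if "//" in line:
            return


def matching_records(lines, names):
    # Outer generator: find a header, check the name line, then delegate.
    it = iter(lines)
    for line in it:
        if any(header in line for header in HEADERS):
            name_line = next(it, "")  # "" if the input ends after a header
            if second_word(name_line) in names:
                yield line
                yield name_line
                yield from read_record(it)
```

With something like this, the writing side could shrink to output.writelines(matching_records(cmLines, namesList)), and no StopIteration ever reaches the caller because every advance of the iterator is either a for loop or a next() with a default.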

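This may not be what you want to do here, but it's usually at least worth asking "Can I turn this while loop and two explicit next calls into a for loop?", even if the answer is "No, not without making things less readable."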

As a side note, you can simplify things by replacing the try/finally with a with statement. Instead of this:

output = open('foo', 'w')
try:
    blah blah
finally:
    output.close()

You can do this:

with open('foo', 'w') as output:
    blah blah

Or, if output isn't a regular file, you can still replace the last four lines with:

import contextlib

with contextlib.closing(output):
    blah blah
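To make the contextlib.closing case concrete, here is a small sketch. Sink is a hypothetical stand-in for an output object that has a close() method but does not support the with statement on its own; the simulated error is only there to show that close() still runs.

```python
from contextlib import closing


class Sink(object):
    # Hypothetical output object: has write() and close(), but no
    # __enter__/__exit__, so it cannot be used in a with statement directly.
    def __init__(self):
        self.lines = []
        self.closed = False

    def write(self, text):
        self.lines.append(text)

    def close(self):
        self.closed = True


sink = Sink()
try:
    with closing(sink) as output:
        output.write("title\n")
        raise ValueError("simulated failure mid-write")
except ValueError:
    pass  # sink.close() has already been called by closing()
```

After this runs, sink.closed is True even though the body raised, which is the same guarantee the try/finally version gives.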

score 1 · Accepted Answer

When you call line = it.next(), a StopIteration exception is raised once there is nothing left:

>>> l = [1, 2, 3]
>>> i = iter(l)
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<ipython-input-6-e590fe0d22f8>", line 1, in <module>
    i.next()
StopIteration

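This happens in your code every time because you call it at the end of the block, so the exception is raised before the loop has a chance to come back around and find that line is empty. As a band-aid fix, you can do the following, catching the StopIteration exception and ignoring it (since it just indicates that iteration is finished):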

# Your code...
except StopIteration:
    pass
except Exception as e:
  print "Loop exited because:"
  print type(e)
  print "at " + line
finally:
  output.close()

score 0 · Accepted Answer

I like parser combinators because they lead to a more declarative programming style.

For example, with the Parcon library:

from string import letters, digits
from parcon import (Word, Except, Exact, OneOrMore,
                    CharNotIn, Literal, End, concat)

alphanum = letters + digits

UntilNewline = Exact(OneOrMore(CharNotIn('\n')) + '\n')[concat]
Heading1 = Word(alphanum + '/')
Heading2 = Word(alphanum + '.')
Name = 'Name' + UntilNewline
Line = Except(UntilNewline, Literal('//'))
Lines = OneOrMore(Line)
Block = Heading1['hleft'] + Heading2['hright'] + Name['name'] + Lines['lines'] + '//'
Blocks = OneOrMore(Block[dict]) + End()

Then, using Alex Martelli's Bunch class:

class Bunch(object):
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

names = 'John', 'Jane'
for block in Blocks.parse_string(config):
    b = Bunch(**block)
    if b.name in names and b.hleft.upper() in ("INFERNAL1/A", "HMMER3/F"):
        print ' '.join((b.hleft, b.hright))
        print 'Name', b.name
        print '\n'.join(b.lines)

Given this file:

Infernal1/a ...
Name John
...
...
...
...
//
SomeHeader/a ...
Name Jane
...
...
...
...
//
HMMER3/f ...
Name Jane
...
...
...
...
//
Infernal1/a ...
Name Billy Bob
...
...
...
...
//

The result is:

Infernal1/a ...
Name John
...
...
...
...
HMMER3/f ...
Name Jane
...
...
...
...

score 0 · Accepted Answer

1/ No exception handling

To avoid handling the StopIteration exception at all, you should look at the Pythonic way of processing sequences (as abarnert mentioned):

it = iter(cmLines)
for line in it:
    pass  # process each line here

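2/ Capturing the information

Also, you could try using a regular expression to capture your record pattern. You know the exact expression for the first line. Then you want to capture the name and compare it against a list of acceptable names. Finally, you look for the next //. You can build a regular expression that spans the newlines and use a group to capture the name you want to check: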

(...)

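Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].

Here is an example of using groups in a regular expression, from the Python docs: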

>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0)       # The entire match
'Isaac Newton'
>>> m.group(1)       # The first parenthesized subgroup.
'Isaac'
>>> m.group(2)       # The second parenthesized subgroup.
'Newton'
>>> m.group(1, 2)    # Multiple arguments give us a tuple.
('Isaac', 'Newton')

More about regular expressions.

Related link

Iterators' next() raising an exception in Python: https://softwareengineering.stackexchange.com/questions/112463/why-do-iterators-in-python-raise-an-exception
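As a rough illustration of the regex idea applied to the question's record format: the sketch below assumes the whole file has been read into one string, that header lines start with one of the two markers, and that the terminator line starts with //. The pattern and the name records_for_names are my own, not something from the question.

```python
import re

RECORD = re.compile(
    r"^(?:INFERNAL1/a|HMMER3/f)[^\n]*\n"  # header line
    r"\S+[ \t]+(\S+)[^\n]*\n"             # name line; group 1 is its second word
    r"(?:(?!//)[^\n]*\n)*"                # body lines that do not start with //
    r"//[^\n]*(?:\n|\Z)",                 # terminator line
    re.MULTILINE,
)


def records_for_names(text, names):
    # Keep each whole record (group 0) whose captured name is acceptable.
    return [m.group(0) for m in RECORD.finditer(text) if m.group(1) in names]
```

Reading everything into memory is the main trade-off here; for a very large file, one of the line-by-line approaches is likely the safer choice.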

score 0 · Accepted Answer

You can explicitly ignore StopIteration:

 try:
     # parse file
     it = iter(cmLines)
     for line in it:
         # here `line = next(it)` might raise StopIteration
 except StopIteration:
     pass
 except Exception as e:
     # handle exception

Or call line = next(it, None) and check for None.

To separate concerns, you could split the code into two parts:

Split the input into records:
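For instance, the loop from the question could be restructured around next(it, None) roughly like this. It is only a sketch: copy_matching collects lines in a list instead of writing to output, and second_word is a hypothetical stand-in for the question's word2 helper.

```python
HEADERS = ("INFERNAL1/a", "HMMER3/f")


def second_word(line):
    # Hypothetical stand-in for the question's word2(); "" for short lines.
    parts = line.split()
    return parts[1] if len(parts) > 1 else ""


def copy_matching(lines, names):
    out = []
    it = iter(lines)
    line = next(it, None)  # None marks the end of input; nothing is raised
    while line is not None:
        if any(header in line for header in HEADERS):
            title = line
            line = next(it, None)
            if line is not None and second_word(line) in names:
                out.append(title)
                out.append(line)
                line = next(it, None)
                while line is not None and "//" not in line:
                    out.append(line)
                    line = next(it, None)
                if line is not None:
                    out.append(line)  # the "//" terminator
        line = next(it, None)
    return out
```

Testing against None rather than truthiness keeps an empty string in the input from being mistaken for the end of the data.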

from collections import deque
from itertools import chain, dropwhile, takewhile

def getrecords(lines):
    it = iter(lines)
    headers = "INFERNAL1/a", "HMMER3/f"
    while True:
        it = chain([next(it)], it) # force StopIteration at the end
        it = dropwhile(lambda line: not line.startswith(headers), it)
        record = takewhile(lambda line: not line.startswith("//"), it)
        yield record
        consume(record) # make sure each record is read to the end

def consume(iterable):
    deque(iterable, maxlen=0)

Output the records you're interested in:

from contextlib import closing

with closing(output):
    for record in getrecords(cmLines):
        title, line = next(record, ""), next(record, "")
        if word2(line) in namesList:
            for line in chain([title, line], record):
                output.write(line)
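One caveat if this is run on a current Python: since Python 3.7 (PEP 479), a StopIteration that escapes from inside a generator body is turned into a RuntimeError, so relying on next(it) to end getrecords no longer works as written. A possible adaptation, offered as a sketch rather than as the answer's own code, catches the exception and returns instead:

```python
from collections import deque
from itertools import chain, dropwhile, takewhile

HEADERS = ("INFERNAL1/a", "HMMER3/f")


def getrecords(lines):
    it = iter(lines)
    while True:
        # Skip ahead to the next header line, if there is one.
        rest = dropwhile(lambda line: not line.startswith(HEADERS), it)
        try:
            first = next(rest)
        except StopIteration:
            return  # input exhausted: end the generator cleanly
        body = takewhile(lambda line: not line.startswith("//"), it)
        record = chain([first], body)
        yield record
        deque(record, maxlen=0)  # make sure each record is read to the end
```

As in the version above, takewhile consumes the // line without yielding it, so the terminator is not part of the record.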
