How do I iterate over a defaultdict(list) in Python? Is there a better way to have a dictionary of lists in Python? I tried the usual iter(dict), but I get this error:

>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence

Main class:

import para
para.print_doc('./foo/bar/para-lines.txt')

para.py:

# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict
class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # Separator here refers to the paragraph separator,
    #  the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        #  else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if self.separator(line):
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)

An example ./foo/bar/para-lines.txt looks like this:

This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.

The output of the main class should look like this:

Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.

Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.

5 Answers

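The problem you are having with

for para in iter(doc):

is that doc is an instance of Paragraphs, not a defaultdict. The defaultdict you use in the __init__ method goes out of scope and is lost. So you need to do two things:

  1. Save the doc created in the __init__ method as an instance variable (self.doc, for example).

  2. Either make Paragraphs itself iterable (by adding an __iter__ method), or let callers access the doc object it created.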
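
The two points above can be sketched roughly as follows. This is a minimal Python 3 sketch, not the original recipe: it assumes the constructor takes an iterable of lines instead of a filename so it can run without a file, and it assumes that iterating should yield (paragraph index, lines) pairs.

```python
from collections import defaultdict


class Paragraphs:
    """Group lines into paragraphs (illustrative Python 3 sketch)."""

    def __init__(self, lines, separator=None):
        if separator is None:
            # Default: a line holding only a newline ends the paragraph.
            def separator(line):
                return line == '\n'
        elif not callable(separator):
            raise TypeError("separator argument must be callable")
        # Point 1: keep the dictionary on the instance so it outlives __init__.
        self.doc = defaultdict(list)
        para_index = 0
        for line in lines:
            if separator(line):
                para_index += 1
            else:
                self.doc[para_index].append(line)

    def __iter__(self):
        # Point 2: iterating over the object yields (index, lines) pairs
        # in paragraph order.
        return iter(sorted(self.doc.items()))


# Assumed sample input: two paragraphs separated by a blank line.
sample = ["first\n", "second\n", "\n", "third\n"]
for para_index, sentences in Paragraphs(sample):
    for sent_index, sentence in enumerate(sentences):
        print("Para#%d, Sent#%d: %s" % (para_index, sent_index, sentence), end="")
```

With __iter__ in place, the loop can run over the Paragraphs object itself, so callers no longer need to know that a defaultdict sits underneath.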

Answered 2011-12-27T16:06:11.397

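The recipe you linked to is rather old. It was written in 2001, before Python had more modern tools such as itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what your code might look like using groupby: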

import itertools
import sys

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            print
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start = 1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i = paranum, n = n, s = sentence))
Answered 2011-12-27T16:25:12.547

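The problem seems to be that you are iterating over your Paragraphs object rather than over the dictionary. Also, instead of iterating over the keys and then looking up each dictionary entry, consider using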

for (key, value) in d.items():
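
As a rough illustration of that loop, here is a small Python 3 sketch. The dictionary d and its contents are made-up sample data standing in for text.doc from the question.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the question's text.doc.
d = defaultdict(list)
d[0].append("This is a start of a paragraph.\n")
d[0].append("This is the end.\n")
d[1].append("This is the start of next para.\n")

rows = []
for key, value in d.items():
    # enumerate() numbers the sentences without a list.index() lookup.
    for n, sent in enumerate(value):
        rows.append("Para#%d, Sent#%d: %s" % (key, n, sent))

print("".join(rows), end="")
```

Using enumerate also avoids a subtle issue with list.index(sent), which returns the position of the first matching line and so misnumbers repeated sentences.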
Answered 2011-12-27T16:02:52.747

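It fails because you did not define __iter__() in the Paragraphs class and then tried to call iter(doc) (where doc is a Paragraphs instance).

To be iterable, a class must have an __iter__() method that returns an iterator. See the Python documentation on iterators.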
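
The failure and the fix can be reproduced in a few lines. This is a Python 3 sketch with two made-up classes, WithoutIter and WithIter; the TypeError message differs between Python versions, but the cause is the same.

```python
class WithoutIter:
    """Holds a list but defines no iteration support."""

    def __init__(self):
        self.items = ["a", "b"]


class WithIter(WithoutIter):
    def __iter__(self):
        # Delegate to the underlying list's iterator.
        return iter(self.items)


try:
    iter(WithoutIter())
    failed = False
except TypeError:
    # iter() rejects objects that do not support iteration.
    failed = True

print(failed)
print(list(WithIter()))
```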

Answered 2011-12-27T16:04:14.703

I can't think of any reason to use a dictionary here, let alone a defaultdict. A list of lists would be much simpler.

doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)
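
For a version that runs on its own, the same loop can be wrapped in a function. This is a Python 3 sketch: the name read_paragraphs, the choice to pass lines instead of a filename, and the default separator of "\n" are all assumptions made for illustration.

```python
def read_paragraphs(lines, separator="\n"):
    """Split an iterable of lines into a list of paragraphs (lists of lines)."""
    doc = []
    para = []
    for line in lines:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)  # keep the last paragraph, which has no trailing separator
    return doc


# Assumed sample input: two paragraphs separated by a blank line.
sample = ["one\n", "two\n", "\n", "three\n"]
for p, para in enumerate(read_paragraphs(sample), start=1):
    for s, line in enumerate(para, start=1):
        print("Para#%d,Sent#%d: %s" % (p, s, line), end="")
```

Because the list positions already serve as paragraph numbers, enumerate with start=1 gives the 1-based numbering shown in the question's expected output.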
Answered 2011-12-27T16:09:42.000