python - How do I iterate over a defaultdict(list) in Python?

Question

How do I iterate over a defaultdict(list) in Python? Is there a better way to have a dictionary of lists in Python? I've tried the normal iter(dict), but I get this error:

>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence

Main script:

import para
para.print_doc('./foo/bar/para-lines.txt')

para.py:

# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict
class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # Separator here refers to the paragraph separator,
    #  the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        #  else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if line == separator:
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)

An example of what ./foo/bar/para-lines.txt looks like:

This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.

The output of the main script should look like this:

Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.

Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.

score 4 · Accepted Answer

The problem you're running into is on this line:

for para in iter(doc):

There, doc is an instance of Paragraphs, not a defaultdict. The defaultdict you create in the __init__ method goes out of scope and is lost. So you need to do two things:

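Save the doc created in the __init__ method as an instance variable (self.doc, for example).
Either make Paragraphs itself iterable (by adding an __iter__ method), or give callers access to the doc object it creates.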
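Both changes together might look like the following minimal sketch. It is written for Python 3 (the question's code is Python 2) and is not the asker's code verbatim: it keeps the defaultdict on self.doc, adds an __iter__ that yields (paragraph number, sentences) pairs, and calls self.separator(line) rather than comparing line with the separator function. As an assumption for testability, the constructor takes any iterable of lines (a hypothetical lines parameter) instead of a filename, and io.StringIO stands in for the file.

```python
import io
from collections import defaultdict


class Paragraphs:
    """Group lines into numbered paragraphs stored in a defaultdict(list)."""

    def __init__(self, lines, separator=None):
        if separator is None:
            # Default separator: a line consisting only of '\n'.
            def separator(line):
                return line == '\n'
        elif not callable(separator):
            raise TypeError("separator argument must be callable")
        self.separator = separator
        # Stored on self, so the dict survives after __init__ returns.
        self.doc = defaultdict(list)
        para_index = 1
        for line in lines:
            if self.separator(line):
                # .get() does not insert a key, so consecutive blank lines
                # advance the counter only once.
                if self.doc.get(para_index):
                    para_index += 1
            else:
                self.doc[para_index].append(line.rstrip('\n'))

    def __iter__(self):
        # Yields (paragraph number, list of sentences) in paragraph order.
        return iter(sorted(self.doc.items()))


sample = io.StringIO(
    "This is a start of a paragraph.\n"
    "foo barr\n"
    "\n"
    "This is the start of next para.\n"
    "this is the end.\n"
)

for para_num, sentences in Paragraphs(sample):
    # enumerate numbers sentences by position; list.index() would return
    # the first match if a sentence appeared twice in a paragraph.
    for sent_num, sentence in enumerate(sentences, start=1):
        print("Para#%d,Sent#%d: %s" % (para_num, sent_num, sentence))
```

With a real file, open it in a with statement and pass the file object as lines.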

score 2 · Accepted Answer

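The recipe you linked to is fairly old. It was written in 2001, before Python had more modern tools such as itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what your code might look like using groupby: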

import itertools
import sys

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached a paragraph separator: print a blank line
            sys.stdout.write('\n')
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start=1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i=paranum, n=n, s=sentence))
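Because groupby accepts any iterable, the same approach can be wrapped in a function and tried without a file. In this Python 3 sketch the function name format_paragraphs is hypothetical; it returns the formatted lines instead of writing them, skips the separator runs, and uses io.StringIO in place of the open file.

```python
import io
import itertools


def format_paragraphs(lines):
    """Return 'Para#i,Sent#n: text' strings, splitting paragraphs on blank lines."""
    output = []
    paranum = 0
    # groupby yields alternating runs of separator and non-separator lines.
    for is_separator, paragraph in itertools.groupby(lines, lambda line: line == '\n'):
        if is_separator:
            continue  # skip the blank line(s) between paragraphs
        paranum += 1
        for n, sentence in enumerate(paragraph, start=1):
            output.append('Para#{i:d},Sent#{n:d}: {s}'.format(
                i=paranum, n=n, s=sentence.rstrip('\n')))
    return output


sample = io.StringIO(
    "This is a start of a paragraph.\nfoo barr\n\nThis is the start of next para.\n"
)
for row in format_paragraphs(sample):
    print(row)
```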

score 0 · Accepted Answer

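The problem seems to be that you are iterating over your Paragraphs class, not over the dictionary. Also, rather than iterating over the keys and then looking up each entry, consider using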

for (key, value) in d.items():
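Applied to a defaultdict(list), that might look like the small Python 3 sketch below (the sample data is made up). Iterating the dict directly yields only the keys, while items() yields each key together with its list. The last lines show a defaultdict side effect worth knowing: indexing a missing key inserts an empty list.

```python
from collections import defaultdict

# Hypothetical sample data: paragraph number -> list of sentences.
doc = defaultdict(list)
doc[1].append("This is a start of a paragraph.")
doc[1].append("foo barr")
doc[2].append("This is the start of next para.")

# Iterating the dict itself gives the keys only.
print(list(doc))  # [1, 2]

# items() gives (key, list) pairs; sorted() keeps paragraphs in order.
for para, sentences in sorted(doc.items()):
    for n, sent in enumerate(sentences, start=1):
        print("Para#%d,Sent#%d: %s" % (para, n, sent))

# Caution: looking up a missing key with [] inserts an empty list.
_ = doc[3]
print(3 in doc)  # True
```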

score 0 · Accepted Answer

It fails because you call iter(doc) (where doc is a Paragraphs instance) without defining __iter__() in the Paragraphs class.

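To be iterable, a class should define an __iter__() method that returns an iterator (iter() also falls back to __getitem__ for sequence-like classes). The documentation is here.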
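A stripped-down illustration of that rule in Python 3 (the class names NoIter and WithIter are hypothetical): iter() raises TypeError for an object whose class defines neither __iter__ nor __getitem__, and succeeds once __iter__ returns an iterator. Writing __iter__ as a generator is one common way to do that.

```python
from collections import defaultdict


class NoIter:
    def __init__(self):
        self.doc = defaultdict(list)


class WithIter:
    def __init__(self):
        self.doc = defaultdict(list)
        self.doc[0].append("first sentence")

    def __iter__(self):
        # A generator function returns an iterator, which is what iter() needs.
        for key, sentences in self.doc.items():
            yield key, sentences


try:
    iter(NoIter())
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: 'NoIter' object is not iterable

for key, sentences in WithIter():
    print(key, sentences)  # 0 ['first sentence']
```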

score 0 · Accepted Answer

I can't think of any reason to use a dictionary here, let alone a defaultdict. A list of lists would be much simpler.

doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        if line == '\n':  # a blank line separates paragraphs
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)
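As a complete function, the list-of-lists version could look like this Python 3 sketch. The name read_paragraphs and its lines parameter are hypothetical. It treats a line equal to '\n' as the separator, drops the empty paragraphs that consecutive or trailing blank lines would otherwise produce, and leaves all numbering to enumerate.

```python
import io


def read_paragraphs(lines):
    """Split an iterable of lines into a list of paragraphs (lists of lines)."""
    doc = []
    para = []
    for line in lines:
        if line == '\n':
            # Blank line: close the current paragraph only if it has content.
            if para:
                doc.append(para)
            para = []
        else:
            para.append(line.rstrip('\n'))
    if para:  # the last paragraph may not be followed by a blank line
        doc.append(para)
    return doc


sample = io.StringIO("foo barr\nbar foo\n\nfoo boo bar bar\n\n")
for p, para in enumerate(read_paragraphs(sample), start=1):
    for s, sent in enumerate(para, start=1):
        print("Para#%d,Sent#%d: %s" % (p, s, sent))
```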
