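I wrote the following simple parser (to demonstrate a problem in my somewhat more complex program). It extracts the titles from all entries in the DBLP XML database.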
from lxml import etree
class DBLPTarget(object):
    def __init__(self, outfile):
        self.inField = False
        self.outfile = outfile
    def start(self, tag, attrib):
        if tag == 'title':
            self.inField = True
    def end(self, tag):
        if self.inField and tag == 'title':
            self.inField = False
    def data(self, data):
        if self.inField:
            self.outfile.write('%s\n' % data)
    def close(self):
        pass
outfile = open('dblp-selected.txt', 'w')
parser = etree.XMLParser(target = DBLPTarget(outfile), load_dtd=True)
infile = 'dblp.xml'
results = etree.parse(infile, parser)
outfile.close()
print("Done.")
When I run this code on the dblp.xml file, it runs for a while (producing about 72K of output) and then raises the following error.
Traceback (most recent call last):
  File "C:/Users/je24621/Desktop/dblp-example2.py", line 30, in <module>
    results = etree.parse(infile, parser)
  File "lxml.etree.pyx", line 3197, in lxml.etree.parse (src\lxml\lxml.etree.c:65042)
  File "parser.pxi", line 1571, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:93101)
  File "parser.pxi", line 1600, in lxml.etree._parseDocumentFromURL (src\lxml\lxml.etree.c:93388)
  File "parser.pxi", line 1500, in lxml.etree._parseDocFromFile (src\lxml\lxml.etree.c:92445)
  File "parser.pxi", line 1047, in lxml.etree._BaseParser._parseDocFromFile (src\lxml\lxml.etree.c:89329)
  File "parsertarget.pxi", line 160, in lxml.etree._TargetParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:100233)
  File "parsertarget.pxi", line 154, in lxml.etree._TargetParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:100143)
  File "lxml.etree.pyx", line 294, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:9383)
TypeError: function takes exactly 5 arguments (1 given)
For reference, I am running this with Python 3.2.5 and lxml 3.2.1 on Windows 7 (not by choice). How can I fix or debug this?
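One hypothesis that may be worth checking, and it is only an assumption since the traceback does not show the original exception: the `write` call inside `data` could be failing with a `UnicodeEncodeError`, because `open(..., 'w')` on Windows defaults to a legacy code page, and the error is then re-created with a single argument (which `UnicodeEncodeError` does not accept). The sketch below uses only the standard library to reproduce both halves of that idea in isolation; `cp1252`, the sample title, and the helper `try_write` are illustrative choices, not taken from the program above.

```python
import io

def try_write(stream, text):
    """Write one line; return the Unicode error raised, or None on success."""
    try:
        stream.write("%s\n" % text)
        stream.flush()
    except UnicodeError as exc:
        return exc
    return None

# In-memory stand-ins for open(..., 'w') with a legacy code page (assumed
# to be cp1252 here) and with an explicit UTF-8 encoding.
legacy_out = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
utf8_out = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

# Hypothetical title containing U+0151, which cp1252 cannot encode.
title = "Erd\u0151s numbers"

legacy_error = try_write(legacy_out, title)
utf8_error = try_write(utf8_out, title)

# UnicodeEncodeError needs five constructor arguments, so rebuilding it
# from a single message raises a TypeError instead.
try:
    UnicodeEncodeError("boom")
    rebuild_error = None
except TypeError as exc:
    rebuild_error = exc

print(type(legacy_error).__name__)   # exception type from the cp1252 stream
print(utf8_error)                    # None if the UTF-8 write succeeded
print(type(rebuild_error).__name__)  # exception type from the rebuild attempt
```

If this turns out to be the cause, wrapping the body of `data` in a similar `try`/`except` that prints the exception would confirm it, and opening the output with `open('dblp-selected.txt', 'w', encoding='utf-8')` would likely avoid the failure.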