
I'm learning Python and am having trouble understanding the behavior of the XML parser (ElementTree's XMLParser).

I modified the example from the documentation:

class MaxDepth:                     # The target object of the parser
    path = ""
    def start(self, tag, attrib):   # Called for each opening tag.
        self.path += "/"+ tag
        print '>>> Entering - ' + self.path
    def end(self, tag):             # Called for each closing tag.
        print '<<< Leaving - ' + self.path
        if self.path.endswith('/'+tag):
            self.path = self.path[:-(len(tag)+1)]
    def data(self, data):
        if data:
            print '... data called ...'
            print data , 'length -' , len(data)
    def close(self):    # Called when all data has been parsed.
        return self

It prints the following output:

>>> Entering - /a
... data called ...

length - 1
... data called ...
   length - 2
>>> Entering - /a/b
... data called ...

length - 1
... data called ...
   length - 2
<<< Leaving - /a/b
... data called ...

length - 1
... data called ...
   length - 2
>>> Entering - /a/b
... data called ...

length - 1
... data called ...
     length - 4
>>> Entering - /a/b/c
... data called ...

length - 1
... data called ...
       length - 6
>>> Entering - /a/b/c/d
... data called ...

length - 1
... data called ...
       length - 6
<<< Leaving - /a/b/c/d
... data called ...

length - 1
... data called ...
     length - 4
<<< Leaving - /a/b/c
... data called ...

length - 1
... data called ...
   length - 2
<<< Leaving - /a/b
... data called ...

length - 1
<<< Leaving - /a
<__main__.MaxDepth instance at 0x10e7dd5a8>

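My questions are:

  1. When is the data() method called?
  2. Why is it called twice before each start tag?
  3. I can't find API documentation with more details about the data method. Where can I find an API reference, something like Javadoc, for the XMLParser class?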

1 Answer


If you modify the data method like this:

def data(self, data):
    if data:
        print '... data called ...'
        print repr(data), 'length -' , len(data)

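you'll see why the data method is called multiple times; it is called for each chunk of text between the tags, here once for the newline and once for the indentation that follows it: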

>>> Entering - /a
... data called ...
'\n' length - 1
... data called ...
'  ' length - 2
>>> Entering - /a/b
... data called ...
'\n' length - 1
... data called ...
'  ' length - 2
<<< Leaving - /a/b
... data called ...
'\n' length - 1
... data called ...
'  ' length - 2
>>> Entering - /a/b
... data called ...
'\n' length - 1
... data called ...
'    ' length - 4
# ... etc ...

XMLParser is based on the Expat parser.

In my experience, any streaming XML parser will treat text data as a series of chunks and you have to concatenate any and all data events together until you hit the next starttag or endtag event. Often the parser breaks up chunks at whitespace boundaries but that is not a given.
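The concatenation described above can be sketched as a parser target that buffers every data() chunk and only joins the chunks when the next start or end event arrives. This is a minimal illustration, not part of the original example: the class name `TextCollector`, its `_flush` helper, and the sample XML string are all hypothetical, and whitespace-only runs are dropped as a design choice.

```python
from xml.etree.ElementTree import XMLParser


class TextCollector(object):
    """Hypothetical parser target that joins data() chunks between tag events."""

    def __init__(self):
        self.path = []      # stack of currently open tags
        self.buffer = []    # text chunks received since the last tag event
        self.texts = []     # (path, text) pairs, one per joined run of text

    def _flush(self):
        # Join whatever chunks arrived since the last start/end event.
        text = "".join(self.buffer).strip()
        self.buffer = []
        if text:            # assumption: whitespace-only runs are not interesting
            self.texts.append(("/" + "/".join(self.path), text))

    def start(self, tag, attrib):
        self._flush()       # pending text belongs to the enclosing element
        self.path.append(tag)

    def end(self, tag):
        self._flush()       # pending text belongs to the element being closed
        self.path.pop()

    def data(self, data):
        # May be called several times for one run of text; just buffer it.
        self.buffer.append(data)

    def close(self):
        self._flush()
        return self.texts


parser = XMLParser(target=TextCollector())
parser.feed("<a>\n  <b>first\nsecond</b>\n  <b><c>third</c></b>\n</a>")
result = parser.close()     # returns whatever the target's close() returns
```

Here `result` is `[('/a/b', 'first\nsecond'), ('/a/b/c', 'third')]` regardless of how the parser happens to split the text into chunks, because the target never acts on an individual data() call.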

answered 2012-06-11T16:05:08.607