python - Stepping through entities in Python 3.3 html.parser

Question

I have the following parser:

from html.parser import HTMLParser

class Parser(HTMLParser):

  def __init__(self):
    HTMLParser.__init__(self)
    self.tableCount = 0

  def handle_starttag(self, tag, attrs):
    if tag == "table":
      for attr in attrs:
        if attr[0] == "class" and attr[1] == "space":
          ## need to do some processing here
          pass

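In place of the comment, I need to step through all the HTML entities that follow this point until the table tag ends (this code only runs when tag == "table", as shown above).

How can I do this? I cannot see any way to step through all the tags under this tag. Note that I cannot use any external libraries such as BeautifulSoup (only the Python standard library).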

score 0 · Accepted Answer

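from html.parser import HTMLParser

class Parser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self)
        self.inTable = False  # True while inside <table class="space">

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "table" and ('class','space') in attrs:
            self.inTable = True
        if self.inTable:
            doSomething()  # placeholder: replace with your own processing

    def handle_endtag(self, tag):
        if tag == "table":
            self.inTable = False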
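
The flag-based parser above resets on the first closing table tag, so a table nested inside the target table would end the processing early. A minimal sketch of one way around that, assuming a depth counter is acceptable, is shown here. The class name TableEventCollector and the events list are hypothetical names chosen for illustration; only HTMLParser and its handle_* methods come from the standard library.

```python
from html.parser import HTMLParser


class TableEventCollector(HTMLParser):
    """Record start tags, end tags, and text found inside <table class="space">."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # 0 means we are outside the target table
        self.events = []  # hypothetical storage for everything seen inside it

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "table":
                self.depth += 1  # nested table: stay inside until it closes
            self.events.append(("start", tag))
        elif tag == "table" and ("class", "space") in attrs:
            self.depth = 1       # entering the target table

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "table":
            self.depth -= 1
            if self.depth == 0:
                return           # target table closed; do not record it
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text between tags.
        if self.depth and data.strip():
            self.events.append(("data", data.strip()))


parser = TableEventCollector()
parser.feed('<p>skip</p><table class="space"><tr><td>a</td></tr></table><p>after</p>')
print(parser.events)
```

Because html.parser is event driven, there is no call that walks "everything under this tag"; the parser instead remembers that it is inside the table and reacts to each callback while that state holds. Note that the exact match on ("class", "space") would not match a table carrying several classes, such as class="space wide".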

I think xml.etree.ElementTree might be easier to use in this case.
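
A rough sketch of the ElementTree route, assuming the input is well-formed XML (ElementTree raises a ParseError on typical loose HTML, such as unclosed tags). The sample markup and the visited list are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup.
markup = (
    "<html><body>"
    '<table class="space"><tr><td>a</td><td>b</td></tr></table>'
    '<table class="other"><tr><td>c</td></tr></table>'
    "</body></html>"
)

root = ET.fromstring(markup)

visited = []
for table in root.iter("table"):
    if table.get("class") != "space":
        continue
    # iter() yields the table itself first, then every descendant in document order.
    for element in table.iter():
        if element is table:
            continue
        visited.append((element.tag, (element.text or "").strip()))

print(visited)
```

Here the whole document is parsed into a tree first, so everything under the matching table can be reached with a plain loop instead of tracking state across callbacks.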
