python - How to extract HTML tag content using Python's HTMLParser

Question

I'm processing an HTML page and end up with lines like this:

<td class="border">AAA</td><td class="border">BBB</td>

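I need to extract AAA and BBB into variables using HTMLParser, but I can't figure out a simple way to do it. I can't use any other parser because I'm restricted in which Python tools I can use. Any help would be appreciated.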

score 5 · Accepted Answer

This will print the data inside the TD tags:

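try:
    from HTMLParser import HTMLParser   # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.in_td = False  # True while the parser is inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag.lower() == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag.lower() == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            print(data)

# feed the markup from the question to the parser
parser = MyHTMLParser()
parser.feed('<td class="border">AAA</td><td class="border">BBB</td>')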
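
The parser above prints the cell text, while the question asks for the values in variables. Here is a minimal sketch of one way to do that, assuming Python 3 (where the module is `html.parser`; on Python 2 it is `HTMLParser`). The class name `TDCollector` and its `cells` attribute are hypothetical names chosen for illustration, not part of the original answer. Each cell's text is buffered between the start and end tag, so text that the parser delivers in several pieces still ends up as one string.

```python
from html.parser import HTMLParser


class TDCollector(HTMLParser):
    """Hypothetical helper: collects the text of each <td> cell into a list."""

    def __init__(self):
        super().__init__()
        self.cells = []       # finished cell texts, in document order
        self._buffer = None   # text pieces of the open <td>, or None outside one

    def handle_starttag(self, tag, attrs):
        if tag == "td":       # html.parser passes tag names in lower case
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._buffer is not None:
            self.cells.append("".join(self._buffer))
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


parser = TDCollector()
parser.feed('<td class="border">AAA</td><td class="border">BBB</td>')
parser.close()

first, second = parser.cells  # unpack the two cells into variables
print(first, second)
```

Keeping the state on the instance rather than in a module-level global means several parsers can run independently, and `parser.cells` can be read after `feed()` returns.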
