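I am trying to parse HTML that contains a class name such as class="link". I want to read every line held in a variable and then parse it, but it has to work with triple quotes. How can I create a string variable in the triple-quoted style? Thanks.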

from HTMLParser import HTMLParser

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print "Encountered a start tag:", tag
    def handle_endtag(self, tag):
        print "Encountered an end tag :", tag
    def handle_data(self, data):
        print "Encountered some data  :", data

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()

var = open('./index.html','r')
strings = var.read()



parser.feed('<html><head><title>Test</title></head>'
        '<body><h1>Parse me!</h1></body></html>')

OK, if I read the content from a local file, how do I parse the string in var?

index.html:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
    <title>Document</title>
</head>
<body>
    <div class="row">
        <h1>hello world</h1>
            <div class="row">
                <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Id, excepturi, consequatur sed nobis facere veritatis tempore qui ipsum enim dignissimos!</p>
            </div>
    </div>
</body>
</html>

If I read this HTML in as one big string, how do I parse it? I only want to get the content inside the h1 tag. Thanks for your time.


1 Answer

   class MyHTMLParser(HTMLParser):
       def __init__(self):
           HTMLParser.__init__(self)
           # True while the parser is inside an <h1> element
           self.in_h1 = False
       def handle_starttag(self, tag, attrs):
           if tag == 'h1':
               self.in_h1 = True
       def handle_endtag(self, tag):
           if tag == 'h1':
               self.in_h1 = False
       def handle_data(self, data):
           # print text only when it appears between <h1> and </h1>
           if self.in_h1:
               print(data)
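
To tie this back to the two things asked in the question (a triple-quoted string, and a file read in as one string), here is a minimal sketch. It assumes Python 3, where the module is `html.parser` rather than `HTMLParser`. The names `H1Collector`, `extract_h1`, and `extract_h1_from_file` are made up for illustration and are not part of the standard library.

```python
from html.parser import HTMLParser  # Python 3 name; Python 2 uses "from HTMLParser import HTMLParser"


class H1Collector(HTMLParser):
    """Illustrative helper: collects the text found inside <h1> elements."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False   # True while we are between <h1> and </h1>
        self.headings = []   # one string per <h1> element seen

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current heading.
        if self.in_h1:
            self.headings[-1] += data


def extract_h1(html_text):
    """Parse one HTML string and return the list of <h1> texts."""
    parser = H1Collector()
    parser.feed(html_text)
    parser.close()
    return parser.headings


def extract_h1_from_file(path):
    """Read a whole file as one string and parse it the same way."""
    with open(path, "r") as f:
        return extract_h1(f.read())


# A triple-quoted literal can span several lines and contain
# double quotes (class="row") without escaping.
sample = """<html>
<body>
    <div class="row">
        <h1>hello world</h1>
        <p>Lorem ipsum</p>
    </div>
</body>
</html>"""

print(extract_h1(sample))  # ['hello world']
```

For the file in the question, `extract_h1_from_file('./index.html')` should work the same way, since `read()` already returns the whole file as a single string and no triple quotes are needed in that case.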
answered 2013-08-16T02:06:45.213