
The code is from the question "Python if-statement based on content of HTML title tag":

from HTMLParser import HTMLParser

def titleFinder(html):
    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"
        def handle_data(self, data):
            if self.intitle:
                self.title = data

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

>>> print titleFinder('<html><head><title>Test</title></head>'
...                   '<body><h1>Parse me!</h1></body></html>')
Test

However, I get the following error message when I run the code below:

AttributeError: MyHTMLParser instance has no attribute 'intitle'

How can I fix this error? Any ideas?

Code:

from HTMLParser import HTMLParser
import urllib2

def titleFinder(html):
    intitle = False
    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"
        def handle_data(self, data):
            if self.intitle:
                self.title = data

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

response = urllib2.urlopen("https://stackoverflow.com/questions/13680074/attributeerror-xx-instance-has-no-attribute-intitle")
html = response.read()
print titleFinder(html)

The traceback is:

Traceback (most recent call last):
  File "D:\labs\test.py", line 19, in <module>
    print titleFinder(html)
  File "D:\labs\test.py", line 14, in titleFinder
    parser.feed(html)
  File "C:\Python27\lib\HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "C:\Python27\lib\HTMLParser.py", line 142, in goahead
    if i < j: self.handle_data(rawdata[i:j])
  File "D:\labs\test.py", line 10, in handle_data
    if self.intitle:
AttributeError: MyHTMLParser instance has no attribute 'intitle'
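A minimal reproduction, sketched in Python 3 (where the module is `html.parser` and the error message is worded slightly differently), shows why the sample string works but a real page fails. The sample starts with `<html>`, so `handle_starttag` runs first; a real page usually has some text, even just the newline after `<!DOCTYPE html>`, before its first start tag. The function name `title_finder_original` is made up for this illustration.

```python
from html.parser import HTMLParser

def title_finder_original(html):
    # Same logic as the question's titleFinder, ported to Python 3.
    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"

        def handle_data(self, data):
            # Raises AttributeError if no start tag has been seen yet.
            if self.intitle:
                self.title = data

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

sample = ('<html><head><title>Test</title></head>'
          '<body><h1>Parse me!</h1></body></html>')
print(title_finder_original(sample))  # Test

try:
    # The newline after the doctype is a text node seen before any start tag.
    title_finder_original('<!DOCTYPE html>\n' + sample)
except AttributeError as exc:
    print('AttributeError:', exc)
```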

[UPDATE]

I finally solved the problem! Thank you, Martijn Pieters!

from HTMLParser import HTMLParser
import urllib2

def titleFinder(html):
    class MyHTMLParser(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.title = ''
            self.intitle = False  # added: set before feed() can call handle_data()
        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"
        def handle_data(self, data):
            if self.intitle:
                self.title += data  # changed: accumulate text instead of overwriting it

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

response = urllib2.urlopen("https://stackoverflow.com/questions/13680074/attributeerror-xx-instance-has-no-attribute-intitle")
html = response.read()
print titleFinder(html)
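For Python 3, where `HTMLParser` lives in `html.parser` and `urllib2` became `urllib.request`, the updated solution might be sketched as below. This sketch also adds a `handle_endtag` method that is not in the code above: because `intitle` is only reset on the next start tag, text between `</title>` and that tag (such as a newline and indentation) would otherwise be appended to the title.

```python
from html.parser import HTMLParser

def titleFinder(html):
    class MyHTMLParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.title = ''
            self.intitle = False  # set before feed() runs any callback

        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"

        def handle_endtag(self, tag):
            # Added in this sketch: stop collecting at </title>, so text
            # between </title> and the next start tag is ignored.
            if tag == "title":
                self.intitle = False

        def handle_data(self, data):
            if self.intitle:
                # Accumulate, since the title text may arrive in several chunks.
                self.title += data

    parser = MyHTMLParser()
    parser.feed(html)
    parser.close()
    return parser.title

page = ("<!DOCTYPE html>\n<html><head><title>Hello</title>\n"
        "  <link rel='icon'></head><body>x</body></html>")
print(repr(titleFinder(page)))  # 'Hello'
```

To use it on a downloaded page, note that `urllib.request.urlopen(url).read()` returns bytes in Python 3, so they need to be decoded (for example with `.decode('utf-8')`, assuming the page is UTF-8) before being passed to `titleFinder`.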

1 Answer

Your handle_data method is being called before handle_starttag has been called, and at that point no intitle attribute has been set.

Just add intitle = False to your class:

class MyHTMLParser(HTMLParser):
    intitle = False

    # your methods
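Filled in with the question's methods, the class-attribute fix might look like this Python 3 sketch. Reading `self.intitle` on an instance that has not assigned it yet falls back to the class attribute, so the first `handle_data` call sees `False` instead of raising. The `title = ''` default is an extra assumption (not part of the answer) so that a page without a `<title>` returns an empty string rather than raising on `parser.title`.

```python
from html.parser import HTMLParser

def titleFinder(html):
    class MyHTMLParser(HTMLParser):
        # Class-level defaults: until an instance assigns its own value,
        # self.intitle and self.title fall back to these.
        intitle = False
        title = ''  # extra default, not in the answer

        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"

        def handle_data(self, data):
            if self.intitle:
                self.title = data

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

print(titleFinder('<!DOCTYPE html>\n<html><head><title>Test</title></head>'
                  '<body><h1>Parse me!</h1></body></html>'))  # Test
```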

handle_data is called for every text node in the document (including whitespace), so it is not uncommon for it to be called before handle_starttag.
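To see this ordering directly, a small Python 3 parser can log every callback. The class name `EventLogger` is made up for illustration. The leading newline and the indentation between tags are both reported through `handle_data`, and the first of them arrives before any `handle_starttag` call.

```python
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    """Records the order in which HTMLParser invokes its callbacks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        self.events.append(('data', data))

logger = EventLogger()
logger.feed('\n<html>\n  <title>T</title>\n</html>')
logger.close()
print(logger.events[:4])
# [('data', '\n'), ('start', 'html'), ('data', '\n  '), ('start', 'title')]
```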

answered 2012-12-03T09:11:49.643