1

Okay, the title is a bit vague, but what I'm trying to do is download data from the web, parse it, and then put the parsed data into an Excel file.

I'm getting stuck trying to put the data into a vector or list. Note that the data can be either words or numbers. Also, the length of the data is unknown. I tried the code below:

class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        d=[]
        d=d.append(data)

parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')

d

Traceback (most recent call last):
File "<pyshell#34>", line 1, in <module>
d
NameError: name 'd' is not defined

I looked around the forum for an answer but didn't find anything. I am a beginner, so maybe I'm missing something basic? Thanks for the help.

4

4 Answers

4

Inside a class's methods, you need to use self to refer to an instance attribute. A bare name such as d is just a local variable that disappears when the method returns.

Starting with something like this might make more sense:

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.d = []
    def handle_data(self, data):
        self.d.append(data)

Then, to access d you would need to specify the class instance, so something like

parser.d
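
Putting those pieces together, a minimal end-to-end sketch might look like the following. It assumes Python 3, where the class is imported from html.parser (on Python 2 the import would be from HTMLParser import HTMLParser); the sample HTML is the one from the question.

```python
from html.parser import HTMLParser  # Python 3 import path (assumption)


class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.d = []  # one list per parser instance

    def handle_data(self, data):
        # feed() calls this for each run of text between tags
        self.d.append(data)


parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
print(parser.d)  # ['Test', 'Parse me!']
```

Because the list lives on the instance, it can grow to whatever length the page requires, and every item arrives as a string whether it looks like a word or a number.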

EDIT: global would work, but unless there is a compelling reason, I think you should learn to do this the correct way rather than clutter the global namespace.

answered 2012-04-28T23:56:44.857
2

There are three problems with your code.

  • You are creating a new empty list every time you call the method.
  • list.append returns None (not the list) so your list that now contains one element is not actually stored anywhere.
  • Assigning to a variable inside a function creates a local variable, not a global variable (unless you specify that the variable should be global with the global keyword).
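
To see those problems in isolation, here is a small sketch that reproduces the same pattern outside the parser. The function name broken_collect is made up for illustration, and the return statement is added only so the result can be inspected.

```python
def broken_collect(item):
    d = []              # problem 1: a brand-new empty list on every call
    d = d.append(item)  # problem 2: append() returns None, so d is now None
    return d            # problem 3: d is local, so nothing outside ever sees it


result = broken_collect('Test')
print(result)  # None
```
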

Try this instead:

d = []

class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        d.append(data)

It's also bad style to use a global variable. You might want to consider making d an attribute of the class and giving it a better name.

answered 2012-04-28T23:55:21.837
0

If you want to bind a name at module scope then you need to use global on it at the beginning of the function.

answered 2012-04-28T23:55:08.823
0

Is this what you are looking for?

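from html.parser import HTMLParser  # Python 3; on Python 2 use: from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        # per-instance list; a class-level list would be shared by all parsers
        self.data = []
    def get_data(self):
        return self.data
    def handle_starttag(self, tag, attrs):
        pass
    def handle_endtag(self, tag):
        pass
    def handle_data(self, data):
        # collect each run of text found between tags
        self.data.append(data)


# instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
print('All data', parser.get_data())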

Output:

All data ['Test', 'Parse me!']
answered 2012-04-29T00:00:25.983