
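I get a maximum-recursion-depth error with the code below. It is raised on the last line. Please excuse the subject matter, I'm just practicing my Python skills. =)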

from urllib.request import urlopen
from bs4 import BeautifulSoup
from pprint import pprint
from pickle import dump

moves = dict()
moves0 = set()
url = 'http://www.marriland.com/pokedex/1-bulbasaur'
print(url)
# Open url
with urlopen(url) as usock:
    # Get url data source
    data = usock.read().decode("latin-1")
    # Soupify
    soup = BeautifulSoup(data, "html.parser")
    # Find move tables
    for div_class1 in soup.find_all('div', {'class': 'listing-container listing-container-table'}):
        div_class2 = div_class1.find_all('div', {'class': 'listing-header'})
        if len(div_class2) > 1:
            header = div_class2[0].find_all(text=True)[1]
            # Take only moves from Level Up, TM / HM, and Tutor
            if header in ['Level Up', 'TM / HM', 'Tutor']:
                # Get rows
                for row in div_class1.find_all('tbody')[0].find_all('tr'):
                    # Get cells
                    cells = row.find_all('td')
                    # Get move name
                    move = cells[1].find_all(text=True)[0]
                    # If move is new
                    if move not in moves:
                        # Get type
                        typ = cells[2].find_all(text=True)[0]
                        # Get category
                        cat = cells[3].find_all(text=True)[0]
                        # Get power if not Status or Support
                        power = '--'
                        if cat != 'Status or Support':
                            try:
                                # not STAB
                                power = int(cells[4].find_all(text=True)[1].strip(' \t\r\n'))
                            except ValueError:
                                try:
                                    # STAB
                                    power = int(cells[4].find_all(text=True)[-2])
                                except ValueError:
                                    # Moves like Return, Frustration, etc.
                                    power = cells[4].find_all(text=True)[-2]
                        # Get accuracy
                        acc = cells[5].find_all(text=True)[0]
                        # Get pp
                        pp = cells[6].find_all(text=True)[0]
                        # Add move to dict
                        moves[move] = {'type': typ,
                                       'cat': cat,
                                       'power': power,
                                       'acc': acc,
                                       'pp': pp}
                    # Add move to pokemon's move set
                    moves0.add(move)

    pprint(moves)
    with open('pkmn_moves.dump', 'wb') as f:
        dump(moves, f)

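I cut the code down as far as I could while still reproducing the error. The fault may be something simple, but I can't spot it. In the meantime, I've worked around it by setting the recursion limit to 10000.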
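For reference, the stopgap mentioned above looks roughly like this (a sketch; the value 10000 is the asker's, the rest is standard library). It only raises the ceiling that the pickler presumably hits while following deeply nested object references; it does not remove those references:

```python
import sys

# Stopgap only: raise CPython's recursion limit (1000 by default) so the
# pickler can follow deeper chains of object references. The Python docs warn
# that too high a limit can crash the interpreter instead of raising an error.
sys.setrecursionlimit(10000)
print(sys.getrecursionlimit())  # 10000
```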


1 Answer


Just wanted to give an answer for anyone else who runs into this problem. In my case, I was caching BeautifulSoup objects from a remote API in a Django session.

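The short answer is that pickling BeautifulSoup nodes is not supported. Instead, I chose to store the raw string data in my object and give it an accessor method that parses that data on the fly, so that only the raw string gets pickled.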
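A minimal sketch of the accessor approach described above, assuming bs4 with the built-in "html.parser" backend. The class name `CachedPage` and its `raw_html`/`soup` attributes are hypothetical, not taken from the original answer. Only the raw string is stored on the instance, so pickling it never touches a parse tree:

```python
import pickle

from bs4 import BeautifulSoup


class CachedPage:
    """Hypothetical wrapper that keeps only raw HTML, so it pickles cheaply."""

    def __init__(self, raw_html: str) -> None:
        # Only this plain string is stored on the instance (and so pickled).
        self.raw_html = raw_html

    @property
    def soup(self) -> BeautifulSoup:
        # Re-parse on every access; no parse tree is kept on the instance.
        return BeautifulSoup(self.raw_html, "html.parser")


page = CachedPage("<table><tr><td>Tackle</td><td>Normal</td></tr></table>")
restored = pickle.loads(pickle.dumps(page))

# td.string is a NavigableString tied to the tree; str() yields a plain str.
cells = [str(td.string) for td in restored.soup.find_all("td")]
print(cells)  # ['Tackle', 'Normal']
```

Re-parsing on each access trades some CPU for an object that stays small and picklable. In the question's scraper, the same idea likely applies at a smaller scale: the values pulled out with `find_all(text=True)[...]` (`move`, `typ`, `cat`, `acc`, `pp`, and the fallback `power`) are NavigableString objects that still point back into the parse tree, so wrapping each one in `str(...)` before storing it in `moves` should keep the tree out of the pickle.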

answered 2013-06-05T14:46:56.293