
When I run the following code:

from bs4 import BeautifulSoup
import csv
soup = BeautifulSoup (open("43rd-congress.htm"))

final_link = soup.p.a
final_link.decompose()


f = csv.writer(open("43rd_congress_all.csv", "w"))
f.writerow(["Name","Years","Position","Party", "State", "Congress", "Link"])
trs = soup.find_all('tr')

for tr in trs:
    for link in tr.find_all('a'):
        fulllink = link.get ('href')

        print fulllink #print in terminal to verify results

        tds = tr.find_all("td")

        try: #we are using "try" because the table is not well formatted. This allows the program to continue after encountering an error.
            names = str(tds[0].get_text()) # This structure isolate the item by its column in the table and converts it into a string.
            years = str(tds[1].get_text())
            positions = str(tds[2].get_text())
            parties = str(tds[3].get_text())
            states = str(tds[4].get_text())
            congress = tds[5].get_text()

        except:
            print "bad tr string"
            continue #This tells the computer to move on to the next item after it encounters an error

        print names, years, positions, parties, states, congress
        f.writerow([names, years, posiitons, parties, states, congress, fullLink])

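I get a name error. When I try to correct it, I get another error on the last line of the code saying that a variable is not defined. I have edited the code to follow the community's suggestions. How do I fix it?

I appreciate your help.

I am running it with Notepad++ and PowerShell. I am on the last part of this tutorial: http://jeriwieringa.com/blog/2012/11/04/beautiful-soup-tutorial-part-1/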


2 Answers


The variables names, years, posiitons, parties, states, congress are never created if the first line inside the try/except clause raises an error.

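What is happening is that an error is raised inside the try block. Suppose names = str(tds[0].get_text()) raises an error. You catch it, but the variables assigned on the following lines are never created.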
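A minimal sketch of that failure mode, using a plain list in place of the tds from BeautifulSoup. The function name demo and the sample data are made up for illustration. Inside a function the missing name shows up as UnboundLocalError, which is a subclass of NameError:

```python
def demo(cells):
    """Hypothetical stand-in for one pass through the loop body."""
    try:
        first = cells[0]
        second = cells[1]  # raises IndexError when cells has only one item
    except IndexError:
        pass  # the error is caught, but `second` was never assigned
    return second  # fails here if the assignment above never ran


try:
    demo(["only one cell"])
    outcome = "no error"
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    outcome = type(exc).__name__
```

With a one-item list the except branch runs, and the later use of second is what actually fails.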

You may want to consider setting default values before your try/except, for example names = ''.
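Something along these lines, as a rough sketch rather than a drop-in fix. The function parse_row is hypothetical, and cells is assumed to be a list holding the text of each td in one row:

```python
def parse_row(cells):
    """Return six fields for one table row, falling back to '' for missing ones."""
    # Defaults first, so every name exists even if the row is short.
    names = years = positions = parties = states = congress = ""
    try:
        names = cells[0]
        years = cells[1]
        positions = cells[2]
        parties = cells[3]
        states = cells[4]
        congress = cells[5]
    except IndexError:
        # Short row: everything after the failing index keeps its default.
        pass
    return [names, years, positions, parties, states, congress]


complete = parse_row(["Adams", "1873-1875", "Representative", "R", "NY", "43"])
partial = parse_row(["Adams", "1873-1875"])
```

Whether a partly filled row should be written to the CSV or skipped is a separate decision; the defaults only guarantee that the names exist.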


Your indentation error is probably just caused by mixing tabs and spaces, because your code looks fine to me.
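If you want to check for that, here is a small sketch that flags lines whose leading whitespace contains both a tab and a space. The function name mixed_indent_lines and the sample text are made up for illustration, and it only looks at each line on its own, so it will not catch a file where some lines use only tabs and others only spaces:

```python
def mixed_indent_lines(source):
    """Return 1-based line numbers whose indentation mixes tabs and spaces."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Everything before the first non-whitespace character.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            flagged.append(number)
    return flagged


sample = "for tr in trs:\n    a = 1\n\t    b = 2\n\tc = 3\n"
flagged = mixed_indent_lines(sample)
```

For a more thorough check, the standard library's tabnanny module can be run against the script (python -m tabnanny yourscript.py).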

Answered 2013-10-21T05:11:46.880
    #                       |-> spelled "positions" here, but "posiitons" in the writerow call below
    print names, years, positions, parties, states, congress
    f.writerow([names, years, posiitons, parties, states, congress, fullLink])
    #                             |-> typo for "positions"              |-> never defined: the loop assigns it as "fulllink" (all lowercase)

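In the example above, positions and posiitons are not the same name. It is a simple typo. The same goes for fullLink, which the loop assigns as fulllink.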

Take a look at the code below and see whether it runs; I could not test it because I don't have your file.

from bs4 import BeautifulSoup
import csv

soup = BeautifulSoup(open("43rd-congress.htm"))

final_link = soup.p.a
final_link.decompose()

f = csv.writer(open("43rd_congress_all.csv", "w"))
f.writerow(["Name", "Years", "Position", "Party", "State", "Congress", "Link"])
trs = soup.find_all('tr')

for tr in trs:
    for link in tr.find_all('a'):
        fullLink = link.get('href')

        print fullLink  # print in terminal to verify results

        tds = tr.find_all("td")

        try:  # we are using "try" because the table is not well formatted. This allows the program to continue after
              # encountering an error.
            # This structure isolates each item by its column in the table and converts it into a string
            names = str(tds[0].get_text())
            years = str(tds[1].get_text())
            positions = str(tds[2].get_text())
            parties = str(tds[3].get_text())
            states = str(tds[4].get_text())
            congress = tds[5].get_text()

            print names, years, positions, parties, states, congress
            f.writerow([names, years, positions, parties, states, congress, fullLink])
        except IndexError:
            print "bad tr string"
            continue  # This tells the computer to move on to the next item after it encounters an error
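Note that the code above catches IndexError instead of using a bare except. A short sketch of why that matters, with made-up function names and a deliberate typo (nmae) standing in for posiitons: the broad handler hides the typo behind "bad tr string", while the narrow one lets the NameError surface so it can be fixed.

```python
def parse_narrow(cells):
    """Catch only the error a short row can cause."""
    try:
        name = cells[0]
        return nmae.upper()  # deliberate typo for `name`
    except IndexError:
        return "bad tr string"


def parse_broad(cells):
    """Catch everything, which also swallows the typo."""
    try:
        name = cells[0]
        return nmae.upper()  # same deliberate typo
    except Exception:
        return "bad tr string"


hidden = parse_broad(["Adams"])  # the typo is reported as a bad row

try:
    parse_narrow(["Adams"])
    surfaced = False
except NameError:
    surfaced = True  # the typo is reported as what it is
```

A genuinely short row still takes the IndexError branch in both versions, so the narrow handler does not lose anything the loop needs.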
Answered 2013-10-21T08:59:55.280