
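I'm using BeautifulSoup and I keep getting the error "'continue' not properly in loop". So I removed the continue, and then my print statement raised an invalid syntax error. I'm running BS4 and Python 2.7.5. Any help is much appreciated. Here is my code.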

from bs4 import BeautifulSoup

soup = BeautifulSoup (open("43rd-congress.html"))

final_link = soup.p.a
final_link.decompose()

trs = soup.find_all('tr')

for tr in trs:
for link in tr.find_all('a'):
    fulllink = link.get('href')
    print fulllink #print in terminal to verify results

tds = tr.find_all("td")


try: #we are using "try" because the table is not well formatted. 
   names = str(tds[0].get_text()) 
   years = str(tds[1].get_text())
   positions = str(tds[2].get_text())
   parties = str(tds[3].get_text())
   states = str(tds[4].get_text())
   congress = tds[5].get_text()

except:
  print "bad tr string"
  continue 

print names, years, positions, parties, states, congress

2 Answers


Since you are getting these errors, I suspect the indentation in your file really is wrong. Your code should look like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup (open("43rd-congress.html"))

final_link = soup.p.a
final_link.decompose()

trs = soup.find_all('tr')

for tr in trs:

    for link in tr.find_all('a'):
        fulllink = link.get('href')
        print fulllink #print in terminal to verify results

    tds = tr.find_all("td")


    try:  # we are using "try" because the table is not well formatted
        names = str(tds[0].get_text())
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()

        print names, years, positions, parties, states, congress

    except Exception:  # e.g. IndexError when a row has fewer than six cells
        print "bad tr string"

In Python, every code block must be nested by indenting it with tabs or spaces. Mixing the two is a bad idea.

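In your code, the first for loop iterates over every tr, and a second loop prints all the URLs.

But you forgot to indent the first block, which should be inside the for loop. A continue that is not indented inside a loop body is exactly what triggers the "'continue' not properly in loop" error.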
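To see the indentation point in isolation, here is a minimal sketch that compiles two small snippets without running them. The snippets, the names BROKEN and FIXED, and the helper syntax_error_of are made up for illustration; they only mirror the shape of the code in the question and do not need BeautifulSoup.

```python
# Two snippets that differ only in how the try/except is indented.
# Both are hypothetical stand-ins for the loop in the question.
BROKEN = """
for row in rows:
    cells = row
try:
    first = cells[0]
except IndexError:
    continue
"""

FIXED = """
for row in rows:
    cells = row
    try:
        first = cells[0]
    except IndexError:
        continue
"""


def syntax_error_of(source):
    """Return the SyntaxError message for source, or None if it compiles."""
    try:
        # compile() only checks the syntax; the snippet is never executed.
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:
        return err.msg
    return None


broken_msg = syntax_error_of(BROKEN)  # continue sits outside the loop
fixed_msg = syntax_error_of(FIXED)    # continue sits inside the loop body
```

In BROKEN the try/except is at the top level, so the continue no longer belongs to any loop and the snippet is rejected at compile time. In FIXED the same lines are indented under the for, and the snippet compiles.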

Edit

Also, you don't need continue in your case. See my edit to your code.
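A small sketch of why the continue can be dropped, assuming each table row has already been reduced to a plain list of cell strings. The sample rows and the function names with_continue and without_continue are invented for illustration and stand in for the tds lists from the real page.

```python
# Placeholder rows: two complete rows and one malformed row (made-up data).
ROWS = [
    ["name1", "years1", "position1", "party1", "state1", "congress1"],
    ["only one cell"],
    ["name2", "years2", "position2", "party2", "state2", "congress2"],
]


def with_continue(rows):
    """Handle the bad row in except, then skip ahead with continue."""
    out = []
    for cells in rows:
        try:
            record = tuple(cells[i] for i in range(6))
        except IndexError:
            out.append("bad tr string")
            continue  # jump to the next row
        out.append(record)
    return out


def without_continue(rows):
    """Keep the success path inside try, so no continue is needed."""
    out = []
    for cells in rows:
        try:
            record = tuple(cells[i] for i in range(6))
            out.append(record)
        except IndexError:
            out.append("bad tr string")
    return out


same_result = with_continue(ROWS) == without_continue(ROWS)
```

Both versions produce the same list here. Moving the success path into the try block is shorter, at the cost that an IndexError raised by that extra line would also be caught.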

answered 2013-10-25T17:04:05.827

The indentation is off at the print/continue lines. If it is off, the except: clause looks empty, and Python does not accept an empty block.

Try commenting out everything unrelated to the try/except and see whether it still gives you the error.
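The try/except can also be checked on its own, outside the script. The sketch below compiles a reduced snippet whose except: has nothing indented beneath it. The snippet text and the helper compile_error_name are assumptions made for this illustration, not part of the original code.

```python
# A reduced try/except in which the line after "except:" lost its indentation.
EMPTY_EXCEPT = """
try:
    value = 1
except:
print("bad tr string")
"""


def compile_error_name(source):
    """Return the name of the exception raised by compile(), or None."""
    try:
        # Only the syntax is checked; nothing in the snippet runs.
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return type(err).__name__
    return None


error_name = compile_error_name(EMPTY_EXCEPT)
```

If error_name is not None, the reduced snippet alone reproduces a syntax problem, which points at the indentation of the try/except rather than at the BeautifulSoup calls.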

answered 2013-10-25T16:59:55.097