python - Sending data from BeautifulSoup on ScraperWiki to SQLite but getting KeyError: 'href'

Question

I'm trying to learn Python and Beautiful Soup using ScraperWiki. I want a list of all the Kickstarter projects in Edmonton.

I've managed to scrape the page I'm looking for and extract the data I want. What I can't do is format that data and export it to the database.

Console output:

Line 42 - url = link["href"]

/usr/local/lib/python2.7/dist-packages/bs4/element.py:879 -- __getitem__((self=<h2 class="bbcard_nam...more

KeyError: 'href'

Code:

import scraperwiki
from bs4 import BeautifulSoup

search_page ="http://www.kickstarter.com/projects/search?term=edmonton"
html = scraperwiki.scrape(search_page)
soup = BeautifulSoup(html)

max = soup.find("p", { "class" : "blurb" }).get_text()
num = int(max.split(" ")[0])

if num % 12 != 0:
    last_page = int(num/12) + 1
else:
    last_page = int(num/12)

for n in range(1, last_page + 1):
    html = scraperwiki.scrape(search_page + "&page=" + str(n))
    soup = BeautifulSoup(html)
    projects = soup.find_all("h2", { "class" : "bbcard_name" })
    counter = (n-1)*12 + 1
    print projects

    for link in projects:
        url = link["href"]
        data = {"URL": url, "id": counter}
        # save into the data store, giving the unique parameter
        scraperwiki.sqlite.save(["URL"],data)
        counter+=1

Inside each project heading there is an anchor with an href. How do I get the URL from each <h2> element in the for loop?

score 2 · Accepted Answer

Well, you asked for <h2> tags, and that is what BeautifulSoup gave you. Naturally, none of them has an href attribute, because headings can't have href attributes.

Writing for link in projects merely gives the name link to each item in projects (the level-2 headings); it doesn't magically turn them into links.
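To make the cause of the KeyError concrete, here is a minimal sketch. It assumes Python 3 with bs4 installed (the question itself runs Python 2.7), and the HTML fragment and its URL are made up for illustration, not taken from the Kickstarter page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a heading that wraps a link, as described in the question.
html = '<h2 class="bbcard_name"><a href="/projects/one">One</a></h2>'
soup = BeautifulSoup(html, "html.parser")
heading = soup.find("h2")

try:
    heading["href"]            # the <h2> itself has no href attribute
    raised = False
except KeyError:
    raised = True

missing = heading.get("href")      # .get() returns None instead of raising
found = heading.find("a")["href"]  # the attribute lives on the nested <a>
print(raised, missing, found)
```

Indexing a tag with an attribute it does not have raises KeyError, whereas .get() quietly returns None, which can be handy when some headings might not contain a link.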

At the risk of stating the obvious, if you want links, look for <a> tags instead...? Or perhaps you want all of the links inside each heading, e.g.

for project in projects:
    for link in project.find_all("a"):
        url = link["href"]

Or maybe skip finding the projects altogether and go straight for the links:

for link in soup.select("h2.bbcard_name a"):
    url = link["href"]
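Both approaches can be tried side by side on a small sample. The sketch below assumes Python 3 with bs4 installed; SAMPLE_HTML and its URLs are hypothetical stand-ins for the real search page, which may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described in the question.
SAMPLE_HTML = """
<h2 class="bbcard_name"><a href="/projects/one">Project One</a></h2>
<h2 class="bbcard_name"><a href="/projects/two">Project Two</a></h2>
<h2 class="other"><a href="/ignored">Not a project</a></h2>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Approach 1: find the headings, then the links inside each heading.
urls_nested = []
for project in soup.find_all("h2", {"class": "bbcard_name"}):
    for link in project.find_all("a"):
        urls_nested.append(link["href"])

# Approach 2: one CSS selector that goes straight to the links.
urls_select = [link["href"] for link in soup.select("h2.bbcard_name a")]

print(urls_nested)
print(urls_select)
```

On this sample the two approaches collect the same URLs; the selector version is shorter, while the nested loop leaves room to do something per heading first.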

score 2 · Accepted Answer

You are looking for the href attribute in the <h2> tags.

This code:

for link in projects:

iterates over projects, which contains <h2> tags, not links.

I'm not quite sure what you want, but I assume you want the href attribute of the <a> tags inside the <h2> tags. Try this:

data = {"URL": [], "id": counter}
for header in projects:  # take each header
    links = header.find_all("a")  # the links inside this header
    for link in links:
        url = link["href"]

Also, data = {"URL": url, "id": counter} overwrites the data dictionary on every iteration of the loop. So change it to this:

data["URL"].append(url)  # stores it in this format: {'URL': [link1, link2, link3]}
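If the goal is one database row per project, as the question's scraperwiki.sqlite.save(["URL"], data) call suggests, an alternative to a single dictionary holding a list is to build a fresh dictionary for each link. Here is a sketch in plain Python 3; build_rows and the sample URLs are hypothetical, and the actual save call is left out.

```python
def build_rows(urls, start_id=1):
    """Return one {"URL": ..., "id": ...} dict per URL, numbering from start_id."""
    rows = []
    for counter, url in enumerate(urls, start=start_id):
        # A new dict each iteration, so earlier rows are not overwritten.
        rows.append({"URL": url, "id": counter})
    return rows


# Page 2 of a 12-per-page listing would start numbering at 13.
rows = build_rows(["/projects/one", "/projects/two"], start_id=13)
print(rows)
```

Each dictionary in rows could then be passed to the save call one at a time, which keeps the id column a plain integer rather than pairing one id with a list of URLs.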
