python - ScraperWiki Python loop problem

Question

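I am building a scraper in Python with ScraperWiki, but I'm having trouble with my results. My code is based on the basic example in the ScraperWiki documentation and looks very similar to it, so I'm not sure where the problem is. I get the title/URL of the first document on the page, but something seems to be wrong with the loop, because it doesn't return the remaining documents after that one. Any advice is appreciated!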

import scraperwiki
import requests
import lxml.html

html = requests.get("http://www.store.com/us/a/productDetail/a/910271.htm").content
dom = lxml.html.fromstring(html)

for entry in dom.cssselect('.downloads'):
    document = {
        'title': entry.cssselect('a')[0].text_content(),
        'url': entry.cssselect('a')[0].get('href')
    }
    print(document)

score 1 · Accepted Answer

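You need to iterate over the a tags inside the div with class downloads. Your loop runs once per element matching .downloads and takes only its first link ([0]), so a single container yields a single result: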

# Select every link inside the .downloads container, not the container itself
for entry in dom.cssselect('.downloads a'):
    document = {
        'title': entry.text_content(),
        'url': entry.get('href')
    }
    print(document)

This prints:

{'url': '/webassets/kpna/catalog/pdf/en/1012741_4.pdf', 'title': 'Rough In/Spec Sheet'}
{'url': '/webassets/kpna/catalog/pdf/en/1012741_2.pdf', 'title': 'Installation and Care Guide with Service Parts'}
{'url': '/webassets/kpna/catalog/pdf/en/1204921_2.pdf', 'title': 'Installation and Care Guide without Service Parts'}
{'url': '/webassets/kpna/catalog/pdf/en/1011610_2.pdf', 'title': 'Installation Guide without Service Parts'}
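
To see why the first loop stops after one document, here is a minimal, dependency-free sketch that compares the two selection strategies. It is not the code from the question: the markup in SAMPLE is hypothetical (the real page is not shown here), and it uses the standard library's xml.etree.ElementTree with XPath-style paths in place of lxml's cssselect, which only works because the sample is well-formed. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one container holding several links.
SAMPLE = """
<html><body>
  <div class="downloads">
    <a href="/pdf/spec.pdf">Spec Sheet</a>
    <a href="/pdf/install.pdf">Installation Guide</a>
    <a href="/pdf/care.pdf">Care Guide</a>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())


def first_link_per_container(root):
    """Mirror the question's loop: one pass per container, first link only."""
    docs = []
    for entry in root.findall(".//div[@class='downloads']"):
        first = entry.findall(".//a")[0]
        docs.append({'title': first.text, 'url': first.get('href')})
    return docs


def every_link(root):
    """Mirror the answer's loop: one pass per link inside the container."""
    return [
        {'title': a.text, 'url': a.get('href')}
        for a in root.findall(".//div[@class='downloads']//a")
    ]


print(len(first_link_per_container(root)))  # 1: one container, so one result
print(len(every_link(root)))                # 3: one result per link
```

The same reasoning applies to the lxml version: the number of results equals the number of elements the selector matches, so the selector has to target the links themselves.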
