I'm using Scrapy to scrape craigslist, grab all the links, go to each link, store the description of each page, and email a reply back. So far I've written a scrapy script that goes through craigslist/sof.com and gets all the job titles and urls. I want to go into each url and save the email and description for each job. Here's my code:
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist.items import CraigslistItem
class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/npo/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//span[@class='pl']")
        for title_row in titles:
            title = title_row.select("a/text()").extract()
            link = title_row.select("a/@href").extract()
            # this doesn't work -- the reply email only exists on the listing page itself
            desc = title_row.select("a/replylink").extract()
            print link, title
Any ideas how to do this?
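I'm guessing I need to yield a second Request for each listing url and parse the detail page in another callback, passing the half-built item along in meta. Something roughly like the sketch below is what I have in mind, but the xpaths for the posting body / reply email and the extra item fields (description, email) are just guesses on my part:

import urlparse

from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist.items import CraigslistItem


class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/npo/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        for row in hxs.select("//span[@class='pl']"):
            item = CraigslistItem()
            item['title'] = row.select("a/text()").extract()
            link = row.select("a/@href").extract()[0]
            # follow the listing link, keeping the partially filled item in meta
            url = urlparse.urljoin(response.url, link)
            yield Request(url, meta={'item': item}, callback=self.parse_listing)

    def parse_listing(self, response):
        hxs = HtmlXPathSelector(response)
        item = response.meta['item']
        # guessed xpaths -- not sure these are right for the posting body / reply email
        item['description'] = hxs.select("//section[@id='postingbody']//text()").extract()
        item['email'] = hxs.select("//a[starts-with(@href, 'mailto:')]/@href").extract()
        yield item

Is this the right way to chain the requests, or is there a better pattern for following each link and collecting the description and email?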