python - Formatting Scrapy's CSV results

Question

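I am trying to scrape a website and save the results, formatted, to a CSV file. I can save the file, but I have three questions about the output and formatting:

All results end up in a single cell instead of on multiple rows. Is there a command I forgot to use when listing the items so that they show up as a list?
How do I remove the [u' that appears before each result? (I searched and saw how to do this for print, but not for return.)
Is there a way to add text to some of the item results? (For example, can I add "http://groupon.com" to the beginning of each deal link result?)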

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from deals.items import DealsItem

class DealsSpider(BaseSpider):
    name = "groupon.com"
    allowed_domains = ["groupon.com"]
    start_urls = [
        "http://www.groupon.com/chicago/all",
        "http://www.groupon.com/new-york/all"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="page_content clearfix"]')
        items = []
        for site in sites:
            item = DealsItem()
            item['deal1']       = site.select('//div[@class="c16_grid_8"]/a/@title').extract()
            item['deal1link']   = site.select('//div[@class="c16_grid_8"]/a/@href').extract()
            item['img1']        = site.select('//div[@class="c16_grid_8"]/a/img/@src').extract()
            item['deal2']       = site.select('//div[@class="c16_grid_8 last"]/a/@title').extract()
            item['deal2link']   = site.select('//div[@class="c16_grid_8 last"]/a/@href').extract()
            item['img2']        = site.select('//div[@class="c16_grid_8 last"]/a/img/@src').extract()
            items.append(item)
        return items

score 2 · Accepted Answer

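Edit: Now that I understand the problem better: should your parse() function look more like the following? That is, yield one item at a time instead of returning a list. I suspect the list you are returning is what gets badly formatted and stuffed into a single cell.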

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@class="page_content clearfix"]')
    for site in sites:
        item = DealsItem()
        item['deal1']       = site.select('//div[@class="c16_grid_8"]/a/@title').extract()
        item['deal1link']   = site.select('//div[@class="c16_grid_8"]/a/@href').extract()
        item['img1']        = site.select('//div[@class="c16_grid_8"]/a/img/@src').extract()
        item['deal2']       = site.select('//div[@class="c16_grid_8 last"]/a/@title').extract()
        item['deal2link']   = site.select('//div[@class="c16_grid_8 last"]/a/@href').extract()
        item['img2']        = site.select('//div[@class="c16_grid_8 last"]/a/img/@src').extract()
        yield item
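
Another possible cause of the single-cell output is that each extract() call returns a list of every match, so one item ends up holding whole lists. A minimal sketch of splitting such lists into one row per deal, using only the standard library and assuming the title, link and image lists are parallel (same length, same order). The helper `rows_from_lists` and the sample values are hypothetical, not part of Scrapy:

```python
import csv
import io


def rows_from_lists(titles, links, images):
    """Yield one dict per deal from three parallel lists."""
    for title, link, image in zip(titles, links, images):
        yield {"deal": title, "link": link, "img": image}


# Made-up sample data standing in for the lists returned by extract().
titles = ["Spa day", "Pizza for two"]
links = ["/deals/spa-day", "/deals/pizza-for-two"]
images = ["/img/spa.jpg", "/img/pizza.jpg"]

rows = list(rows_from_lists(titles, links, images))

# Write the rows to an in-memory CSV to show that each deal gets its own line.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["deal", "link", "img"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

In a spider, yielding one item per tuple inside such a loop should give the CSV exporter one row per deal rather than one row per page.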

score 0 · Accepted Answer

Take a look at the item pipeline documentation: http://doc.scrapy.org/topics/item-pipeline.html
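
As a rough sketch of what such a pipeline could look like for the third question: a pipeline is a plain class with a `process_item(self, item, spider)` method. The class name `PrefixLinksPipeline`, the `BASE_URL` constant and the choice of link fields are assumptions for illustration, and enabling it would also need an entry in the project's `ITEM_PIPELINES` setting.

```python
from urllib.parse import urljoin

BASE_URL = "http://groupon.com"  # assumed prefix, taken from the question
LINK_FIELDS = ("deal1link", "deal2link")  # field names from the question's item


class PrefixLinksPipeline:
    """Hypothetical pipeline that turns relative deal links into absolute URLs."""

    def process_item(self, item, spider):
        for field in LINK_FIELDS:
            value = item.get(field)
            if value is None:
                continue
            if isinstance(value, str):
                item[field] = urljoin(BASE_URL, value)
            else:
                # extract() returns a list, so prefix every entry.
                item[field] = [urljoin(BASE_URL, v) for v in value]
        return item


# Quick check with a plain dict standing in for a DealsItem.
sample = {"deal1link": ["/deals/spa-day"], "deal2link": "/deals/pizza"}
result = PrefixLinksPipeline().process_item(sample, spider=None)
```

Using `urljoin` rather than string concatenation leaves links that are already absolute untouched.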

The u' prefix marks a unicode string in Python 2; it is part of how the string is displayed, not part of its value. http://docs.python.org/howto/unicode.html

>>> s = 'foo'
>>> unicode(s)
u'foo'
>>> str(unicode(s))
'foo'
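
The same point as a small Python 3 sketch: the brackets and quotes in the output come from printing a whole list, not from the string itself. The sample value is made up. In Python 3 every str is Unicode, so the u prefix no longer shows up, but the list brackets still do.

```python
# extract() returns a list of strings, even when only one node matches.
extracted = ["Spa day"]

as_list = str(extracted)       # the list's repr, brackets and quotes included
first = extracted[0]           # the bare string
joined = ", ".join(extracted)  # all matches combined into one string
```

Storing `first` or `joined` in the item field, instead of the list, should keep the brackets out of the CSV.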
