1

我正在尝试抓取一个活动网站,并且我有附加的代码来抓取活动名称和位置。我将输出写入 csv 文件,但随后 csv 文件将所有事件名称附加在一行中。

例如,假设我有两个事件 Bruno Mars 和 Maroon 5,它们的位置分别为 San Jose、Santa Clara。当前输出为,

事件名称事件位置

Bruno Mars, Maroon 5 San Jose, 圣克拉拉

但我希望看到,

事件名称事件位置

布鲁诺马尔斯圣何塞

栗色 5 圣克拉拉。

有人可以让我知道为什么这种格式对我来说变得很奇怪吗?我在这里附上了代码。然后我用它scrapy crawl event_spider -o output.csv -t csv来运行我的代码。

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from event_test.items import EventItem


class EventSpider(BaseSpider):
    name = "event_spider"
    allowed_domains = ["eventful.com"]
    start_urls = [
         "http://eventful.com/sanjose/events"
    ]

   def parse(self, response):
     hxs = HtmlXPathSelector(response)
     events = hxs.select("/html/body[@id='events']/div[@id='outer-container']/div[@id='mid-container']/div[@id='inner-container']/div[@id='content']/div[@class='cols-2-1']/div[@class='alpha']/div[@id='top-events']/div[@class='section top-events cage-dbl-border cage-bdr-mdgrey']/div[@id='events-scroll']/div[@id='events-scroll-items']/ul[@id='events-scroll-items-list']/li[@class='top-events-item  ']")
     items = []
     for event in events:
        item = EventItem()
        item['event_name'] = event.select("//h2/a/span/text()").extract()
        item['event_locality'] = event.select("//span[@class='locality']/text()").extract()
        items.append(item)
     return items
4

1 回答 1

1

我已经简化了蜘蛛中的代码和 xpath:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from event_test.items import EventItem


class EventSpider(BaseSpider):
    name = "event_spider"
    allowed_domains = ["eventful.com"]
    start_urls = ["http://eventful.com/sanjose/events"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        events = hxs.select("//li[contains(@class, 'top-events-item')]")
        for event in events:
            item = EventItem()
            item['event_name'] = event.select(".//h2/a/span/text()").extract()[0]
            item['event_locality'] = event.select(".//span[@class='locality']/text()").extract()[0]
            yield item

这是您将在 csv 文件中获得的内容:

event_name,event_locality
Under the Influence of Music Tour,Mountain View
Bruno Mars,San Jose
John Mayer: Born & Raised Tour 2013,Mountain View
New Kids on the Block with 98 Degrees and ...,San Jose
Amy Grant,San Jose
Styx,Saratoga
Bob Dylan with Wilco,Mountain View
Kenny Chesney with Eli Young Band,Mountain View
Smash Mouth \/ Sugar Ray \/ Gin Blossoms \...,Saratoga
Creedence Clearwater Revisited \/ 38 Special,Saratoga

希望有帮助。

于 2013-07-10T18:15:40.980 回答