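How do I call writeXML once the parser has finished scraping? I can see the pages being crawled, but no output file appears. I also tried adding a print inside writeXML, and nothing was printed.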

Here is my code:

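import xml.etree.ElementTree as ET

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class FriendSpider(BaseSpider):
    # identifies the spider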
    name = "friend"
    count = 0 
    allowed_domains = ["example.com.us"]
    start_urls = [
        "http://example.com.us/biz/friendlist/"
    ]

    def start_requests(self):
        for i in range(0,1722,40):
            yield self.make_requests_from_url("http://example.com.us/biz/friendlist/?start=%d" % i)

    def parse(self, response):
        response = response.replace(body=response.body.replace('<br />', '\n')) 
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        items = []

        for site in sites:
            item = Item()
            self.count += 1
            item['id'] = str(self.count)
            item['name'] = site.select('.//div/div/h4/text()').extract()
            item['address'] = site.select('h4/span/text()').extract()
            item['review'] = ''.join(site.select('.//div[@class="review"]/p/text()').extract())
            item['birthdate'] = site.select('.//div/div/h5/text()').extract()

            items.append(item)
        return items

    def writeXML(self, items):
        root = ET.Element("Test")
        for item in items:
            # Use a separate name for the XML node so the scraped item is not shadowed.
            node = ET.SubElement(root, 'item')
            node.set('id', item['id'])
            # extract() returns a list of strings, so join it before assigning to .text.
            address = ET.SubElement(node, 'address')
            address.text = ''.join(item['address'])
            # parse() stores this value under 'name'; there is no 'user' field.
            user = ET.SubElement(node, 'user')
            user.text = ''.join(item['name'])
            birthdate = ET.SubElement(node, 'birthdate')
            birthdate.text = ''.join(item['birthdate'])
            review = ET.SubElement(node, 'review')
            review.text = item['review']

        # Wrap the root in an ElementTree instance and save it as XML.
        with open("out.xml", 'wb') as f:
            tree = ET.ElementTree(root)
            tree.write(f, xml_declaration=True, encoding='utf-8', method="xml")

1 Answer

To write the output with the built-in XML exporter, try the following command:

scrapy crawl friend -o items.xml -t xml

If that output is not what you want, you can write your own exporter, using Scrapy's XML exporter class (XmlItemExporter) as a base.
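If the goal is to keep a hand-built layout like the one in writeXML, another option is an item pipeline rather than a custom exporter. Scrapy never calls a spider method named writeXML on its own, but it does call a pipeline's process_item for every item and close_spider once the crawl ends. The following is only a minimal sketch under assumptions: the class name XmlWriterPipeline, the helper _as_text, and the default path out.xml are hypothetical, and the field names are taken from the parse method in the question. It is written for Python 3 and uses only the standard library.

```python
import xml.etree.ElementTree as ET


def _as_text(value):
    """Hypothetical helper: extract() returns a list of strings, so join it."""
    if isinstance(value, (list, tuple)):
        return " ".join(str(v).strip() for v in value)
    return "" if value is None else str(value)


class XmlWriterPipeline:
    """Hypothetical pipeline: buffer every item, write one XML file at the end."""

    # Field names assumed from parse() in the question.
    fields = ("name", "address", "birthdate", "review")

    def __init__(self, path="out.xml"):
        self.path = path
        self.items = []

    def process_item(self, item, spider):
        # Scrapy calls this once for every item the spider returns.
        self.items.append(dict(item))
        return item

    def close_spider(self, spider):
        # Scrapy calls this once, after the crawl has finished.
        root = ET.Element("Test")
        for item in self.items:
            node = ET.SubElement(root, "item")
            node.set("id", _as_text(item.get("id")))
            for field in self.fields:
                ET.SubElement(node, field).text = _as_text(item.get(field))
        ET.ElementTree(root).write(
            self.path, encoding="utf-8", xml_declaration=True
        )
```

To try it, the pipeline would need to be enabled in the project's settings.py, for example with ITEM_PIPELINES = {"myproject.pipelines.XmlWriterPipeline": 300} in recent Scrapy versions, where myproject.pipelines is an assumed module path. Buffering all items in memory is fine for a crawl of this size; for much larger crawls the built-in exporter, which writes items as they arrive, is likely the better choice.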

Answered 2013-04-08T00:00:28.863