python - What is the correct way to use cookies in Scrapy?

Question

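I am a beginner, and I am using Scrapy on a website that uses cookies. This is a problem for me: I can get data from sites that do not need cookies, but getting data from a site that requires cookies is difficult. I have this code structure: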

class mySpider(BaseSpider):
    name='data'
    allowed_domains =[]
    start_urls =["http://...."]

    def parse(self, response):
        sel = HtmlXPathSelector(response)
        items = sel.xpath('//*[@id=..............')

        vlrs =[]

        for item in items:
            myItem['img'] = item.xpath('....').extract()
            yield myItem

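This works fine: with this code structure I can get good data from sites without cookies. I found out how to send cookies with a request to a URL, but I do not understand where I should put that code so that I can then run XPath on the result.

I am testing this code: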

request_with_cookies = Request(url="http://...",cookies={'country': 'UY'})

But I do not know where this code should go or how to make it work. I put it inside the parse function to get the data:

def parse(self, response):
    request_with_cookies = Request(url="http://.....",cookies={'country':'UY'})

    sel = HtmlXPathSelector(request_with_cookies)
    print request_with_cookies

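I am trying to use XPath on this new URL requested with cookies, so that I can print the newly scraped data afterwards. What is the correct way to use these cookies? I am a bit lost. Thank you very much.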

score 3 · Accepted Answer

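You were very close! The contract for the parse() method is that it yields (or returns an iterable of) Items, Requests, or a mix of both. In your case, all you should have to do is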

yield request_with_cookies

and your parse() method will run again with the Response object produced by requesting that URL with those cookies.

http://doc.scrapy.org/en/latest/topics/spiders.html?highlight=parse#scrapy.spider.Spider.parse
http://doc.scrapy.org/en/latest/topics/request-response.html
