python - How to get the parsed HTML from a hard-coded URL

Question

In my Scrapy spider, I just want the HTML response for a custom URL stored in a variable.

Suppose I have the URL

url = "http://www.example.com"

Now I want to fetch that page's HTML so I can parse it

pageHtml = scrapy.get(url)

I want something like this

page = urllib2.urlopen('http://yahoo.com').read()

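The only reason I can't use the line above in my spider is that my session is already authenticated through Scrapy, so I can't use any other function to fetch that page's HTML.

I don't want the response in a callback; I want it directly in a variable.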

score 1 · Accepted Answer

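Basically, you just need to add the relevant imports to the code in that question for it to work. You also need to define the link variable, which that sample code uses but never defines.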

import httplib
from scrapy.spider import BaseSpider
from scrapy.http import TextResponse

bs = BaseSpider('some')
# etc
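
For readers on Python 3, here is a minimal sketch of the "HTML straight into a variable" idea from the question. It is not the code from the linked question: it swaps in the standard library's urllib.request (the Python 3 successor to urllib2), and both the helper name fetch_html and the session_cookies dict are hypothetical. It assumes you can copy the cookies of your authenticated session into that dict yourself. Note that a call like this blocks and goes around Scrapy's scheduler and middlewares.

```python
import urllib.request


def fetch_html(url, session_cookies=None, timeout=10):
    """Fetch `url` synchronously and return its body as text.

    `session_cookies` is a hypothetical dict mapping cookie names to
    values, copied by hand from the already-authenticated session.
    """
    request = urllib.request.Request(url)
    if session_cookies:
        # Send the existing session's cookies in a single Cookie header.
        cookie_header = "; ".join(
            f"{name}={value}" for name, value in session_cookies.items()
        )
        request.add_header("Cookie", cookie_header)

    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Prefer the charset declared in the response, fall back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would look like pageHtml = fetch_html(url, {"sessionid": "..."}), where the cookie name depends on the site. If you then want Scrapy's selectors on that string, wrapping it as TextResponse(url=url, body=pageHtml, encoding="utf-8") should work, since TextResponse is already imported above.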
