python-2.7 - Enabling JavaScript in scrapy shell

Question

I'm trying to get the response.body of https://www.wickedfire.com/ in scrapy shell, but response.body shows me:

<html><title>You are being redirected...</title>\n<noscript>Javascript is required. Please enable javascript before you are allowed to see this page...
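If a spider should notice this placeholder instead of silently parsing it as the real page, one option is to test the body for the markers quoted above. This is a minimal sketch. The function name `needs_javascript` is hypothetical, and the marker strings are simply the ones from this particular response, so other sites may need different ones.

```python
def needs_javascript(body):
    """Return True if `body` looks like the 'Javascript is required' placeholder page.

    Hypothetical helper: the markers are taken from the body shown above.
    Accepts bytes (e.g. response.body) or text.
    """
    if isinstance(body, bytes):
        # Decode leniently; we only need to search for ASCII markers
        body = body.decode("utf-8", "replace")
    text = body.lower()
    return "<noscript>" in text and "javascript is required" in text
```

In a callback, `if needs_javascript(response.body): ...` could then log the problem or re-issue the request through a JavaScript-capable renderer.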

How do I enable JavaScript? Or is there anything else I can do?

Thanks in advance.

Update: I installed scrapy-splash with pip install scrapy-splash and put the following in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPLASH_URL = 'http://localhost:8050/'

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

It did give me an error, though:

NameError: Module 'scrapy_splash' doesn't define any object named 'SplashCoockiesMiddleware'

After that error I commented that line out, and then it ran without the error.
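The wording of that NameError appears to come from Scrapy's loader for dotted paths in settings. The name it reports, `SplashCoockiesMiddleware`, is spelled differently from `SplashCookiesMiddleware` in the settings shown above, so the settings file that raised the error may simply have contained a typo. The following is a minimal sketch for catching such typos before starting a crawl. The helper `find_unresolvable` is hypothetical and relies only on the standard-library `importlib`.

```python
import importlib


def find_unresolvable(paths):
    """Return the dotted paths (e.g. 'package.module.Class') that cannot be loaded.

    Hypothetical helper: imports the module part of each path and checks
    that the final attribute exists on it.
    """
    bad = []
    for path in paths:
        module_name, _, attr = path.rpartition(".")
        if not module_name:
            # No dot at all: not a valid 'module.object' path
            bad.append(path)
            continue
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            bad.append(path)
            continue
        if not hasattr(module, attr):
            bad.append(path)
    return bad
```

With scrapy-splash installed, `find_unresolvable(list(DOWNLOADER_MIDDLEWARES) + list(SPIDER_MIDDLEWARES))` should return an empty list for the settings above, while a misspelled entry such as `'scrapy_splash.SplashCoockiesMiddleware'` would show up in the result.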

My script looks like this, but it doesn't work:

...
from scrapy.http import FormRequest
from scrapy_splash import SplashRequest, SplashFormRequest
...

    start_urls = ['https://www.wickedfire.com/login.php?do=login']

    payload = {'vb_login_username': '', 'vb_login_password': ''}

    def start_requests(self):
        # Send every start URL through Splash so the page's JavaScript runs
        for url in self.start_urls:
            yield SplashRequest(url, self.parse, args={'wait': 1})

    def parse(self, response):
        # url = "https://www.wickedfire.com/login.php?do=login"
        # Fill in the login form found in the rendered page and submit it via Splash
        return SplashFormRequest.from_response(response,
                                               formdata=self.payload,
                                               callback=self.after_login)

    def after_login(self, response):
        print response.body + "THIS IS THE BODY"
        if "incorrect" in response.body:
            self.logger.error("Login failed")
            return
        # Logged in: submit the search form with the keyword
        return FormRequest.from_response(response,
                                         formdata={'query': 'bitter'},
                                         callback=self.parse_page)

...

This is the error I get:

[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wickedfire.com/ via http://localhost:8050/render.html> (failed 1 times): 502 Bad Gateway
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wickedfire.com/ via http://localhost:8050/render.html> (failed 2 times): 502 Bad Gateway
[scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://wickedfire.com/ via http://localhost:8050/render.html> (failed 3 times): 502 Bad Gateway
[scrapy.core.engine] DEBUG: Crawled (502) <GET https://wickedfire.com/ via http://localhost:8050/render.html> (referer: None) ['partial']
[scrapy.spidermiddlewares.httperror] INFO: Ignoring response <502 https://wickedfire.com/>: HTTP status code is not handled or not allowed
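The 502 responses in this log came back from the Splash endpoint at `localhost:8050`. One way to see more detail is to call `render.html` directly, outside Scrapy, and read whatever body Splash returns with the error status. The following is a minimal sketch. It assumes Splash listens on `http://localhost:8050` (the `SPLASH_URL` above) and accepts the `url` and `wait` query arguments on `render.html`. The helper names are hypothetical, and the code is written to run on both Python 2.7 and 3.

```python
try:  # Python 3
    from urllib.error import HTTPError
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:  # Python 2.7
    from urllib import urlencode
    from urllib2 import HTTPError, urlopen


def splash_render_url(target, wait=1, splash="http://localhost:8050"):
    """Build a render.html URL asking Splash to load `target` and wait `wait` seconds."""
    # A list of pairs keeps the query order stable on Python 2.7 as well
    query = urlencode([("url", target), ("wait", wait)])
    return splash.rstrip("/") + "/render.html?" + query


def fetch_via_splash(target, wait=1, splash="http://localhost:8050", timeout=60):
    """Request `target` through Splash and return (status_code, raw_body)."""
    url = splash_render_url(target, wait, splash)
    try:
        resp = urlopen(url, timeout=timeout)
        return resp.getcode(), resp.read()
    except HTTPError as err:
        # Splash answered, but with an error status such as 502;
        # the body may contain Splash's own description of the failure.
        return err.code, err.read()
```

For example, `status, body = fetch_via_splash('https://www.wickedfire.com/')` returns the status and raw body so you can inspect why rendering failed. The URL built by `splash_render_url(...)` should also work as the argument to `scrapy shell` on the command line, which opens the Splash-rendered page in the shell.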

I also tried using scrapy-splash from scrapy shell by following this guide.

All I want is to log in to the site, enter a keyword to search, and get the results. That's my end goal.
