Questions tagged [splash-js-render]


For questions regarding programming in ECMAScript (JavaScript/JS) and its various dialects/implementations (excluding ActionScript). Note JavaScript is NOT the same as Java! Please include all relevant tags on your question; e.g., [node.js], [jquery], [json], [reactjs], [angular], [ember.js], [vue.js], [typescript], [svelte], etc.

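141 questions

0 votes

0 answers

707 views

docker - Docker Scrapinghub/splash exits with code 139

I am using Scrapy with the Scrapinghub/splash Docker container to crawl through Splash, but the container exits on its own after a while with exit code 139. I run the scraper on an AWS EC2 instance with 1 GB of swap allocated.

I also tried running it in the background and checking the logs afterwards, but nothing in them indicates an error; it simply exits.

As far as I understand, 139 indicates a segmentation fault on UNIX. Is there a way to check or log which part of memory is being accessed, or which code is being executed, in order to debug this?

Or can I increase the container's memory or swap size to avoid it?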

2017-08-16T19:59:55.197
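
A note on the exit code in the question above: shells and Docker conventionally report a process killed by a signal as 128 plus the signal number, so 139 corresponds to signal 11. The sketch below only decodes such a code with the Python standard library; the helper name `describe_exit_code` is hypothetical, and it says nothing about why the Splash process crashed.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit code to a signal name when it is above 128."""
    if code > 128:
        try:
            # Exit codes above 128 conventionally mean "killed by signal (code - 128)".
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"exit status {code}"


print(describe_exit_code(139))
```

Signal 11 is SIGSEGV, which matches the asker's reading of the code as a segmentation fault.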

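0 votes

3 answers

3342 views

python - Scrapy CrawlSpider + Splash: how to follow links through a link extractor?

I have the following code, which partially works:

The code only runs the start_urls and does not follow the links specified in restricted_xpaths. If I comment out the start_requests() method and the line process_request='start_requests', in the rules, it runs as expected and follows links, but of course without JS rendering.

I have read two related questions, "CrawlSpider with Splash getting stuck after first URL" and "CrawlSpider with Splash", and specifically changed scrapy.Request() to SplashRequest() in the start_requests() method, but that does not seem to work. What is wrong with my code? Thanks,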

python scrapy web-crawler scrapy-splash splash-js-render

2017-08-25T16:45:17.497

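0 votes

0 answers

805 views

python - Sending a Python GET request with Splash and custom headers

I want to use Python requests with the Splash browser (https://splash.readthedocs.io/en/stable/) and custom headers to scrape some data from a website. However, before starting the crawl I decided to check the headers I send on this site: http://xhaus.com/headers. As a result, I saw that I was not sending the headers I wanted to send.

After running this code, I have the following user agent:

However, when I check it through the site I mentioned, it shows me a different user agent: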

python web-crawler python-requests splash-js-render

2017-08-28T14:40:16.090
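
One likely reading of the header question above is that headers set on the HTTP call to Splash are seen by Splash itself, not by the target site. Splash documents a `headers` argument for its render endpoints, supported only for `application/json` POST requests. The sketch below builds such a POST body with the standard library only; the helper name `build_render_request`, the user-agent string, and the local Splash address in the comment are assumptions for illustration.

```python
import json


def build_render_request(target_url, headers, wait=0.5):
    """Return (body, post_headers) for a JSON POST to a Splash render endpoint."""
    body = {
        "url": target_url,   # page Splash should load
        "wait": wait,        # seconds to wait after the page loads
        "headers": headers,  # headers Splash sets on its first outgoing request
    }
    return json.dumps(body).encode("utf-8"), {"Content-Type": "application/json"}


payload, post_headers = build_render_request(
    "http://xhaus.com/headers",
    {"User-Agent": "my-custom-agent/1.0"},  # hypothetical value
)
# The payload would then be POSTed to an assumed local instance, for example
# http://localhost:8050/render.html, with post_headers as the request headers.
print(payload.decode("utf-8"))
```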

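0 votes

2 answers

1651 views

docker - Using Docker with Scrapy and Splash on Heroku

I have a spider that uses Splash, running in Docker at localhost:8050, to render JavaScript before scraping. I am trying to run it on Heroku, but I don't know how to configure Heroku to start Docker and run Splash before running my web: scrapy crawl abc dyno. Any guidance is much appreciated!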

docker heroku scrapy splash-js-render

2017-09-05T02:06:29.880
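
For the Heroku question above, one option (an assumption, not something the question confirms) is to host Splash separately and point the spider at it through configuration instead of hard-coding localhost:8050. scrapy-splash reads the Splash address from the `SPLASH_URL` setting; the sketch below fills that setting from an environment variable of the same name, which is a hypothetical convention, and falls back to the local address from the question.

```python
import os


def splash_url_from_env(env=os.environ, default="http://localhost:8050"):
    """Read the Splash address from the environment, falling back to a local instance."""
    # Strip a trailing slash so the value can be joined with endpoint paths.
    return env.get("SPLASH_URL", default).rstrip("/")


# In a Scrapy settings.py this value would be assigned to the SPLASH_URL setting.
SPLASH_URL = splash_url_from_env()
```

With this in place, the same spider code can run against a local container during development and against a remote Splash instance once the variable is set on the hosting platform.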

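0 votes

2 answers

205 views

scrapy - How to set a password in a scrapinghub/splash Docker installation?

I am using Splash on an Ubuntu server and followed the instructions to install it with Docker (https://github.com/scrapy-plugins/scrapy-splash).

How can I change the settings and set a username and password?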

scrapy scrapy-splash splash-js-render

2017-10-16T14:22:22.297
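
For the password question above, a common arrangement (assumed here, not stated in the question) is to put Splash behind a reverse proxy that enforces HTTP Basic authentication, and to have the client send the matching `Authorization` header. The sketch below builds that header with the standard library; the function name and the credentials are placeholders.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header for requests sent to Splash."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials for illustration only.
print(basic_auth_header("user", "pass"))
```

With scrapy-splash, headers meant for the Splash server rather than the target site can be supplied through the `splash_headers` argument of `SplashRequest`.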

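0 votes

1 answer

1880 views

python-3.x - How to get status codes other than 200 from scrapy-splash

I am trying to get the request status code using Scrapy and scrapy-splash; the spider code is below.

My start URL, http://192.168.8.240:8000/xxxx, returns a 404 status code. There are three ways to make the request:

The first is:

The second is:

The third is:

Only the second way, yield scrapy.Request(url, self.parse, meta={'handle_httpstatus_all': True}), gets the correct status code 404; the first and third both get status code 200. In other words, once I use scrapy-splash I cannot get the correct 404 status code. Can you help me?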

python-3.x scrapy scrapy-splash splash-js-render

2017-10-19T15:07:08.463
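
A possible explanation for the status-code question above is that the 200 belongs to Splash's own HTTP response rather than to the target page. One way to surface the page's status is a Lua script for the `execute` endpoint that returns it under an `http_status` key, which scrapy-splash documents as the source of `response.status` when its default `magic_response` handling is on. The sketch below only assembles the script and its arguments; `STATUS_SCRIPT` and `execute_args` are hypothetical names, and the script has not been run against the asker's server.

```python
STATUS_SCRIPT = """
function main(splash, args)
  -- No assert here: splash:go returns nil plus a reason for HTTP 4xx/5xx,
  -- and an assert would abort the script before the status can be read.
  local ok, reason = splash:go(args.url)
  splash:wait(args.wait)
  local entries = splash:history()
  local last = entries[#entries]
  return {
    html = splash:html(),
    http_status = last and last.response.status or nil,
    go_reason = reason,
  }
end
"""


def execute_args(wait=0.5):
    """Arguments for the Splash execute endpoint (scrapy-splash adds the url itself)."""
    return {"lua_source": STATUS_SCRIPT, "wait": wait}


args = execute_args()
print(sorted(args))
```

In a spider these arguments would go to `SplashRequest(url, callback, endpoint='execute', args=execute_args())`. Scrapy's own `handle_httpstatus_all` meta key, already used in the question, would still be needed so that a non-200 response reaches the callback.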

0 votes

1 answer

1051 views

python - How to get the cookies generated by a Scrapy Splash request?

So I make a Scrapy Splash request like this:

The question is: how do I get the cookies that I set the Lua script to return?

python lua scrapy scrapy-splash splash-js-render

2017-10-24T00:31:39.397
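
For the cookie question above: Splash's `splash:get_cookies()` returns cookies as a list of HAR-style records, and with scrapy-splash the decoded JSON result of an `execute` script is available as `response.data`. Assuming the script returns that list under a `cookies` key (the key name depends on the asker's script, which is not shown), a small helper can flatten it into a name-to-value mapping. The helper name and the sample data are made up for illustration.

```python
def cookies_to_dict(har_cookies):
    """Flatten HAR-style cookie records into a simple {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in har_cookies}


# Sample data in the shape returned by splash:get_cookies(); values are invented.
sample = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "lang", "value": "en", "domain": "example.com", "path": "/"},
]
cookies = cookies_to_dict(sample)
print(cookies)
```

Inside a Scrapy callback this might be called as `cookies_to_dict(response.data["cookies"])`, under the assumption about the key name stated above.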

0 votes

1 answer

2041 views

python - scrapy, splash, lua, button click

I am new to all the tools involved here. My goal is to extract all URLs from a large number of pages that are connected, more or less, by a "Weiter"/"next" button, and to do that for several URLs. I decided to try this with Scrapy. The page is dynamically generated, so I learned that I needed a further tool and installed Splash for that. The installation is working; I set it up according to the tutorials. I then managed to get the first page by sending a "return" in the search input field. In a browser, that gives me the results I need. My problem is that I want to click the "next" button on the generated page and don't know exactly how. As I've read on several pages, this is not always easy. I tried the suggested solutions without success. I think I am not too far off and would appreciate some help. Thank you.

my settings.py

my spider:

python lua scrapy scrapy-splash splash-js-render

2017-11-05T10:12:49.610
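
For the button-click question above, one approach is a Lua script that looks the button up with `splash:select` and clicks it with the element's `mouse_click` method (both available in Splash 3.x). The sketch below passes the CSS selector through the script arguments instead of pasting it into the Lua source. `CLICK_SCRIPT`, `click_args`, and the selector `a.next` are hypothetical, since the asker's page markup is not shown.

```python
CLICK_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  local button = splash:select(args.selector)
  if button then
    button:mouse_click()
    assert(splash:wait(args.wait))
  end
  return {html = splash:html(), clicked = button ~= nil}
end
"""


def click_args(selector, wait=1.0):
    """Arguments for the Splash execute endpoint (scrapy-splash adds the url itself)."""
    return {"lua_source": CLICK_SCRIPT, "selector": selector, "wait": wait}


args = click_args("a.next")  # hypothetical selector for the "Weiter"/"next" button
print(args["selector"], args["wait"])
```

The `clicked` flag in the result makes it possible to stop paginating once no "next" button is found on the rendered page.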

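0 votes

1 answer

710 views

python - Selecting dependent drop-down lists with scrapy-splash

I am trying to scrape the following website: https://www.climatempo.com.br/climatologia/558/saopaulo-sp. It has two drop-down menus, the second of which depends on the first, so I chose to use Scrapy with Splash via scrapy-splash.

I need to change the location automatically by first selecting the state and then the city. I tried SplashFormRequest, but could not change the list of cities. My spider is (with debug prints):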

python web-scraping scrapy scrapy-splash splash-js-render

2017-11-30T13:54:42.347

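0 votes

0 answers

173 views

python - R 'Splashr' - error on Windows

I am trying to get the R package "Splashr" working by following this tutorial.

I have successfully installed Docker for Windows, the Docker SDK for Python, and (hopefully) the dependent Python packages. I have set the path to Python in the system variables and tried this R code with both Python 2.7 and 3.6, but I get the same error:

I am using Windows 10 Pro, version 1703

R version 3.4.3

RStudio version 1.1.383

Thanks in advance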

python r splash-js-render

2017-12-10T14:44:17.613


Reference