Questions tagged [scrapy-shell]


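160 questions

0 votes

0 answers

813 views

python - Selenium error: Traceback (most recent call last): File "", line 1, in fb_login()

I have the following code, which is meant to fill in the data and log in automatically:

But I get the following error:

Can anyone help me look over it and find my mistake?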

python selenium scrapy-shell

user6315578

2017-07-20T00:26:28.820

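0 votes

0 answers

368 views

python - Scrapy's "pause/resume" turns into "pause/restart"

Here is the situation.
I have a large list of words, and I want to crawl some data based on these words. This is time-consuming, so I want to split the work into several parts.

First, I load the word list into a list in my spider's __init__.

Then I create some initial requests in start_requests():

And I parse the items in parse_json() (the code is omitted here because it is not important).

According to the official documentation, if I run the same command twice in the shell, for example:

then the crawler will continue its work from where it stopped.

However, when I resume the job with the same command as above, there is still

on the screen. Why? I thought it would continue its parsing process without calling start_requests().
How should I handle this if I want to continue crawling from where I stopped? Thanks.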
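One way to get "continue where I stopped" behavior even if start_requests() runs again on every launch is to record the finished words yourself and skip them on the next run. The sketch below is plain Python, not Scrapy's own job-persistence mechanism; the helper names (`load_done`, `save_done`, `pending_words`) and the JSON state file are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path


def load_done(state_file: Path) -> set:
    """Return the set of words that earlier runs already finished."""
    if state_file.exists():
        return set(json.loads(state_file.read_text(encoding="utf-8")))
    return set()


def save_done(state_file: Path, done: set) -> None:
    """Persist the finished words so a later run can skip them."""
    state_file.write_text(json.dumps(sorted(done)), encoding="utf-8")


def pending_words(words, state_file: Path):
    """Yield only the words that no earlier run has finished."""
    done = load_done(state_file)
    for word in words:
        if word not in done:
            yield word


if __name__ == "__main__":
    # Hypothetical word list; a temporary directory keeps the demo self-cleaning.
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "done.json"
        save_done(state, {"alpha", "beta"})  # pretend a first run got this far
        print(list(pending_words(["alpha", "beta", "gamma"], state)))
```

In a spider, start_requests() could iterate over something like `pending_words(...)` and the parse callback could add each finished word to the saved set. Whether that is needed depends on how the resumed job treats requests it has already seen.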

python scrapy scrapy-spider scrapy-shell

2017-08-02T17:56:26.743

0 votes

1 answer

2211 views

python - Why does my basic scrapy request get no response?

I am new to scrapy and trying to submit a form and scrape the response from https://www.fbo.gov/index?s=opportunity&tab=search&mode=list.

When I use the scrapy shell:

it opens up the shell but contains no response object. Running

returns none. I've tried using just "https://www.fbo.gov" and other variations but nothing seems to work. The example I followed used "http://quotes.toscrape.com/page/1/" and it works fine.

Why do I get no response when using a different URL? Does it have to do with the https? Do I need to use a FormRequest to get a response since the link contains a form? I figured it would at least return the HTML of the form. I plan to 'check' various checkboxes upon submit.

Thanks in advance for any help!

Log:

python scrapy response scrapy-shell

2017-08-10T03:58:32.803

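0 votes

1 answer

710 views

python - Scrapy returns an empty JSON file

I am trying to scrape data from a website. Everything seems to be correct, and the XPath expressions were tested in the shell.

However, the output looks like this:

What is wrong with my code?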

python json scrapy scrapy-spider scrapy-shell

2017-09-08T00:46:04.990

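0 votes

0 answers

227 views

windows - How do I use the Scrapy shell with WinPython and IPython?

I want to learn Scrapy, and I can use the scrapy shell in the regular Windows shell.

As soon as I type scrapy shell 'url' in IPython, I get the following message:

That makes sense to me, because nothing on my system, IPython in particular, knows about scrapy yet.

But can you help me solve this? The Scrapy documentation itself does not offer a specific solution. Could the problem be caused by the WinPython distribution? I tried several ideas but did not find a solution :/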
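The message itself is not shown, so the following is only a rough diagnostic sketch. It assumes the problem is that the Python environment behind IPython cannot see Scrapy. Note that `scrapy shell` is a command-line program, not a Python statement, so it helps to know both whether the package is importable and whether the `scrapy` executable is on PATH. The function name `scrapy_availability` is made up for this example.

```python
import importlib.util
import shutil


def scrapy_availability():
    """Report whether Scrapy is importable here and whether its CLI is on PATH."""
    return {
        # True if "import scrapy" would find a package in this interpreter.
        "importable": importlib.util.find_spec("scrapy") is not None,
        # Full path of the "scrapy" executable, or None if it is not on PATH.
        "executable": shutil.which("scrapy"),
    }


if __name__ == "__main__":
    print(scrapy_availability())
```

If the executable is found, prefixing the command with `!` in IPython hands it to the system shell, which may be worth trying. If neither check succeeds, Scrapy is probably installed in a different Python environment from the one WinPython uses.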

windows shell scrapy ipython scrapy-shell

2017-09-28T08:25:48.223

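0 votes

0 answers

107 views

scrapy - Response.css() gives no paginated results in a Scrapy crawler after login

I want to read the "title" of each item in a list that is paginated and has about 335 records. What I am trying to do is: 1) First I get the response in Windows cmd with this command:

2) It shows the rendered HTML in cmd; next I write

and press Enter, and it gives me [], an empty array.

The question is: how do I get data in a crawler script from a URL that only appears after login? https://www.slingshotinsights.com/projects is the link a user reaches after successfully logging in.

It may also be that Scrapy cannot find the

CSS selector, because it is not loaded in the logged-out view.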

scrapy scrapy-spider scrapy-shell

2017-11-02T11:07:09.257

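0 votes

1 answer

221 views

scrapy - Scrapy downloads the HTML page but cannot get data using XPath or CSS

I am trying to scrape this page. When I run scrapy shell "https://redsea.com/en/apple-iphone-x-64gb-silver.html", it downloads the HTML page, and I can use view(response) to see the downloaded HTML in the browser:

However, when I try to get any data, the product name for example, response.css('.page-title') gives me an empty response:

It makes sense that when Scrapy scrapes a site that fetches its data through a REST API, it only downloads the site's structural HTML without the data and cannot get that data. But in this case Scrapy downloads an HTML file that contains the data, yet cannot read it with CSS or XPath. I do not understand this behavior.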

scrapy scrapy-shell

2017-11-07T17:01:00.667

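0 votes

3 answers

1738 views

python - Import error: DLL failed when using scrapy in the command prompt

I get the following problem when trying to create a folder with the scrapy command. I searched for this problem and found a solution at https://groups.google.com/forum/#!topic/scrapy-users/8N6V_OGUqtI. I tried the steps given there but still have the problem.

Any help in resolving this would be greatly appreciated.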

python command-line scrapy scrapy-shell

2017-11-07T17:42:33.367

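0 votes

2 answers

293 views

python - Scraping the value after the euro sign (Scrapy-Python)

I need a selector that scrapes the value after the euro sign (\u20ac).

I have tried dozens of variations that I found on Stack Overflow, but I cannot get it to work.

Sites like https://regexr.com/ tell me that something like this:

should work, but it does not.

Edit: here is a sample link with the data I want to scrape: https://www.firmenabc.at/manfred-jungwirth-montagen_MoKY

Any help is much appreciated!

Michael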
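The pattern that was tried is not shown, so this is only a sketch of one pattern that captures a number following the euro sign, using Python's standard `re` module. The sample strings are hypothetical and the number format (digits with `.` or `,` separators) is an assumption; the real page may format values differently.

```python
import re

# Euro sign, optional whitespace, then digits with '.' or ',' separators,
# ending on a digit so trailing punctuation is not captured.
EURO_VALUE = re.compile("\u20ac" + r"\s*([\d.,]*\d)")


def value_after_euro(text):
    """Return the number that follows the first euro sign, or None."""
    match = EURO_VALUE.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the linked page.
    print(value_after_euro("Kapital: \u20ac 35.000,00"))
```

If a pattern works on a regex tester but not in Scrapy, a common thing to check is whether the text being matched really contains the euro character rather than an HTML entity or a different encoding. Scrapy selectors also accept a pattern through their `.re()` and `.re_first()` methods, so the same expression could be applied there.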

python web-scraping scrapy scrapy-shell

2017-11-11T19:05:19.797

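0 votes

1 answer

82 views

python-3.x - Why does Scrapy print \t\n\n where I expect text?

I am a beginner with Scrapy, but I am learning. I have been parsing this page and trying to scrape the address from it.

I did this in the scrapy shell, so I started with:

which works fine. Then I tried to parse the address:

But my output is as follows:

['\n\t\t', '\n\t\t\n\t\t']

Why can't I see the address that is shown on the page:

Abbey Centre Belfast, 1 Old Glenmount Road Newtonabbey, Newtownabbey, BT36 7DN

How would I go about getting this address? I appreciate anyone who takes the time to reply.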
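The selector used is not shown, but output like this usually means that only whitespace text nodes directly inside the matched element were returned, while the visible text sits in nested elements. One thing to try is a descendant text query (`//text()` instead of `/text()` in XPath). Whatever selector is used, the extracted strings typically still need cleaning. A minimal sketch follows; `clean_text_nodes` is a made-up helper, and the `mixed` list is a hypothetical result, not real output from the page.

```python
def clean_text_nodes(nodes):
    """Strip each extracted string and drop the ones that were only whitespace."""
    stripped = (node.strip() for node in nodes)
    return [text for text in stripped if text]


# The list reported in the question: every node is whitespace only.
reported = ['\n\t\t', '\n\t\t\n\t\t']

# Hypothetical result of a broader selector that also reaches nested text.
mixed = ['\n\t\t', '1 Old Glenmount Road', '\n\t\t\n\t\t', ' BT36 7DN ']

if __name__ == "__main__":
    print(clean_text_nodes(reported))
    print(", ".join(clean_text_nodes(mixed)))
```

Cleaning the reported list leaves nothing, which is consistent with the address not being among the nodes that were selected.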

python-3.x scrapy scrapy-shell

2017-12-17T19:15:32.143
