
I seem to have run into a Scrapy spider deployment problem that's causing some kind of listening error. None of the previous answers have worked for me, either because this is a different problem or because the fixes weren't detailed enough for me to follow.

I had uploaded a project, and the deploy command worked yesterday. Now that I'm playing with it again, when I run `scrapy deploy -l` to list the deploy targets, I get this:

Scrapy 0.24.4 - no active project

Unknown command: deploy

Use "scrapy" to see available commands

A common fix seems to be restarting Scrapyd by running `scrapyd`. When I do that, I get:

2014-09-17 01:58:47+0000 [-] Log opened.
2014-09-17 01:58:47+0000 [-] twistd 13.2.0 (/usr/bin/python 2.7.6) starting up.
2014-09-17 01:58:47+0000 [-] reactor class: twisted.internet.epollreactor.EPollReactor.
2014-09-17 01:58:47+0000 [-] Traceback (most recent call last):
2014-09-17 01:58:47+0000 [-]   File "/usr/bin/scrapyd", line 8, in <module>
2014-09-17 01:58:47+0000 [-]     run()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/scripts/twistd.py", line 27, in run
2014-09-17 01:58:47+0000 [-]     app.run(runApp, ServerOptions)
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/application/app.py", line 642, in run
2014-09-17 01:58:47+0000 [-]     runApp(config)
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/scripts/twistd.py", line 23, in runApp
2014-09-17 01:58:47+0000 [-]     _SomeApplicationRunner(config).run()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/application/app.py", line 380, in run
2014-09-17 01:58:47+0000 [-]     self.postApplication()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/scripts/_twistd_unix.py", line 193, in postApplication
2014-09-17 01:58:47+0000 [-]     self.startApplication(self.application)
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/scripts/_twistd_unix.py", line 381, in startApplication
2014-09-17 01:58:47+0000 [-]     service.IService(application).privilegedStartService()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/application/service.py", line 277, in privilegedStartService
2014-09-17 01:58:47+0000 [-]     service.privilegedStartService()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/application/internet.py", line 105, in privilegedStartService
2014-09-17 01:58:47+0000 [-]     self._port = self._getPort()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/application/internet.py", line 133, in _getPort
2014-09-17 01:58:47+0000 [-]     'listen%s' % (self.method,))(*self.args, **self.kwargs)
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 495, in listenTCP
2014-09-17 01:58:47+0000 [-]     p.startListening()
2014-09-17 01:58:47+0000 [-]   File "/usr/lib/python2.7/dist-packages/twisted/internet/tcp.py", line 980, in startListening
2014-09-17 01:58:47+0000 [-]     raise CannotListenError(self.interface, self.port, le)
2014-09-17 01:58:47+0000 [-] twisted.internet.error.CannotListenError: Couldn't listen on 0.0.0.0:6800: [Errno 98] Address already in use.

Based on that message and some of the other questions posted here, it looks like some kind of listening error, but I can't figure out which solution is supposed to work or where to make the adjustment.
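The `[Errno 98] Address already in use` at the bottom of that traceback is a plain socket-level `EADDRINUSE`: something (almost certainly an earlier scrapyd instance) is still bound to 0.0.0.0:6800. A minimal sketch of the same failure, independent of Scrapy:

```python
import errno
import socket

# Bind one listener, then try to bind a second socket to the same port.
# The second bind fails with EADDRINUSE (errno 98 on Linux), which is the
# error Twisted wraps in CannotListenError when scrapyd starts while an
# older scrapyd process still owns port 6800.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
first.listen(1)
port = first.getsockname()[1]

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
got_eaddrinuse = False
try:
    second.bind(("127.0.0.1", port))
except OSError as exc:
    got_eaddrinuse = (exc.errno == errno.EADDRINUSE)
finally:
    second.close()
    first.close()

print(got_eaddrinuse)
```

So the error itself doesn't mean scrapyd is broken, only that the port is taken; the fix is to find and stop whatever process holds 6800 before starting a new instance.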

Edit:

Here's what I get after restarting Scrapyd:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:6800            0.0.0.0:*               LISTEN      956/python      
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1004/sshd       
tcp6       0      0 :::22                   :::*                    LISTEN      1004/sshd       
udp        0      0 0.0.0.0:14330           0.0.0.0:*                           509/dhclient    
udp        0      0 0.0.0.0:68              0.0.0.0:*                           509/dhclient    
udp6       0      0 :::3311                 :::*                                509/dhclient

Edit 2:

So I backtracked and started over in my local project directory to try to figure out where this all went wrong. When I try to list the targets locally, this is what I get now:

Christophers-MacBook-Pro:shn Chris$ scrapy deploy -l
aws-target           http://*********.compute-1.amazonaws.com:6800/
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.22.2', 'scrapy')
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 489, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pkg_resources.py", line 1207, in run_script
    execfile(script_filename, namespace, namespace)
  File "/Library/Python/2.7/site-packages/Scrapy-0.22.2-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/Library/Python/2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Library/Python/2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/Library/Python/2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/Scrapy-0.22.2-py2.7.egg/scrapy/commands/deploy.py", line 76, in run
    print("%-20s %s" % (name, target['url']))
KeyError: 'url'

Edit 3:

Here's the config file...

# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/en/latest/topics/scrapyd.html

[settings]
default = shn.settings

[deploy:local-target]
#url = http://localhost:6800/
project = shn

[deploy:aws-target]
url = http://********.compute-1.amazonaws.com:6800/
project = shn
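The `KeyError: 'url'` from Edit 2 comes straight from this file: `[deploy:local-target]` has its `url` line commented out, so when `scrapy deploy -l` iterates over the targets there is no `'url'` key to print for that target. A minimal reconstruction, assuming scrapy builds one dict per `[deploy:...]` section (the aws hostname below is a placeholder, not the redacted one above):

```python
from configparser import ConfigParser

# Same shape as the scrapy.cfg above, with local-target's url commented out.
cfg_text = """
[deploy:local-target]
#url = http://localhost:6800/
project = shn

[deploy:aws-target]
url = http://example.compute-1.amazonaws.com:6800/
project = shn
"""

parser = ConfigParser()
parser.read_string(cfg_text)

# Assumption: each [deploy:...] section becomes a plain dict of its keys.
targets = {
    section.split(":", 1)[1]: dict(parser.items(section))
    for section in parser.sections()
    if section.startswith("deploy:")
}

print(sorted(targets["aws-target"]))    # ['project', 'url']
print(sorted(targets["local-target"]))  # ['project'] -- no 'url' key, hence the KeyError
```

The `#url` line is parsed as a comment, so `target['url']` has nothing to look up for `local-target`.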

For what it's worth, I can run it again using the curl option, and it saves a log file and output on aws at :6800. The scrapy deploy command still gives me the error I posted earlier, though.
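That's consistent: the "curl option" hits scrapyd's schedule.json endpoint directly, which is separate from the deploy command, so runs can succeed while `scrapy deploy` still fails on the local config. A hedged sketch of the same request from Python (the host and spider name are placeholders):

```python
from urllib.parse import urlencode

# scrapyd's schedule.json takes a form-encoded POST with at least
# `project` and `spider`. Host and spider name here are placeholders.
host = "http://scrapyd-host:6800"
payload = urlencode({"project": "shn", "spider": "myspider"})

# Equivalent to:
#   curl http://scrapyd-host:6800/schedule.json -d project=shn -d spider=myspider
# To actually send it (needs a running scrapyd):
#   from urllib.request import urlopen
#   urlopen(host + "/schedule.json", payload.encode("ascii"))

print(payload)
```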


2 Answers


It sounds like scrapyd is still running and twisted hasn't released the port. Can you confirm with netstat:

$ sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:17123         0.0.0.0:*               LISTEN      1048/python
tcp        0      0 0.0.0.0:6800            0.0.0.0:*               LISTEN      1434/python
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      995/sshd
tcp6       0      0 :::22                   :::*                    LISTEN      995/sshd
udp        0      0 127.0.0.1:8125          0.0.0.0:*                           1047/python
udp        0      0 0.0.0.0:68              0.0.0.0:*                           493/dhclient
udp        0      0 0.0.0.0:16150           0.0.0.0:*                           493/dhclient
udp6       0      0 :::28687                :::*                                493/dhclient

Kill scrapyd:

$ sudo kill -INT $(cat /var/run/scrapyd.pid)

Then restart it:

$ sudo service scrapyd start

Then cd into your project directory and make sure the deploy targets are defined in the scrapy.cfg file:

$ cd ~/takeovertheworld
vagrant@portia:~/takeovertheworld$ cat scrapy.cfg

# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/en/latest/topics/scrapyd.html

[settings]
default = takeovertheworld.settings

[deploy:local-target]
url = http://localhost:6800/
project = takeovertheworld

[deploy:aws-target]
url = http://my-ec2-instance.amazonaws.com:6800/
project = takeovertheworld

And deploy the project:

vagrant@portia:~/takeovertheworld$ scrapy deploy aws-target
Packing version 1410145736
Deploying to project "takeovertheworld" in http://ec2-xx-xxx-xx-xxx.compute-1.amazonaws.com:6800/addversion.json
Server response (200):
{"status": "ok", "project": "takeovertheworld", "version": "1410145736", "spiders": 1}

Edit your scrapy.cfg file: either remove the # from the url line under local-target, or delete the local-target section entirely if you don't need it.

Answered 2014-09-17T11:05:47.320

Try stopping and restarting the scrapyd service on your Amazon EC2 server. Make sure your config file has the correct deploy information:

    [deploy:deploy_name]
    url = http://ip_Address:port_number/
    project = your_project_name

Go to the project directory where scrapy.cfg is located, and check the available deploy targets:

    scrapy deploy -l
Answered 2014-09-17T07:55:22.580