python - 部署scrapy项目

Question

我正在尝试使用scrapyd 部署scrapy 项目。我可以通过使用正常运行我的项目

cd /var/www/api/scrapy/dirbot
scrapy crawl dmoz

这是我一步一步做的：

1/ 我跑

scrapy version -v
>> Scrapy  : 0.16.3
lxml    : 3.0.2.0
libxml2 : 2.7.8
Twisted : 12.2.0
Python  : 2.7.3 (default, Aug  1 2012, 05:14:39) - [GCC 4.6.3]
Platform: Linux-3.2.0-31-virtual-x86_64-with-Ubuntu-12.04-precise

2/安装scrapyd

aptitude install scrapyd-0.16

3/ 我在 /var/www/api/scrapy/dirbot (http://domain.com/api/scrapy/dirbot) 有项目 scrapy。我编辑scrapy.cfg

[settings]
default = dirbot.settings
[deploy:scrapyd2]
url = http://domain.com/api/scrapy/dirbot/
username = vu
password = hoang

4/ 我用部署命令测试

scrapy deploy -l
>> scrapyd2             http://domain.com/api/scrapy/dirbot/

5/ 但是当我使用命令时

scrapy deploy -L scrapyd2
>> /usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/settings/deprecated.py:23: ScrapyDeprecationWarning: You are using the following settings which are deprecated or obsolete (ask scrapy-users@googlegroups.com for alternatives):
    BOT_VERSION: no longer used (user agent defaults to Scrapy now)
  warnings.warn(msg, ScrapyDeprecationWarning)
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.16.3', 'scrapy')
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 499, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1235, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/cmdline.py", line 131, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/cmdline.py", line 76, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/cmdline.py", line 138, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/commands/deploy.py", line 76, in run
    f = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

和

scrapy deploy scrapyd2 -p project
>> /usr/local/lib/python2.7/dist-packages/Scrapy-0.16.3-py2.7.egg/scrapy/settings/d           eprecated.py:23: ScrapyDeprecationWarning: You are using the following settings            which are deprecated or obsolete (ask scrapy-users@googlegroups.com for alternat           ives):
    BOT_VERSION: no longer used (user agent defaults to Scrapy now)
  warnings.warn(msg, ScrapyDeprecationWarning)
Building egg of project-1358597244
'build/scripts-2.7' does not exist -- can't clean it
zip_safe flag not set; analyzing archive contents...
Deploying project-1358597244 to http://domain.com/api/scrapy/dirbot/addversio           n.json
Deploy failed (404):
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /api/scrapy/dirbot/addversion.json was not found on this se           rver.</p>
<hr>
<address>Apache/2.2.22 (Ubuntu) Server at domain.com Port 80</address>
</body></html>

* 我不明白什么是蟒蛇蛋。你能给我举个例子吗？我不知道我有没有。也许那个文件是/var/www/api/scrapy/dirbot/setup.py？

from setuptools import setup, find_packages

setup(
    name         = 'project',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = dirbot.settings']},
)

* 如何部署我的项目。我不知道我到底做错了什么，或者错过了一步？

谢谢

score 1 · Accepted Answer

从错误来看，您要抓取的网站似乎会出现 404 错误，要么是您放置了错误的网站，要么存在一些配置错误。

关于 python egg，有一个很好的答案，请查看什么是 Python egg？

关于 setup.py ：据我所知，它用于使用命令从源代码安装 python 应用程序python setup.py install

编辑：似乎我对命令感到困惑pip，对此感到抱歉

python - 部署scrapy项目

1 回答 1

Related

Reference