
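I'm trying to import the models of a Django app in my pipelines.py so I can save data with the Django ORM. I created a Scrapy project, scrapy_project, inside the first Django app involved, "app1" (is that a good choice, by the way?). I added these lines to my Scrapy settings file: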

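import os


def setup_django_env(path):
    import imp
    # setup_environ only exists in Django < 1.6
    from django.core.management import setup_environ

    # Load <path>/settings.py and configure Django with it
    f, filename, desc = imp.find_module('settings', [path])
    project = imp.load_module('settings', f, filename, desc)

    setup_environ(project)

# os must be imported at module level because it is used here, outside the function
current_dir = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
setup_django_env(os.path.join(current_dir, '../../d_project1'))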

When I try to import the models of my Django app app1, I get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 122, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 76, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 129, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 43, in run
    spider = self.crawler.spiders.create(spname, **opts.spargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/command.py", line 33, in crawler
    self._crawler.configure()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 41, in configure
    self.engine = ExecutionEngine(self, self._spider_closed)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 63, in __init__
    self.scraper = Scraper(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 39, in load_object
    raise ImportError, "Error loading object '%s': %s" % (path, e)
ImportError: Error loading object 'scrapy_project.pipelines.storage.storage': No module named dydict.models

Why can't Scrapy access the Django app's models, given that app1 is in INSTALLED_APPS?

2 Answers

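Instead of importing the Django models directly in the pipeline, use Scrapy items that are bound to the Django models. The Django setup has to go inside the Scrapy settings file, not after it.

To use Django models in a Scrapy project, you need DjangoItem from https://github.com/scrapy-plugins/scrapy-djangoitem (installed or importable from your PYTHONPATH).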

The file structure I recommend is:

Projects
 |-DjangoScrapy
     |-DjangoProject
     |     |-Djangoproject
     |     |-DjangoAPP
     |-ScrapyProject
            |-ScrapyProject
                 |-Spiders

Then, in your Scrapy project's settings, add the full path of the Django project to the Python path:

# Setting up django's project full path.
import sys
sys.path.insert(0, '/home/PycharmProject/scrap/DjangoProject')

# Setting up django's settings module name.
import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'DjangoProject.settings'

# Django 1.7+: populate the app registry before any model is imported.
import django
django.setup()
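If you'd rather not hard-code an absolute path, you can derive it from the location of the Scrapy settings file. This is a minimal sketch that assumes the layout from the tree above (DjangoScrapy/ScrapyProject/ScrapyProject/settings.py with DjangoProject as a sibling of the outer ScrapyProject); the helper name configure_django_path and its defaults are hypothetical, so adjust them to your own project.

```python
import os
import sys


def configure_django_path(scrapy_settings_file,
                          django_dir_name="DjangoProject",
                          settings_module="DjangoProject.settings"):
    """Hypothetical helper: put the Django project on sys.path.

    Assumes DjangoScrapy/ScrapyProject/ScrapyProject/settings.py and
    DjangoScrapy/DjangoProject (layout assumption, not a Scrapy convention).
    """
    settings_dir = os.path.dirname(os.path.abspath(scrapy_settings_file))
    # inner ScrapyProject -> outer ScrapyProject -> DjangoScrapy
    root = os.path.dirname(os.path.dirname(settings_dir))
    django_path = os.path.join(root, django_dir_name)
    if django_path not in sys.path:
        sys.path.insert(0, django_path)
    # setdefault leaves an already exported DJANGO_SETTINGS_MODULE untouched
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    return django_path
```

In the Scrapy settings.py you would call configure_django_path(__file__) and then, on Django 1.7 or later, import django and call django.setup() before any pipeline imports a model.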

Then, in your items.py, you can bind the Django models to Scrapy items:

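from DjangoProject.models import Person, Job  # adjust to the app that defines the models
from scrapy_djangoitem import DjangoItem

# Distinct names so the items don't shadow the Django models (the spider imports PersonItem).
class PersonItem(DjangoItem):
    django_model = Person

class JobItem(DjangoItem):
    django_model = Job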

Then, once the spider has yielded the item, you can call the .save() method in the pipeline:

spider.py

import scrapy
from mybot.items import PersonItem

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["dmoz.org"]
    start_urls = ['http://www.dmoz.org/World/Espa%C3%B1ol/Artes/Artesan%C3%ADa/']

    def parse(self, response):
        # Extract data from the response here, then yield one item per record.
        yield PersonItem(name='zartch')

pipelines.py

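from myapp.models import Person

class MybotPipeline(object):
    def process_item(self, item, spider):
        # get_or_create returns an (object, created) tuple
        obj, created = Person.objects.get_or_create(name=item['name'])
        # Return the item so later pipelines and exporters still receive it
        return item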
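The pipeline above queries the model directly with get_or_create; with DjangoItem the pipeline can instead call item.save() on the yielded item. The sketch below shows that flow with stand-in classes so it runs without Django or Scrapy installed: FakePersonModel and SketchDjangoItem are hypothetical and only loosely mirror the idea behind scrapy-djangoitem (the item's fields build an instance of django_model, which is then saved); they are not that library's actual code.

```python
class FakePersonModel:
    """Stand-in for a Django model: records saved field sets in a list."""
    saved = []

    def __init__(self, **fields):
        self.fields = fields

    def save(self):
        FakePersonModel.saved.append(self.fields)


class SketchDjangoItem(dict):
    """Hypothetical simplification of a DjangoItem: a dict bound to a model."""
    django_model = None

    def save(self, commit=True):
        # Build a model instance from the item's fields
        instance = self.django_model(**self)
        if commit:
            instance.save()
        return instance


class PersonItem(SketchDjangoItem):
    django_model = FakePersonModel


class MybotPipeline:
    def process_item(self, item, spider):
        # Persist through the model bound to the item, then pass the item on
        item.save()
        return item
```

With the real library, PersonItem would subclass scrapy_djangoitem.DjangoItem as in items.py, and the pipeline body would stay the same two lines.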

I have a repository with minimal code (you only need to set the path to your Django project in the Scrapy settings): https://github.com/Zartch/Scrapy-Django-Minimal

In https://github.com/Zartch/Scrapy-Django-Minimal/blob/master/mybot/mybot/settings.py you have to change my Django project path to the path of your DjangoProject:

sys.path.insert(0, '/home/zartch/PycharmProjects/Scrapy-Django-Minimal/myweb')
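The error in the question (No module named dydict.models) means the app package could not be found on sys.path. A quick way to check that the path you set actually makes your app importable is importlib.util.find_spec, which locates a module without importing it. The helper name app_is_importable is hypothetical; this is a minimal stdlib-only sketch.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def app_is_importable(app_name, django_project_path):
    """Hypothetical check: can `app_name` be found once the path is on sys.path?"""
    if django_project_path not in sys.path:
        sys.path.insert(0, django_project_path)
    # Drop cached directory listings so newly added paths and packages are seen
    importlib.invalidate_caches()
    return importlib.util.find_spec(app_name) is not None


if __name__ == "__main__":
    # Demo: a throwaway "Django project" directory containing a package "myapp"
    with tempfile.TemporaryDirectory() as project:
        os.makedirs(os.path.join(project, "myapp"))
        for name in ("__init__.py", "models.py"):
            open(os.path.join(project, "myapp", name), "w").close()
        print(app_is_importable("myapp", project))
        print(app_is_importable("dydict", project))
```

Check the top-level app name (for example "dydict") rather than "dydict.models": find_spec on a dotted name imports the parent package first.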
answered 2015-12-01T18:29:05.127

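Try:

from ..models import MyModel

or

from ...models import MyModel

Each leading dot moves up one package level: one dot is the current package, two dots its parent, three dots the parent's parent.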
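You can see where a relative import would point with importlib.util.resolve_name, which applies the same rules as the import statement. This sketch assumes the pipeline lives in scrapy_project/pipelines/storage.py (inferred from the traceback), so its package is "scrapy_project.pipelines"; the "app1.scrapy_project.pipelines" case assumes app1 itself is imported as a package, which depends on how Scrapy is launched.

```python
from importlib.util import resolve_name

# __package__ of scrapy_project/pipelines/storage.py (assumed layout)
package = "scrapy_project.pipelines"

one_dot = resolve_name(".models", package)    # scrapy_project.pipelines.models
two_dots = resolve_name("..models", package)  # scrapy_project.models

# Three dots from a pipelines package nested inside app1 reach app1.models,
# but only if "app1" is the top-level package the module was imported under.
three_dots = resolve_name("...models", "app1.scrapy_project.pipelines")  # app1.models

print(one_dot, two_dots, three_dots)
```

If the Scrapy project is imported as a top-level package (plain scrapy_project), "...models" would try to climb above it and fail, so the sys.path approach in the other answer is the more robust option in that case.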

answered 2013-03-10T14:13:04.013