
I want to extract the information from a webpage on demand.

A user will submit a URL in a form. I want to extract the info from that page and render it to the user.

I want to simulate

scrapy shell "my_url"

in my runtime code.

Is there a Scrapy utility that, given a URL, gives me an HtmlXPathSelector (hxs) object?


1 Answer


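A spider can be run like an ordinary Python script. Do this by running the script from the project path that contains the scrapy.cfg file. (Only the path matters; it has nothing to do with scrapy.cfg itself.)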

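from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from scrapy.utils.project import get_project_settings

# Create an instance of your spider class (SampleSpider is a placeholder
# for it; import it from wherever your project defines it).
spider = SampleSpider(arguments_you_want_to_initialize_that_object)

settings = get_project_settings()
crawler = Crawler(settings)
# Stop the Twisted reactor once the spider has finished.
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
reactor.run()  # blocks until reactor.stop is called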

We can add this code to a CGI script to invoke the spider.

Answered 2015-10-14T18:16:55.833