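I am trying to scrape agencies' phone numbers from this site:
List view: http://www.authoradvance.com/agencies/
Detail view: http://www.authoradvance.com/agencies/b-personal-management/
The phone numbers are only shown on the detail pages.
So is it possible to crawl the site through URLs like the detail view URL above and scrape the phone numbers?
My attempt at the code is: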
from scrapy.item import Item, Field

class AgencyItem(Item):
    Phone = Field()

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from agentquery.items import AgencyItem

class AgencySpider(CrawlSpider):
    name = "agency"
    allowed_domains = ["authoradvance.com"]
    start_urls = ["http://www.authoradvance.com/agencies/"]
    rules = (Rule(SgmlLinkExtractor(allow=[r'agencies/*$']), callback='parse_item'),)

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select("//div[@class='section-content']")
        items = []
        for site in sites:
            item = AgencyItem()
            item['Phone'] = site.select('div[@class="phone"]/text()').extract()
            items.append(item)
        return(items)
Then I run "scrapy crawl agency -o items.csv -t csv", and the result is 0 pages crawled.
What is wrong? Thanks in advance for your help!
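
One thing that can be checked without running the spider is whether the `allow` pattern in the rule matches the detail URLs at all. The snippet below is only a sketch using Python's `re` module on the two URLs quoted above. It assumes the link extractor applies each `allow` pattern to the full URL with search semantics. The `candidate` pattern is a hypothetical alternative for comparison, not a confirmed fix, and the page markup targeted by the XPath expressions is not checked here.

```python
import re

# Pattern copied verbatim from the Rule in the spider above.
pattern = re.compile(r'agencies/*$')

list_url = "http://www.authoradvance.com/agencies/"
detail_url = "http://www.authoradvance.com/agencies/b-personal-management/"

# '/*' means "zero or more slashes", so the pattern only matches URLs
# that end in 'agencies' followed by nothing but slashes.
list_matches = pattern.search(list_url) is not None
detail_matches = pattern.search(detail_url) is not None

# Hypothetical alternative (an assumption, not taken from the question):
# require one path segment after 'agencies/'.
candidate = re.compile(r'agencies/[^/]+/?$')
candidate_list = candidate.search(list_url) is not None
candidate_detail = candidate.search(detail_url) is not None

print("original pattern:", list_matches, detail_matches)
print("candidate pattern:", candidate_list, candidate_detail)
```

If the original pattern matches the list URL but not the detail URL, the rule would have no detail links to follow, which would be consistent with nothing being scraped.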