Hi, I'm trying to use CrawlSpider, and I created my own deny rules:
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

    class MySpider(CrawlSpider):
        name = "craigs"
        allowed_domains = ["careers-cooperhealth.icims.com"]
        start_urls = ["careers-cooperhealth.icims.com"]
        path_deny_base = ['.(login)', '.(intro)', '(candidate)', '(referral)', '(reminder)', '(/search)']
        rules = (Rule(SgmlLinkExtractor(deny=path_deny_base,
                                        allow=('careers-cooperhealth.icims.com/jobs/…;*')),
                      callback="parse_items",
                      follow=True),)
My spider still crawled pages like https://careers-cooperhealth.icims.com/jobs/22660/registered-nurse-prn/login, even though the login page should not be crawled. What is wrong here?
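For context, the `deny` entries are regular expressions that the link extractor tests against each extracted URL (roughly with `re.search`). A quick standalone sanity check of whether the patterns above actually match the offending URL might look like this (the loop below is just an illustration of the matching mechanism, not Scrapy's internal code):

```python
import re

url = "https://careers-cooperhealth.icims.com/jobs/22660/registered-nurse-prn/login"

# The same deny patterns as in the spider above.
deny_patterns = ['.(login)', '.(intro)', '(candidate)',
                 '(referral)', '(reminder)', '(/search)']

# Apply each pattern to the URL the way a regex-based deny filter would:
# if any pattern produces a match, the link should be filtered out.
for pattern in deny_patterns:
    matched = re.search(pattern, url) is not None
    print(f"{pattern!r}: {'matches' if matched else 'no match'}")
```

Running a check like this against the URLs that slip through can show whether the problem lies in the deny regexes themselves or elsewhere in the rule configuration (for example, in how the `allow` pattern or `start_urls` are defined).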