
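I'm using proxy rotation in my project to avoid being banned by the website. I have to scrape a list of URLs from http://website/0001 to http://website/9999, and when the site detects that I'm scraping, it sends me to website/contact.html.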

I already have my proxy list in the settings:

    ROTATING_PROXY_LIST = [
        'proxy1.com:8000',
        'proxy2.com:8031',
        # ...
    ]

I created this spider:

    next_page_url = response.url[17:]  # getting the relative url from website/page

    if next_page_url == "contact.html":
        absolute_next_page = response.urljoin(last_page)
        yield Request(absolute_next_page)
        # should try the same page with different proxy
    else:
        next_page_url = int(next_page_url) + 1
        last_page = str(next_page_url).zfill(4)
        absolute_next_page = response.urljoin(last_page)
        yield Request(absolute_next_page)

But it raises an error: UnboundLocalError: local variable 'last_page' referenced before assignment

How can I mark a proxy as dead in this spider? Or is there another way to do the same thing?


1 Answer


What exactly are you asking?

You said you are getting this error:

UnboundLocalError: local variable 'last_page' referenced before assignment

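This error means you are trying to use a variable that has not been assigned yet. In your code, `last_page` is only assigned in the `else` branch, so the `contact.html` branch reads it before it has a value.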
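To see the problem in isolation, here is a minimal sketch outside of Scrapy. The function name `next_page` is made up for illustration; it only mirrors the branching in your callback.

```python
def next_page(page):
    """Mirror the branching of the spider callback (hypothetical helper)."""
    if page == "contact.html":
        # last_page is a local variable (it is assigned further down),
        # but nothing has been assigned to it on this path yet.
        return last_page
    last_page = str(int(page) + 1).zfill(4)
    return last_page


def demo():
    """Return the name of the exception raised on the contact.html path."""
    try:
        next_page("contact.html")
    except UnboundLocalError as exc:
        return type(exc).__name__
    return None
```

The numeric path works because the assignment runs before the read; the `contact.html` path fails because it skips the assignment entirely.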

So to prevent this error, change your code like this:

    import random

    from scrapy import Request

    # Body of the spider's parse(self, response) callback.
    # ROTATING_PROXY_LIST is the list from your settings.
    next_page_url = response.url[17:]  # getting the relative url from website/page

    if next_page_url == "contact.html":
        # We were redirected, so retry the page that was originally requested.
        # Scrapy's RedirectMiddleware records the pre-redirect URLs in meta['redirect_urls'].
        original_url = response.meta.get('redirect_urls', [response.url])[0]

        # dont_filter=True because this URL has already been seen by the duplicate filter
        req = Request(url=original_url, dont_filter=True)

        req.meta['proxy'] = random.choice(ROTATING_PROXY_LIST)  # choose a random proxy

        yield req
    else:
        # Only convert to int here; int("contact.html") would raise ValueError
        last_page = str(int(next_page_url) + 1).zfill(4)
        yield Request(response.urljoin(last_page))
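Note that a plain `random.choice` can hand back the same proxy that was just blocked. As a rough sketch, assuming you know which proxy the failed request used (for example from `response.meta.get('proxy')`), you could pick from the remaining ones. The helper names `pick_other_proxy` and `next_page_url` are hypothetical, not part of Scrapy, and the proxy list is the placeholder from the question.

```python
import random

# Placeholder proxies taken from the question's settings.
ROTATING_PROXY_LIST = ['proxy1.com:8000', 'proxy2.com:8031']


def pick_other_proxy(proxies, failed=None):
    """Return a random proxy, avoiding the one that just failed when possible."""
    candidates = [p for p in proxies if p != failed]
    # Fall back to the full list if the failed proxy was the only one.
    return random.choice(candidates or list(proxies))


def next_page_url(url):
    """Turn '.../0001' into '.../0002', keeping the 4-digit zero padding."""
    base, _, page = url.rpartition('/')
    return base + '/' + str(int(page) + 1).zfill(4)
```

In the callback, `req.meta['proxy'] = pick_other_proxy(ROTATING_PROXY_LIST, response.meta.get('proxy'))` would then replace the `random.choice` line.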
answered 2017-06-21T07:26:03.080