python - How to use Crawlera with selenium (Python, Chrome, Windows) without Polipo

Question

I am trying to use the Crawlera proxy from scrapinghub with selenium and Chrome, using Python on Windows.

I checked the documentation, and it suggests using Polipo like this:

1) Add the following lines to /etc/polipo/config

parentProxy = "proxy.crawlera.com:8010"
parentAuthCredentials = "<CRAWLERA_APIKEY>:"

2) Add this to the selenium driver

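from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.proxy import Proxy, ProxyType

# Polipo listens locally and forwards requests to Crawlera
polipo_proxy = "127.0.0.1:8123"
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': polipo_proxy,
    'ftpProxy' : polipo_proxy,
    'sslProxy' : polipo_proxy,
    'noProxy'  : ''
})

capabilities = dict(DesiredCapabilities.CHROME)
proxy.add_to_capabilities(capabilities)
driver = webdriver.Chrome(desired_capabilities=capabilities)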

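Now I would like to drop Polipo and use the proxy directly.

Is there a way to replace the polipo_proxy variable with the Crawlera one? Every time I try, the setting is ignored and the browser runs without a proxy.

The Crawlera proxy format looks like this: [API KEY]:@[HOST]:[PORT]

I tried adding the proxy with the following line: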

chrome_options.add_argument('--proxy-server=http://[API KEY]:@[HOST]:[PORT]')

But the problem is that I need to specify HTTP and HTTPS separately.

Thanks in advance!

score 0 · Accepted Answer

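Polipo is no longer maintained, so relying on it is problematic. Crawlera requires authentication, which the Chrome driver does not currently seem to support. You could try the Firefox webdriver instead, since you can set proxy authentication in a custom Firefox profile and use that profile as shown in "Running selenium behind a proxy server" and http://toolsqa.com/selenium-webdriver/http-proxy-authentication/.

I ran into the same problem and found a workaround that I hope helps you too. To get around it, use the Firefox driver and put the proxy information in its profile like this: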

from selenium import webdriver

profile = webdriver.FirefoxProfile()
# 1 = manual proxy configuration
profile.set_preference("network.proxy.type", 1)
profile.set_preference("network.proxy.http", "proxy.server.address")
# placeholder: replace with the port as an integer, not a string
profile.set_preference("network.proxy.http_port", "port_number")
profile.update_preferences()
driver = webdriver.Firefox(firefox_profile=profile)

This worked for me. For reference, see the site linked above.
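Since the question asks about both HTTP and HTTPS, here is a minimal sketch of how the same preferences might be extended to cover HTTPS traffic as well. The helper names `firefox_proxy_prefs` and `make_firefox_driver` are made up for illustration, the host and port in the usage note are the ones quoted in the question, and the driver is built with `FirefoxOptions` rather than a profile object, which is an assumption about the selenium version in use. It does not handle proxy authentication.

```python
def firefox_proxy_prefs(host, port):
    """Return Firefox preferences that send HTTP and HTTPS through one proxy."""
    port = int(port)  # Firefox stores the *_port preferences as integers
    return {
        "network.proxy.type": 1,        # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,      # proxy used for HTTPS traffic
        "network.proxy.ssl_port": port,
    }


def make_firefox_driver(host, port):
    """Hypothetical helper: start Firefox with the preferences applied."""
    # Imported here so the helper above can be used without selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in firefox_proxy_prefs(host, port).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

Calling `make_firefox_driver("proxy.crawlera.com", 8010)` would then open a browser that routes both schemes through that address, assuming geckodriver is available on the machine.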

score 0 · Accepted Answer

Scrapinghub has created a new project for this. You set up a forwarding proxy with your API key, then configure the webdriver to use that proxy. The project is: zyte-smartproxy-headless-proxy
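As a rough sketch of the second step, Chrome can be pointed at the local forwarding proxy with its `--proxy-server` flag. Because the forwarding proxy holds the API key, Chrome itself needs no credentials. The address `127.0.0.1:3128` is an assumption about where the forwarding proxy listens, so adjust it to your setup; the helper names are hypothetical. The helper also shows Chrome's per-scheme syntax, which is one way to give HTTP and HTTPS different proxies.

```python
def chrome_proxy_argument(http_proxy, https_proxy=None):
    """Build the value passed to Chrome via add_argument.

    With one address, all traffic uses it. With two different addresses,
    Chrome's per-scheme form ("http=...;https=...") maps each scheme
    to its own proxy.
    """
    if https_proxy is None or https_proxy == http_proxy:
        return "--proxy-server=" + http_proxy
    return "--proxy-server=http={};https={}".format(http_proxy, https_proxy)


def make_chrome_driver(local_proxy="127.0.0.1:3128"):
    """Hypothetical helper: start Chrome behind the local forwarding proxy."""
    # Imported here so the helper above can be used without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Assumed address of the forwarding proxy; no API key appears here.
    options.add_argument(chrome_proxy_argument(local_proxy))
    return webdriver.Chrome(options=options)
```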
