7

I want to use Tor with the getURL function in R. Tor is working (checked in Firefox), and its SOCKS5 proxy listens on port 9050. But when I set it up in R, I get the following error:

html <- getURL("http://www.google.com", followlocation = T, .encoding="UTF-8", .opts = list(proxy = "127.0.0.1:9050", timeout=15))

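Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : '\n\nTor is not an HTTP Proxy\n\n\n

Tor is not an HTTP Proxy

\n

\nIt appears you have configured your web browser to use Tor as an HTTP proxy.\nThis is not correct: Tor is a SOCKS proxy, not an HTTP proxy.\nPlease configure your client accordingly.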

I tried replacing the proxy option with socks and socks5, but that did not work.

4 Answers

8

R has curl bindings, so you can use curl to call the Tor SOCKS5 proxy server.

The call from the shell (which you can translate to the R bindings) is:

curl --socks5-hostname 127.0.0.1:9050 google.com

With this option, Tor also performs the DNS lookup for A records.
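One way the shell call above might be translated to RCurl is sketched below. It assumes that RCurl passes the `proxytype` option through to libcurl unchanged, where the value 7 corresponds to `CURLPROXY_SOCKS5_HOSTNAME`. The helper names `socks5_hostname_opts` and `fetch_via_tor` are made up for illustration, and this has not been checked against every RCurl or libcurl build.

```r
# Hypothetical helper: an option list mirroring `curl --socks5-hostname host:port`.
# 7 is libcurl's CURLPROXY_SOCKS5_HOSTNAME (SOCKS5 with DNS resolved by the proxy).
socks5_hostname_opts <- function(host = "127.0.0.1", port = 9050L) {
  list(proxy = sprintf("%s:%d", host, as.integer(port)),
       proxytype = 7L)
}

# Not called here: needs the RCurl package and a running Tor client.
fetch_via_tor <- function(url, opts = socks5_hostname_opts()) {
  RCurl::getURL(url, followlocation = TRUE, .opts = opts)
}

# Example (run manually): html <- fetch_via_tor("http://www.google.com")
```

Keeping the option list in its own function makes it easy to inspect what will be sent to libcurl before any network request is made.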

Answered 2013-09-22T17:28:25.580
7

RCurl defaults to an HTTP proxy, but Tor provides a SOCKS proxy. Tor detects that the proxy client (RCurl) is speaking HTTP to it, hence the HTML error message it returns.

To make RCurl, and curl, use a SOCKS proxy, add a protocol prefix to the proxy address. There are two prefixes for SOCKS5: "socks5" and "socks5h" (see the Curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method when using Tor (in fact, Tor will warn you if you let the proxy client resolve the hostname).

Here is a pure R solution that uses Tor for DNS queries as well.

library(RCurl)
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)

If you want to specify additional parameters, see below on where to put them:

library(RCurl)
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050",
                            useragent = "Mozilla",
                            followlocation = TRUE,
                            referer = "",
                            cookiejar = "my.cookies.txt"
                            )
        )
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)
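The `options(RCurlOptions = ...)` call above applies to every handle created afterwards. If that is too broad, a per-handle variant could look like the sketch below. The helpers `tor_proxy_url` and `fetch_with_handle` are hypothetical names introduced here, and the host and port defaults are simply the ones used in this thread.

```r
# Hypothetical helper: build a SOCKS5 proxy URL for curl.
# remote_dns = TRUE picks "socks5h", so the proxy (Tor) resolves hostnames.
tor_proxy_url <- function(host = "127.0.0.1", port = 9050L, remote_dns = TRUE) {
  scheme <- if (remote_dns) "socks5h" else "socks5"
  sprintf("%s://%s:%d", scheme, host, as.integer(port))
}

# Not called here: needs the RCurl package and a running Tor client.
# The proxy is set on this one handle only, not in the global options.
fetch_with_handle <- function(url, proxy = tor_proxy_url()) {
  handle <- RCurl::getCurlHandle(proxy = proxy, followlocation = TRUE)
  RCurl::getURL(url, curl = handle)
}

# Example (run manually): html <- fetch_with_handle("https://www.torproject.org")
```

Requests made without this handle keep using the normal network path, which can be useful when only some traffic should go through Tor.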
Answered 2014-02-27T23:27:36.587
2

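Hi Naparst, I would really appreciate a hint on how to implement the solution you suggested. The options should presumably look something like: opts <- list(socks5.hostname="127.0.0.1:9050") (this does not work, because socks5.hostname is not a valid option).

Answered 2013-10-02T15:46:30.810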
2

On Mac OS X, install the Tor Bundle for Mac and Privoxy, then update the proxy settings in System Preferences.

'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Web Proxy (HTTP)', Web Proxy Server 127.0.0.1:8118

'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Secure Web Proxy (HTTPS)', Secure Web Proxy Server 127.0.0.1:8118 --> 'OK' --> 'Apply'

library(RCurl)
curl <- getCurlHandle()
curlSetOpt(proxy='127.0.0.1:9150',proxytype=5,curl=curl)
html <- getURL(url='check.torproject.org',curl=curl)
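To tell whether a page fetched this way really came through Tor, one option is to look for the success banner in the returned HTML. The sketch below assumes the Tor check page still contains the word "Congratulations" on success, which may change over time. The function name `looks_like_tor_page` is hypothetical.

```r
# Hypothetical helper: TRUE if the HTML contains the (assumed) success banner
# shown by the Tor check page when the request arrived via Tor.
looks_like_tor_page <- function(html) {
  is.character(html) && length(html) == 1L &&
    grepl("Congratulations", html, fixed = TRUE)
}

# Example (run manually, after fetching `html` as above):
# looks_like_tor_page(html)
```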
Answered 2013-10-10T04:12:56.510