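I am trying to scrape a web page through a proxy, but something is not working. Below is an httr attempt that sets the proxy options, followed by an RCurl attempt. I have read several answers on this topic, but none of them seem to work. Any suggestions?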

### httr attempt
    library(httr)

    # Route all subsequent httr requests through the authenticating proxy.
    set_config(
      use_proxy(url="proxy.xxx.com.ar", port=8080,
                username = "xxxx\\xxxx", password = "xxxxx"),
      override = TRUE
    )
    a <- GET("http://google.com/", verbose())
    -> GET http://google.com/ HTTP/1.1
    -> Proxy-Authorization: Basic dG1vdmlsZXNcbWFyYmVsOkFyYWNhbGFjYW5hMjM=
    -> User-Agent: curl/7.19.7 Rcurl/1.95.4.1 httr/0.4.0.99
    -> Host: google.com
    -> Accept: */*
    -> Accept-Encoding: gzip
    -> Proxy-Connection: Keep-Alive
    -> 
    <- HTTP/1.1 407 Proxy Authentication Required
    <- Server: pxsip02-srv.xxxxx.com.ar
    <- Date: Mon, 11 Aug 2014 15:11:14 GMT
    <- Content-Length: 309
    <- Content-Type: text/html
    <- Connection: Keep-Alive
    <- Keep-Alive: timeout=60, max=8
    <- Proxy-Authenticate: NTLM
    <- 

    content(a)
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
    <head><title>Authentication Error</title></head>
    <body>
    <h1>Authentication Error</h1>There has been an error validating your user credentials. If the error persists,contact your network administrator.<br>Proxy authentication required<br><hr>
    <br>Details: 407 Proxy Authentication Required</body>
    </html>
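
The verbose output above already hints at the cause: the request carries a `Proxy-Authorization: Basic` header, while the 407 response advertises only `Proxy-Authenticate: NTLM`. A minimal base R sketch of that comparison follows. The helper `auth_schemes()` is a hypothetical name introduced here for illustration, and the header vectors are abbreviated copies of the log lines with the credentials replaced by a placeholder.

```r
# Header lines taken from the verbose log above (credentials replaced
# by a placeholder; only the lines relevant to authentication are kept).
request_headers <- c(
  "Proxy-Authorization: Basic <credentials>",
  "Host: google.com"
)
response_headers <- c(
  "HTTP/1.1 407 Proxy Authentication Required",
  "Proxy-Authenticate: NTLM"
)

# Hypothetical helper: return the auth scheme (first token, upper-cased)
# of every header line whose name matches `name`.
auth_schemes <- function(headers, name) {
  pattern <- paste0("^", name, ":\\s*")
  hits <- grep(pattern, headers, ignore.case = TRUE, value = TRUE)
  values <- sub(pattern, "", hits, ignore.case = TRUE)
  toupper(sub("\\s.*$", "", values))
}

sent    <- auth_schemes(request_headers, "Proxy-Authorization")
offered <- auth_schemes(response_headers, "Proxy-Authenticate")

# TRUE when the client used a scheme the proxy did not offer.
scheme_mismatch <- !any(sent %in% offered)
cat("sent:", sent, "| offered:", offered, "| mismatch:", scheme_mismatch, "\n")
```

Here `sent` is `"BASIC"` and `offered` is `"NTLM"`, so the proxy rejects the request regardless of whether the user name and password are correct.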


### RCurl attempt

    library("RCurl")
    opts <- list(
      proxy         = "proxy.xxxxx.com.ar", 
      proxyusername = "xxxxxx\\xxxxx", 
      proxypassword = "xxxxxx",
      proxyport     = 8080,
      cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"),  # cainfo takes a bundle file; capath expects a directory
      verbose=TRUE, proxyauth=TRUE, useragent= "", header = TRUE
    )
    options( RCurlOptions = opts)

    getURL("http://stackoverflow.com")


    * About to connect() to proxy proxy.xxxxx.com.ar port 8080 (#0)
    *   Trying 10.167.195.11... * connected
    * Connected to proxy.xxxxxx.com.ar (10.167.195.11) port 8080 (#0)
    * Proxy auth using Basic with user 'xxxxxxx\xxxxx'
    > GET http://stackoverflow.com HTTP/1.1
    Proxy-Authorization: Basic VE1PVklMRVNcTUFSQkVMOkFyYWNhbGFjYW5hMjM=
    Host: stackoverflow.com
    Accept: */*
    Proxy-Connection: Keep-Alive

    [1] "HTTP/1.1 407 Proxy Authentication Required\r\nServer: pxsip02-srv.xxxx.com.ar\r\nDate: Mon, 11 Aug 2014 15:15:29 GMT\r\nContent-Length: 309\r\nContent-Type: text/html\r\nConnection: Keep-Alive\r\nKeep-Alive: timeout=60, max=8\r\nProxy-Authenticate: NTLM\r\n\r\n<html><head><title>Authentication Error</title></head><body><h1>Authentication Error</h1>There has been an error validating your user credentials. If the error persists,contact your network administrator.<br/>Proxy authentication required<br/><hr/><br/>Details: 407 Proxy Authentication Required</body></html>"
    < HTTP/1.1 407 Proxy Authentication Required
    < Server: pxsip02-srv.xxxxxxx.com.ar
    < Date: Mon, 11 Aug 2014 15:15:29 GMT
    < Content-Length: 309
    < Content-Type: text/html
    < Connection: Keep-Alive
    < Keep-Alive: timeout=60, max=8
    < Proxy-Authenticate: NTLM
    < 
    * Connection #0 to host proxy.xxxxxx.com.ar left intact
1 Answer

This is an update to the question above. I am adding it as a separate answer so that it is easier to follow.

    GET("http://google.com/",
        config = list(
          use_proxy(url="proxy.xxx.com.ar", port=8080,
                    username = "xxxx\\xxxx", password = "xxxxx",
                    proxyauth = 1)
        )
    )

Error message:

    Error in use_proxy(url = "proxy.xxxx.com.ar", port = 8080, username = "xxxxxx\\xxxx", :
      unused argument (proxyauth = 1)
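
The error shows that `use_proxy()` in this httr build has no `proxyauth` parameter. Later httr releases give `use_proxy()` an `auth` argument that accepts `"ntlm"` among other schemes, so one possible direction, assuming such a release is installed, is sketched below. The host, user name, and password are the placeholders from the question, `get_via_proxy()` is a hypothetical helper name, and whether the request succeeds also depends on the local libcurl having NTLM support, which has not been verified here.

```r
library(httr)

# Assumption: an httr release whose use_proxy() has an `auth` argument;
# the 0.4.0.99 build shown in the question does not accept it.
proxy_cfg <- use_proxy(
  url      = "proxy.xxx.com.ar",  # placeholder host from the question
  port     = 8080,
  username = "xxxx\\xxxx",        # DOMAIN\\user placeholder
  password = "xxxxx",             # placeholder
  auth     = "ntlm"               # scheme advertised in the 407 response
)

# Hypothetical helper: send one GET through the proxy with verbose logging.
# It is only defined here, not called, because it needs the real proxy.
get_via_proxy <- function(url) {
  GET(url, proxy_cfg, verbose())
}
```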
Answered 2014-08-13T14:59:59.543