I want to get the response headers from a GET or POST request.

My example is:

    library(httr)
    library(RCurl)
    url<-'http://www.omegahat.org/RCurl/philosophy.html'
    doc<-GET(url)
    names(doc)

[1] "url"         "handle"      "status_code" "headers"     "cookies"     "content"     "times"       "config"  

But there are no response headers, only request headers.

The result should look something like this:

Connection:Keep-Alive
Date:Mon, 11 Feb 2013 20:21:56 GMT
ETag:"126a001-e33d-4c12cf2702440"
Keep-Alive:timeout=15, max=100
Server:Apache/2.2.14 (Ubuntu)
Vary:Accept-Encoding

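Can I do this with R and the httr/RCurl packages, or is R not up to this kind of problem?

Edit: I want to get all the response headers. I'm mainly interested in the Location response header, which does not appear in this example.

Edit2: I forgot to say which system I'm working on: it's Windows 7.

My session info: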

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250    LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C                   LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rjson_0.2.12 RCurl_1.95-3 bitops_1.0-5 httr_0.2     XML_3.95-0.1

loaded via a namespace (and not attached):
[1] digest_0.6.2  stringr_0.6.2 tools_2.15.2 
2 Answers

You can do it like this:

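    library(RCurl)

    # Collect the response headers through a header callback
    h <- basicHeaderGatherer()
    doc <- getURI("http://www.omegahat.org/RCurl/index.html", headerfunction = h$update)
    h$value()  # named character vector of response headers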

This gives you a named vector:

                            Date                           Server 
 "Mon, 11 Feb 2013 20:41:58 GMT"         "Apache/2.2.14 (Ubuntu)" 
                   Last-Modified                             ETag 
 "Wed, 24 Oct 2012 15:49:35 GMT" "\"3262089-10bf-4ccd0088461c0\"" 
                   Accept-Ranges                   Content-Length 
                         "bytes"                           "4287" 
                            Vary                     Content-Type 
               "Accept-Encoding"                      "text/html" 
                          status                    statusMessage 
                           "200"                             "OK" 
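
Since the result is an ordinary named character vector, you can pick out a single header by name. HTTP header names are case-insensitive, so it may help to compare names in lower case. The sketch below uses base R only and a hand-typed subset of the vector above; `get_header` is a hypothetical helper, not a function from RCurl or httr.

```r
# A hand-typed subset of the named vector shown above, so this runs offline
hdrs <- c(
  Date = "Mon, 11 Feb 2013 20:41:58 GMT",
  Server = "Apache/2.2.14 (Ubuntu)",
  `Content-Type` = "text/html",
  status = "200"
)

# Hypothetical helper: case-insensitive lookup of one header.
# Returns NA when the header is absent (e.g. Location on a 200 response).
get_header <- function(h, name) {
  hit <- tolower(names(h)) == tolower(name)
  if (any(hit)) unname(h[hit][1]) else NA_character_
}

get_header(hdrs, "content-type")
get_header(hdrs, "Location")
```

With the real result you would call `get_header(h$value(), "Location")`; it only returns a value when the server actually sent that header, which is typically on a redirect.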
answered 2013-02-11T20:43:00.633

curl -I http://www.google.com

HTTP/1.1 200 OK
Date: Mon, 11 Feb 2013 20:36:06 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=ec3eb1b4b4f31100:FF=0:TM=1360614966:LM=1360614966:S=EjQCjjdv07A6PRtw; expires=Wed, 11-Feb-2015 20:36:06 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=neiRZQ9fctd6NqzdKNdRMzfBqk-yAaxxxruYrnsvTcJeG7q8TJm5Ybv1UZ2ZV_ZheYhy-RwgAppHUh1VhIz4KOcFbcl8-0DvtPYXxaiSQmYvXGEKqeh4glhqvhOdxJKB; expires=Tue, 13-Aug-2013 20:36:06 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked

curl -v http://google.com/

$ curl -v http://google.com/
* About to connect() to google.com port 80 (#0)
*   Trying 66.102.7.104... connected
* Connected to google.com (66.102.7.104) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.16.4 (i386-apple-darwin9.0) libcurl/7.16.4 OpenSSL/0.9.7l zlib/1.2.3
> Host: google.com
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Thu, 15 Jul 2010 06:06:52 GMT
< Expires: Sat, 14 Aug 2010 06:06:52 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 1; mode=block
< 
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host google.com left intact
* Closing connection #0
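
If you capture raw header text like the output above and want it inside R, a minimal sketch of turning the lines into a named vector could look like this. It uses base R only; `parse_headers` is a hypothetical helper (not part of RCurl or httr), and the input is a few lines copied from the 301 response shown above.

```r
# Raw response header lines, copied from the 301 response above
raw_headers <- c(
  "HTTP/1.1 301 Moved Permanently",
  "Location: http://www.google.com/",
  "Content-Type: text/html; charset=UTF-8",
  "Server: gws",
  "Content-Length: 219"
)

# Hypothetical helper: first line is the status line, the rest are "Name: value"
parse_headers <- function(lines) {
  status_line <- lines[1]
  fields <- lines[-1]
  fields <- fields[grepl(":", fields, fixed = TRUE)]
  # Split at the first colon only, because values (URLs, dates) contain colons
  pos <- regexpr(":", fields, fixed = TRUE)
  nms <- substr(fields, 1, pos - 1)
  vals <- trimws(substr(fields, pos + 1, nchar(fields)))
  # Second token of the status line is the numeric status code
  c(setNames(vals, nms), status = strsplit(status_line, " ")[[1]][2])
}

h <- parse_headers(raw_headers)
h[["Location"]]
h[["status"]]
```

This is only meant for simple cases; it does not handle repeated headers such as multiple `Set-Cookie` lines in any special way.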
answered 2013-02-11T20:34:46.137