I am trying to request an XML document with two different methods (xmlParse and httr::GET) and expect the response to be the same. The response I get with xmlParse is what I expect but with httr::GET my request URL gets truncated at some point.

An example:

require(httr)
require(XML)
require(rvest)

term <- "alopecia areata"
request <- paste0("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=",term)  

#requesting URL with XML
xml_response <- xmlParse(request)

xml_response %>%
        xml_nodes(xpath = "//Result/Term") %>%
        xml_text 

This returns, as it should

[1] "alopecia areata"        

Now for httr

httr_response <- GET(request)
httr_content <- content(httr_response)

httr_content %>%
        xml_nodes(xpath = "//Result/Term") %>%
        xml_text 

This returns

[1] "alopecia"

What's interesting: if we check the httr_response element for the requested URL, it's correct. Only the response is wrong.

> httr_response$request$opts$url

[1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=alopecia areata"

> httr_response$url

[1] "http://eutils.ncbi.nlm.nih.gov/gquery?term=alopecia&retmode=xml"

So at some point my query term got truncated. If the whole request is put into a browser by hand, it behaves as expected.

Any suggestions on how to resolve this would be greatly appreciated.

1 Answer

You can try replacing the spaces in the URL with a + to prevent it from being truncated:

httr_response <- GET(gsub(" ","+",request))
httr_content <- content(httr_response)

httr_content %>%
        xml_nodes(xpath = "//Result/Term") %>%
        xml_text 

#[1] "alopecia areata"

More information about spaces and URLs is here
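
As an alternative to replacing spaces by hand, the term can be percent-encoded before it is pasted into the URL. The following is a minimal sketch using base R's utils::URLencode. The names base_url and encoded_term are introduced here for illustration only, and the commented-out httr call assumes that passing the term through GET's query argument is acceptable for this endpoint, which was not tested in the original answer.

```r
# Base R only: percent-encode the search term before building the URL.
base_url <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi"
term <- "alopecia areata"

# reserved = TRUE also escapes characters such as & and =, which matters
# if the term itself could contain them.
encoded_term <- utils::URLencode(term, reserved = TRUE)
request <- paste0(base_url, "?term=", encoded_term)
print(request)
# [1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=alopecia%20areata"

# With httr, the query argument can do the encoding instead
# (not run here, since it needs network access):
# httr_response <- httr::GET(base_url, query = list(term = term))
```

Either way, the request no longer contains a literal space, so the server should see the whole term rather than only the first word.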

answered 2015-02-28T11:58:34.033