我正在使用 getURL() 和 htmlParse() - 如何使带有特殊字符的网站内容正确显示?
library(RCurl); library(XML)
script <- getURL("http://www.floraweb.de/pflanzenarten/foto.xsql?suchnr=814")
doc <- htmlParse(script, encoding = "UTF-8")
xpathSApply(doc, "//div[@id='content']//p", xmlValue)[2]
[1] "Bellis perennis L., Gänseblümchen"
# should say:
[1] "Bellis perennis L., Gänseblümchen"
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=German_Austria.1252 LC_CTYPE=German_Austria.1252 LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
[5] LC_TIME=German_Austria.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] jpeg_0.1-2 RCurl_1.9-5.1 bitops_1.0-4.1 XML_3.9-1.1
loaded via a namespace (and not attached):
[1] tools_2.14.1
options("encoding")
$encoding
[1] "native.enc"