1

我正在使用 getURL() 和 htmlParse() - 如何使带有特殊字符的网站内容正确显示?

        library(RCurl); library(XML)
        script <- getURL("http://www.floraweb.de/pflanzenarten/foto.xsql?suchnr=814")
        doc <- htmlParse(script, encoding = "UTF-8")
        xpathSApply(doc, "//div[@id='content']//p", xmlValue)[2]
        [1] "Bellis perennis L., Gänseblümchen"
        # should say:
        [1] "Bellis perennis L., Gänseblümchen"

        > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=German_Austria.1252  LC_CTYPE=German_Austria.1252    LC_MONETARY=German_Austria.1252 LC_NUMERIC=C                   
    [5] LC_TIME=German_Austria.1252    

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     

    other attached packages:
    [1] jpeg_0.1-2     RCurl_1.9-5.1  bitops_1.0-4.1 XML_3.9-1.1   

    loaded via a namespace (and not attached):
    [1] tools_2.14.1

options("encoding")
$encoding
[1] "native.enc"
4

0 回答 0