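I'm using R 3.1.1 on Windows 7 32-bit. I'm having a lot of trouble reading some text files that I want to run text analysis on. According to Notepad++, the files are encoded as "UCS-2 Little Endian". (grepWin, a tool whose name says it all, reports the files as "Unicode".)
The problem is that I can't seem to read the files even when I specify that encoding. (The characters are the standard Spanish Latin set, such as ñ, á and ó, and should be easy to handle with CP1252 or something similar.)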
> Sys.getlocale()
[1] "LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONETARY=Spanish_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252"
> readLines("filename.txt")
[1] "ÿþE" "" "" "" "" ...
> readLines("filename.txt",encoding="UTF-8")
[1] "\xff\xfeE" "" "" "" "" ...
> readLines("filename.txt",encoding="UCS2LE")
[1] "ÿþE" "" "" "" "" "" "" ...
> readLines("filename.txt",encoding="UCS2")
[1] "ÿþE" "" "" "" "" ...
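The output is consistent with the byte layout of a UCS-2 LE file: it begins with the byte-order mark FF FE, which CP1252 displays as "ÿþ", and every ASCII character is followed by a 0x00 byte. A likely explanation is that readLines treats the file as 8-bit text and cuts each line at the first NUL byte, which would leave "ÿþE" for the first line and empty strings afterwards. The minimal sketch below checks this by writing a small hypothetical file (a temporary file holding "España", standing in for filename.txt) in UCS-2 LE with a BOM and dumping its raw bytes:

```r
# Hypothetical stand-in for filename.txt: a temp file containing "España"
# stored as UCS-2 LE (identical to UTF-16LE for these characters) with a BOM.
tmp <- tempfile(fileext = ".txt")
txt <- "Espa\u00f1a"  # \u escape keeps the literal independent of the script's encoding

# BOM (FF FE) followed by the UTF-16LE bytes of the text
bytes <- c(as.raw(c(0xFF, 0xFE)),
           iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]])
writeBin(bytes, tmp)

# Read the file back as raw bytes: note the 00 after every character
raw_bytes <- readBin(tmp, what = "raw", n = file.info(tmp)$size)
print(raw_bytes)
```

If the real file shows the same pattern (FF FE, then 45 00 for "E", and so on), the bytes on disk are fine and the problem lies in how they are being decoded.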
Any ideas?
Thanks!!
EDIT: The "UTF-16", "UTF-16LE" and "UTF-16BE" encodings fail in the same way.