Until now I have been working on Windows, and my code for getting the extended ASCII characters was this:

extendedascii=rawToChar(as.raw(seq(128,255,by=1)),multiple=TRUE)

This gave me a vector containing the characters I need.

  [1] "€" "" "‚" "ƒ" "„" "…" "†" "‡" "ˆ" "‰" "Š" "‹" "Œ" "" "Ž" "" "" "‘" "’" "“" "”" "•" "–" "—" "˜" "™" "š" "›" "œ" "" "ž" "Ÿ" " " "¡" "¢" "£" "¤" "¥" "¦"
 [40] "§" "¨" "©" "ª" "«" "¬" "­" "®" "¯" "°" "±" "²" "³" "´" "µ" "¶" "·" "¸" "¹" "º" "»" "¼" "½" "¾" "¿" "À" "Á" "Â" "Ã" "Ä" "Å" "Æ" "Ç" "È" "É" "Ê" "Ë" "Ì" "Í"
 [79] "Î" "Ï" "Ð" "Ñ" "Ò" "Ó" "Ô" "Õ" "Ö" "×" "Ø" "Ù" "Ú" "Û" "Ü" "Ý" "Þ" "ß" "à" "á" "â" "ã" "ä" "å" "æ" "ç" "è" "é" "ê" "ë" "ì" "í" "î" "ï" "ð" "ñ" "ò" "ó" "ô"
[118] "õ" "ö" "÷" "ø" "ù" "ú" "û" "ü" "ý" "þ" "ÿ"

Now, on Linux, I get this:

  [1] "\x80" "\x81" "\x82" "\x83" "\x84" "\x85" "\x86" "\x87" "\x88" "\x89" "\x8a" "\x8b" "\x8c"
 [14] "\x8d" "\x8e" "\x8f" "\x90" "\x91" "\x92" "\x93" "\x94" "\x95" "\x96" "\x97" "\x98" "\x99"
 [27] "\x9a" "\x9b" "\x9c" "\x9d" "\x9e" "\x9f" "\xa0" "\xa1" "\xa2" "\xa3" "\xa4" "\xa5" "\xa6"
 [40] "\xa7" "\xa8" "\xa9" "\xaa" "\xab" "\xac" "\xad" "\xae" "\xaf" "\xb0" "\xb1" "\xb2" "\xb3"
 [53] "\xb4" "\xb5" "\xb6" "\xb7" "\xb8" "\xb9" "\xba" "\xbb" "\xbc" "\xbd" "\xbe" "\xbf" "\xc0"
 [66] "\xc1" "\xc2" "\xc3" "\xc4" "\xc5" "\xc6" "\xc7" "\xc8" "\xc9" "\xca" "\xcb" "\xcc" "\xcd"
 [79] "\xce" "\xcf" "\xd0" "\xd1" "\xd2" "\xd3" "\xd4" "\xd5" "\xd6" "\xd7" "\xd8" "\xd9" "\xda"
 [92] "\xdb" "\xdc" "\xdd" "\xde" "\xdf" "\xe0" "\xe1" "\xe2" "\xe3" "\xe4" "\xe5" "\xe6" "\xe7"
[105] "\xe8" "\xe9" "\xea" "\xeb" "\xec" "\xed" "\xee" "\xef" "\xf0" "\xf1" "\xf2" "\xf3" "\xf4"
[118] "\xf5" "\xf6" "\xf7" "\xf8" "\xf9" "\xfa" "\xfb" "\xfc" "\xfd" "\xfe" "\xff"

I tried Encoding(extendedascii) and got "unknown" for every element of the vector. I also tried iconv(extendedascii, from="UTF-8", to="ASCII") and ended up with NAs.

I believe my basic problem is that I don't know what encoding my text is in, and my machine may not know or recognize it either. Any help?

1 Answer

There is no such thing as extended ASCII. The encoding you had on Windows is called Windows-1252 or CP-1252. iconv knows it just fine.
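As a minimal sketch of that conversion, assuming the iconv build on your Linux machine accepts the name "WINDOWS-1252" (encoding names are platform-dependent; iconvlist() shows what is available), the vector from the question could be converted like this. The variable name utf8chars is only illustrative.

```r
# Rebuild the vector from the question: one single-byte string per value 128..255.
extendedascii <- rawToChar(as.raw(seq(128, 255, by = 1)), multiple = TRUE)

# Declare the real source encoding and convert to UTF-8.
# Assumption: "WINDOWS-1252" is a name this platform's iconv recognizes.
utf8chars <- iconv(extendedascii, from = "WINDOWS-1252", to = "UTF-8")

# Byte value b sits at index b - 127, so 0x80 is element 1 and 0xE9 is element 106.
print(utf8chars[c(1, 106)])
print(Encoding(utf8chars[c(1, 106)]))
```

A few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) have no character assigned in Windows-1252, so depending on the iconv implementation they may come back as NA rather than as a character.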

If you have a lot of files in this encoding, you may want to keep using iconv on Linux; otherwise, it makes sense to switch to UTF-8 once and for all.
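If you go the one-time route, a rough sketch of converting a single file could look like the following. The paths src and dst are hypothetical, a temporary file stands in for a real CP-1252 file, and the same assumption about the "WINDOWS-1252" name applies.

```r
# Hypothetical input and output paths; the temp file stands in for a real CP-1252 file.
src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), src)  # "café" plus newline in CP-1252

# Read the file as raw bytes so the current locale plays no part.
bytes <- readBin(src, what = "raw", n = file.size(src))

# Convert the whole content from CP-1252 to UTF-8 in one step.
text_utf8 <- iconv(rawToChar(bytes), from = "WINDOWS-1252", to = "UTF-8")

# Write the UTF-8 bytes back out unchanged.
writeBin(charToRaw(text_utf8), dst)
```

Going through raw bytes keeps the result independent of the session locale. It reads the whole file into memory, so it suits small to medium files.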

Answered 2013-01-24T14:25:53.493