When I use the function sort(x), where x is a character vector, the letter "y" jumps into the middle, right after the letter "i":

> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
[21] "u" "v" "w" "x" "y" "z"

> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[21] "t" "u" "v" "w" "x" "z"

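The reason is probably that I am located in Lithuania and this is the "Lithuanian" ordering of letters, but I need the normal ordering. How can I change the sorting method back to normal from within R code?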

I am using R 2.15.2 on Win7.

2 Answers

You need to change the locale that R is running in. Do this either for your entire Windows installation (which seems sub-optimal) or within the R session via:

Sys.setlocale("LC_COLLATE", "C")

You could use any other valid locale string in place of "C" there, but that should get you back to the sort order for letters that you want.

Read ?locales for more.
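One thing to be aware of before settling on "C": it collates by character code, so every uppercase ASCII letter sorts before every lowercase one. A minimal sketch, assuming a mixed-case vector (the variable names `old` and `mixed` are just for illustration):

```r
old <- Sys.getlocale("LC_COLLATE")             # remember the current setting
invisible(Sys.setlocale("LC_COLLATE", "C"))    # switch to byte-wise collation

mixed <- sort(c("b", "A", "a", "B"))
mixed
# [1] "A" "B" "a" "b"

invisible(Sys.setlocale("LC_COLLATE", old))    # put the original collation back
```

If that ordering is not what you want for mixed-case data, another valid locale string may be a better fit than "C".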

I suppose it is worth mentioning the sister function Sys.getlocale(), which queries the current setting of a locale parameter. Hence you could do

(locCol <- Sys.getlocale("LC_COLLATE"))
Sys.setlocale("LC_COLLATE", "lt_LT")
sort(letters)
Sys.setlocale("LC_COLLATE", locCol)
sort(letters)
Sys.getlocale("LC_COLLATE")

## giving:
> (locCol <- Sys.getlocale("LC_COLLATE"))
[1] "en_GB.UTF-8"
> Sys.setlocale("LC_COLLATE", "lt_LT")
[1] "lt_LT"
> sort(letters)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n"
[16] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "z"
> Sys.setlocale("LC_COLLATE", locCol)
[1] "en_GB.UTF-8"
> sort(letters)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
[16] "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
> Sys.getlocale("LC_COLLATE")
[1] "en_GB.UTF-8"

Of course, @Hadley's answer shows a much more succinct way to do this with with_collate(), once you have devtools installed.

answered 2013-01-22T12:21:45.563

If you want to do this temporarily, devtools provides the with_collate function:

library(devtools)
with_collate("C", sort(letters))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
# [20] "t" "u" "v" "w" "x" "y" "z"
with_collate("lt_LT", sort(letters))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r"
# [20] "s" "t" "u" "v" "w" "x" "z"
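If you would rather not load a package for this, the same temporary switch can be sketched in base R. The helper name `sort_in_collation` below is hypothetical (it is not part of base R or devtools), and this is only a rough approximation of the idea, not how with_collate is actually implemented. Note also that with_collate() is available from the withr package, which may be the more convenient place to get it.

```r
# Hypothetical helper: sort `x` under a given LC_COLLATE setting, then
# restore whatever collation was active before the call.
sort_in_collation <- function(x, collate = "C") {
  old <- Sys.getlocale("LC_COLLATE")
  # on.exit() runs even if sort() fails, so the session is not left altered
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", collate)
  sort(x)
}

before <- Sys.getlocale("LC_COLLATE")
sorted <- sort_in_collation(letters, "C")
identical(sorted, letters)   # "C" collation keeps a-z in plain alphabetical order
```

Because the restore is registered with on.exit(), the collation is reset when the function returns, so later calls to sort() in the session behave as before.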
answered 2013-01-22T13:26:36.993