So far, I have found that generating the permutations with iterpc is the fastest approach. An example usage might be:
library(iterpc)
set.seed(143)
dat <- sample(LETTERS[1:4], 10, replace = TRUE)
np_multiset(table(dat), length(dat))
# [1] 18900
I <- iterpc(table(dat), order=TRUE)
out <- getall(I)
getnext(I)
# [1] A A A A B B C C D D
# Levels: A B C D
getcurrent(I)
# [1] A A A A B B C C D D
# Levels: A B C D
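The resulting matrix would be 18900 x 10, which is large but can still be stored in a single matrix. With the help of getnext(I, 1000), I can get the permutations in chunks of 1000 and work on each chunk. However, all of these permutations come out ordered by label. Is there a way to sample 1000 permutations from the set of 18900 in random order rather than sequentially?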
Expected output (but without generating all of the permutations in out):
Isam <- sample(18900, 10)
# [1] 15746 18026 17881 18687 7513 1975 5575 2845 1275 10207
out[Isam,]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] "B" "A" "A" "A" "D" "C" "C" "B" "A" "D"
# [2,] "B" "D" "A" "A" "C" "D" "A" "C" "B" "A"
# [3,] "B" "A" "A" "B" "C" "A" "A" "D" "C" "D"
# [4,] "A" "C" "A" "C" "D" "B" "A" "B" "A" "D"
# [5,] "C" "D" "A" "A" "A" "C" "B" "B" "D" "A"
# [6,] "A" "B" "A" "D" "A" "D" "A" "B" "C" "C"
# [7,] "B" "A" "A" "D" "B" "C" "C" "A" "A" "D"
# [8,] "A" "A" "D" "C" "B" "D" "A" "A" "C" "B"
# [9,] "D" "C" "A" "C" "D" "B" "A" "B" "A" "A"
# [10,] "C" "D" "D" "A" "A" "A" "C" "B" "B" "A"
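
One possible direction, offered as a minimal sketch rather than an iterpc feature: shuffling the multiset itself yields a uniformly random multiset permutation, because every distinct arrangement corresponds to the same number of orderings of the underlying elements. Rejecting repeats then gives a random sample without replacement, and the full 18900-row matrix is never built. The helper name `sample_multiset_perms` is hypothetical, only base R is used, and `dat` is rebuilt here from the counts implied by the first permutation shown above (4 A, 2 B, 2 C, 2 D) instead of relying on the seed.

```r
# Assumption: the multiset has counts A = 4, B = 2, C = 2, D = 2,
# as implied by the first permutation printed in the question.
dat <- rep(LETTERS[1:4], times = c(4, 2, 2, 2))

# Hypothetical helper (not part of iterpc): draw n distinct multiset
# permutations of x uniformly at random by shuffling and rejecting repeats.
sample_multiset_perms <- function(x, n) {
  # Number of distinct permutations is the multinomial coefficient.
  total <- round(exp(lfactorial(length(x)) - sum(lfactorial(table(x)))))
  stopifnot(n <= total)

  keys <- character(0)  # string keys of the permutations accepted so far
  rows <- list()
  while (length(rows) < n) {
    p <- x[sample.int(length(x))]      # uniform shuffle of the multiset
    key <- paste(p, collapse = "|")
    if (!(key %in% keys)) {            # keep only permutations not seen yet
      keys <- c(keys, key)
      rows[[length(rows) + 1L]] <- p
    }
  }
  do.call(rbind, rows)                 # n x length(x) character matrix
}

set.seed(1)
res <- sample_multiset_perms(dat, 1000)
dim(res)
```

The rejection loop stays cheap while n is a small fraction of the total, as with 1000 out of 18900; if n approaches the total, generating everything and indexing with sample() is likely the better choice. The `%in%` lookup is linear in the number of accepted rows, so for much larger samples a hashed environment would be a reasonable replacement.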