让我们定义一个data.frame
3 列 10 行的 df。第三列是类,前两个是一些变量。
var1 <- rnorm(10)
var2 <- rnorm(10,2)
class<- as.factor(c(1,2,3,1,2,1,2,1,3,3))
df <- data.frame(var1=var1,var2=var2,class=class)
如何df
在两个子集中随机分组,以便 sub.df1 和 sub.df2 每个类至少有一个实例?
这有效:
set.seed(123)
partition <- function(x, n = 2) sample(c(1:n, sample(1:n, length(x) - n, TRUE)))
split(df, as.integer(ave(df$class, df$class, FUN = partition)))
# $`1`
# var1 var2 class
# 4 1.6674610 3.3886789 1
# 7 -0.2245588 0.8284845 2
# 8 -1.1481185 4.1586492 1
# 10 -0.4712463 3.1846324 3
#
# $`2`
# var1 var2 class
# 1 0.9884264 3.3487054 1
# 2 -0.1549679 -0.5815586 2
# 3 1.4484692 0.3521933 3
# 5 0.5454097 2.0405363 2
# 6 1.0971626 0.6410492 1
# 9 -1.3042283 3.3235418 3
一种非常不优雅的方式可能是:
#changed the data to better check
var1 <- rnorm(21)
var2 <- rnorm(21,2)
class<- as.factor(c(1,2,3,1,2,1,2,1,3,3,4,1,2,4,4,5,1,2,1,2,5))
DF <- data.frame(var1=var1,var2=var2,class=class)
#order DF by class
DF <- DF[order(DF$class),]
#add a column (like a second class)
#so that every level of the "second class" contains all levels from original class
DF$col4 <- unlist((sapply(table(DF$class),
function(x) { letters[1:x] })), use.names = F)
#order by the "second class"
DF <- DF[order(DF$col4),]
#one df with all levels of original class
DF1 <- split(DF, DF$col4)$a
#another df with all levels of orignal class
DF2 <- split(DF, DF$col4)$b
#remaining levels of second class contain levels
#of original class already present in DF1, DF2
#so, just add them to either DF1 or DF2
DF3 <- do.call(rbind, split(DF, DF$col4)[letters[3:max(table(DF$class))]])
DF1 <- rbind(DF1, DF3[1:(nrow(DF3)/2),])
DF2 <- rbind(DF2, DF3[(nrow(DF3)/2):nrow(DF3),])
#remove the second class
DF1 <- DF1[,1:3]
DF2 <- DF2[,1:3]
#> DF1
# var1 var2 class
#1 -0.32872359 2.0055574 1
#2 -0.93543130 1.9035439 2
#3 -0.04343290 2.9213939 3
#11 1.15724846 3.0646201 4
#16 0.44848508 0.2414504 5
#c.6 0.14438547 3.5265833 1
#c.7 0.31614781 3.7160113 2
#c.10 -0.45882460 -0.2937924 3
#c.15 0.07145533 1.8942732 4
#d.8 0.61422896 1.8690204 1
#> DF2
# var1 var2 class
#4 0.09097849 1.4701793 1
#5 1.19147818 3.4190744 2
#9 -0.37807035 3.4565437 3
#14 -1.35257981 3.6023453 4
#21 -1.07466815 0.2104640 5
#d.8 0.61422896 1.8690204 1
#d.13 -0.26357372 1.7867625 2
#e.12 0.05694470 3.1141126 1
#e.18 -0.51124304 0.8070597 2
#f.17 -2.94353989 1.6532037 1
#f.20 0.21011089 2.4029225 2