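I don't understand your fake data, so I'll make my own.
I'm assuming you've constructed your own unique groups. I just used the numbers 1:2000,
but you can run this code on any group type.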
# let's make some fake data with 155k points distributed in 2k groups
x <-
    data.frame(
        groupname = sample( x = 1:2000 , size = 155000 , replace = TRUE ) ,
        anothercol = 1 ,
        andanothercol = "hi"
    )
# look at your data frame `x`
head( x )
# so long as you've constructed a `groupname` variable in your data, this is easy.
# calculate each group's share of the total rows and store it in a probability table
groupwise.prob <- table( x$groupname ) / nrow( x )
# convert this to a data frame
prob.frame <- data.frame( groupwise.prob )
head( prob.frame )
# rename the `Var1` column to match your group name variable on `x`
names( prob.frame )[ 1 ] <- 'groupname'
# rename the `Freq` column to describe what it holds: each group's share of `x`
names( prob.frame )[ 2 ] <- 'prob'
# merge these individual probabilities back onto your data frame
x <- merge( x , prob.frame , by = 'groupname' , all.x = TRUE )
# now just use the sample function's prob= parameter off of that
# and scale down the size to what you want
recs.to.samp <-
    sample(
        1:nrow( x ) ,
        size = 6000 ,
        replace = FALSE ,
        prob = x$prob
    )
# and now here's your new sample, with proportions intact
y <- x[ recs.to.samp , ]
head( y )
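
If you want to check how closely the sample tracks the original group shares, something like the sketch below might help. Because each row is weighted by its group's share, groups of very different sizes may drift from their original proportions, so it seems worth verifying on your own data. The helper `compare.group.shares` and the small three-group data set are hypothetical, made up for illustration, and not part of the code above.

```r
# hypothetical helper (not from the original answer): compare each group's
# share of rows in the full data with its share of rows in the sample
compare.group.shares <- function( full , samp , group = 'groupname' ) {
    # take the levels from the full data so groups missing from the sample show up as 0
    lv <- sort( unique( full[[ group ]] ) )
    full.share <- table( factor( full[[ group ]] , levels = lv ) ) / nrow( full )
    samp.share <- table( factor( samp[[ group ]] , levels = lv ) ) / nrow( samp )
    data.frame(
        group = lv ,
        full.share = as.numeric( full.share ) ,
        samp.share = as.numeric( samp.share ) ,
        diff = as.numeric( samp.share - full.share )
    )
}

# small made-up example: three groups of very different sizes
full <- data.frame( groupname = rep( c( 'a' , 'b' , 'c' ) , times = c( 600 , 300 , 100 ) ) )
# attach each row's group share, the same idea as the `prob` column above
full$prob <- as.numeric( table( full$groupname )[ full$groupname ] ) / nrow( full )
# draw a weighted sample without replacement, as in the answer
samp <- full[ sample( 1:nrow( full ) , size = 200 , replace = FALSE , prob = full$prob ) , ]
# one row per group; `diff` is sample share minus full share
shares <- compare.group.shares( full , samp )
shares
```

On the data above you would call it as `compare.group.shares( x , y )` and look at the `diff` column; values near zero mean the sample kept that group's share.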