r - strata() from "sampling" returns error: arguments imply differing number of rows

Question

I have a data frame that looks like this:

'data.frame':   1090 obs. of  8 variables:
 $ id            : chr  "INC000000209241" "INC000000218488" "INC000000218982" "INC000000225646" ...
 $ service.type  : chr  "Incident" "Incident" "Incident" "Incident" ...
 $ priority      : chr  "Critical" "Critical" "Critical" "Critical" ...

I sorted the data as follows:

data <- data[order(data$priority),]

I have tried converting priority to a factor, among other things, but whatever I try, when I run the following command:

s = strata(data,c("priority"),size=c(0,0,1,5))

I always get the following error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 0, 1

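I tried debugging the function to see whether I could tell why this error is raised, but I could not make sense of the code. The error is raised at this stage of executing strata():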

debug: r = cbind(r, i)

Any help would be much appreciated!

score 5 · Accepted Answer

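The problem is that you are trying to set the sample size of some groups to zero, which strata() does not handle. Instead, subset the original data to drop those groups before sampling.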

Here, we reproduce your problem.

library(sampling)
data(swissmunicipalities)
length(table(swissmunicipalities$REG)) # We have seven strata
# [1] 7

# Let's take two from each group
strata(swissmunicipalities, 
       stratanames = c("REG"), 
       size = rep(2, 7), 
       method="srswor")
#      REG ID_unit        Prob Stratum
# 93     4      93 0.011695906       1
# 145    4     145 0.011695906       1
# 2574   1    2574 0.003395586       2
# 2631   1    2631 0.003395586       2
# 826    3     826 0.006230530       3
# 1614   3    1614 0.006230530       3
# 583    2     583 0.002190581       4
# 1017   2    1017 0.002190581       4
# 1297   5    1297 0.004246285       5
# 2535   5    2535 0.004246285       5
# 342    6     342 0.010752688       6
# 347    6     347 0.010752688       6
# 651    7     651 0.008163265       7
# 2471   7    2471 0.008163265       7

# Let's try to drop the first two groups. Oops...
strata(swissmunicipalities, 
       stratanames = c("REG"), 
       size = c(0, 0, 2, 2, 2, 2, 2), 
       method="srswor")
# Error in data.frame(..., check.names = FALSE) : 
#   arguments imply differing number of rows: 0, 1

Let's subset and try again.

swiss2 <- swissmunicipalities[!swissmunicipalities$REG %in% c(1, 2), ]
table(swiss2$REG)
strata(swiss2, 
       stratanames = c("REG"), 
       size = c(2, 2, 2, 2, 2), 
       method="srswor")
#      REG ID_unit        Prob Stratum
# 58     4      58 0.011695906       1
# 115    4     115 0.011695906       1
# 432    3     432 0.006230530       2
# 986    3     986 0.006230530       2
# 1007   5    1007 0.004246285       3
# 1150   5    1150 0.004246285       3
# 190    6     190 0.010752688       4
# 497    6     497 0.010752688       4
# 1049   7    1049 0.008163265       5
# 1327   7    1327 0.008163265       5
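
Applied to data shaped like the one in the question, the same idea might look like the sketch below. The data frame here is a made-up stand-in: the ids, the priority levels other than "Critical", and the names wanted, data_sub and sizes are all assumptions for illustration. It also relies on strata() reading size in the order in which the strata appear in the data, so the size vector is rebuilt from that order instead of being typed by hand.

```r
# Hypothetical stand-in for the asker's data (only the columns needed here).
data <- data.frame(
  id = sprintf("INC%05d", 1:40),
  priority = rep(c("Critical", "High", "Low", "Medium"),
                 times = c(5, 10, 10, 15)),
  stringsAsFactors = FALSE
)
data <- data[order(data$priority), ]

# Desired sample size per stratum, named so the mapping is explicit.
# The level names other than "Critical" are assumed.
wanted <- c(Critical = 0, High = 0, Low = 1, Medium = 5)

# Drop every stratum whose requested size is zero before sampling.
keep <- names(wanted)[wanted > 0]
data_sub <- data[data$priority %in% keep, ]

# Rebuild the size vector in the order the strata appear in the subset.
sizes <- unname(wanted[unique(data_sub$priority)])

# Only call the package if it is installed.
if (requireNamespace("sampling", quietly = TRUE)) {
  s <- sampling::strata(data_sub,
                        stratanames = "priority",
                        size = sizes,
                        method = "srswor")
  # getdata() pulls the sampled rows out of the subset.
  sampled <- sampling::getdata(data_sub, s)
  print(table(sampled$priority))
}
```

Naming the sizes by stratum avoids having to remember which position in size belongs to which group after the zero-size groups have been removed.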
