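I am new to R and am trying to create multiple subsamples of a data frame. My data are assigned to 4 strata (STRATUM = 1, 2, 3, 4), and I want to randomly retain a specified number of rows in each stratum. To do this, I import the data, sort by the stratum value, and assign a random number to each row. I want to keep the original random number assignments because I will need them again in later analyses, so I save a .csv with these values. Next, I subset the data by stratum and specify the number of records to keep in each stratum. Finally, I rejoin the data and save it as a new .csv. The code works, but I would like to repeat this process 100 times. For each iteration, I want to save the .csv with the assigned random numbers as well as the final .csv of the randomly selected plots. I am not sure how to make this code repeat 100 times, or how to assign a unique file name to each iteration. Any help would be greatly appreciated.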

DataFiles <- "//Documents/flownData_JR.csv"
PlotsFlown <- read.table (file = DataFiles, header = TRUE, sep = ",")
#Sort the data by the stratification
FlownStratSort <- PlotsFlown[order(PlotsFlown$STRATUM),]
#Create a new column with a random number (no duplicates)
FlownStratSort$RAND_NUM <- sample(137, size = nrow(FlownStratSort), replace = FALSE)
#Sort by the stratum, then random number
FLOWNRAND <- FlownStratSort[order(FlownStratSort$STRATUM,FlownStratSort$RAND_NUM),]
#Save a csv file with the random numbers
write.table(FLOWNRAND, file = "//Documents/RANDNUM1_JR.csv", sep = ",", row.names = FALSE, col.names = TRUE)
#Subset the data by stratum
FLOWNRAND1 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='1'),]
FLOWNRAND2 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='2'),]
FLOWNRAND3 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='3'),]
FLOWNRAND4 <- FLOWNRAND[which(FLOWNRAND$STRATUM=='4'),]
#Remove data from each stratum, specifying the number of records we want to retain
FLOWNRAND1 <- FLOWNRAND1[1:34, ]
FLOWNRAND2 <- FLOWNRAND2[1:21, ]
FLOWNRAND3 <- FLOWNRAND3[1:7, ]
FLOWNRAND4 <- FLOWNRAND4[1:7, ]
#Rejoin the data
FLOWNRAND_uneven <- rbind(FLOWNRAND1, FLOWNRAND2, FLOWNRAND3, FLOWNRAND4)
#Save the table with plots removed from each stratum flown in 2017
write.table(FLOWNRAND_uneven, file = "//Documents/Flown_RAND_uneven_JR.csv", sep = ",", row.names = FALSE, col.names = TRUE)

1 Answer

Here is a data.table solution, if you only need to know which rows end up in each group.

library(data.table)
df <- data.table(dat = runif(100), 
                 stratum = sample(1:4, 100, replace = TRUE))

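# Draws n rows at random from each stratum and saves them as a CSV
get_strata <- function(df, n, i){
  # Subset to n randomly chosen rows within each stratum
  # (replace stratum with your variable name; n must not exceed
  # the size of the smallest stratum)
  f <- df[df[, .I[sample(.N, n)], by = stratum]$V1]

  # Save as CSV with the iteration number in the file name; replace path
  # (write.csv always writes column names, so col.names is not needed)
  write.csv(f, file = paste0("path/df_", i, ".csv"),
            row.names = FALSE)
}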

for (i in 1:100){
  # replace 10 with number needed
  get_strata(df, 10, i)
}
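
The function above draws the same number of rows from every stratum and does not save the random numbers themselves. If the strata need different sizes (34, 21, 7 and 7 in the question) and both files should be kept for every run, a base R loop along the following lines may work. This is only a minimal sketch under assumptions: the toy data frame `plots`, the `keep` vector, the output folder `out_dir`, and the file name patterns are made up for illustration and are not from the original code.

```r
# Toy stand-in for the real data (assumption): 137 plots in 4 strata
set.seed(42)  # optional: makes the 100 draws reproducible
plots <- data.frame(PLOT_ID = 1:137,
                    STRATUM = rep(1:4, times = c(60, 40, 20, 17)))

# Rows to retain per stratum, named by stratum value (counts from the question)
keep <- c("1" = 34, "2" = 21, "3" = 7, "4" = 7)

out_dir <- tempdir()  # assumption: replace with your own folder

for (i in 1:100) {
  # Assign a unique random number to every row, then sort by stratum and number
  plots$RAND_NUM <- sample(nrow(plots))
  ranked <- plots[order(plots$STRATUM, plots$RAND_NUM), ]

  # Save the full table with this iteration's random numbers
  write.csv(ranked,
            file = file.path(out_dir, sprintf("RANDNUM_%03d.csv", i)),
            row.names = FALSE)

  # Within each stratum, keep the rows with the lowest random numbers
  parts <- lapply(split(ranked, ranked$STRATUM), function(d) {
    head(d, keep[[as.character(d$STRATUM[1])]])
  })
  subsample <- do.call(rbind, parts)

  # Save the retained plots for this iteration
  write.csv(subsample,
            file = file.path(out_dir, sprintf("Flown_RAND_uneven_%03d.csv", i)),
            row.names = FALSE)
}
```

Here `sprintf("%03d", i)` pads the iteration number with zeros, so every run gets a unique file name and the files sort in order. Using `sample(nrow(plots))` instead of a hard-coded row count keeps the random numbers unique even if the number of rows changes.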
Answered 2017-06-01T21:40:19.327