r - Random subsampling in R

Question

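I am new to R, so my question may be very simple. I have 40 sites with zooplankton abundance data.

My data look like this (columns are species abundances, rows are sites):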

0   0   0   0   0   2   0   0   0   
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   85  0
0   0   0   0   0   45  5   57  0
0   0   0   0   0   13  0   3   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   7   0
0   3   0   0   12  8   0   57  0
0   0   0   0   0   0   0   1   0
0   0   0   0   0   59  0   0   0
0   0   0   0   4   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   105 0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0
0   0   0   0   1   0   0   100 0
0   35  0   55  0   0   0   0   0
1   4   0   0   0   0   0   0   0
0   0   0   0   0   34  21  0   0
0   0   0   0   0   9   17  0   0
0   54  0   0   0   27  5   0   0
0   1   0   0   0   1   0   0   0
0   17  0   0   0   54  3   0   0

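What I want is to draw a random subsample (e.g. 50 individuals) from each site without replacement, several times (bootstrap), so that I can afterwards calculate diversity indices on the new standardized abundances.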

score 1 · Accepted Answer

Try something like this:

mysample <- mydata[sample(1:nrow(mydata), 50, replace=FALSE),]

answered 2013-08-22T15:45:05.397

score 1 · Accepted Answer

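What the OP is probably looking for here is a way to bootstrap the data for a Hill or Simpson diversity index, which implies some assumptions about the data being sampled:

Each row is a site, each column is a species, and each value is a count.
Individuals, not counts, are the units being sampled for the bootstrap.

To do this, bootstrap programs usually model the counts as a string of individuals. For example, if we had a record like this: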

a  b  c
2  3  4

The record would be modeled as:

aabbbcccc

Then a sample is typically drawn from the string with replacement, to create a larger set based on the model set.
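The expansion described above can be sketched in base R with `rep()`. This is only a minimal illustration; the variable names (`counts`, `individuals`, `boot`) are made up for the example, and the seed is there only to make it reproducible.

```r
# Illustrative record: counts for species a, b, c.
counts <- c(a = 2, b = 3, c = 4)

# Expand the counts into one entry per individual: a a b b b c c c c
individuals <- rep(names(counts), times = counts)

# Draw a bootstrap sample of 50 individuals with replacement.
set.seed(1)
boot <- sample(individuals, 50, replace = TRUE)

# Tabulate back to per-species counts, keeping species that drew zero.
boot_counts <- table(factor(boot, levels = names(counts)))
```

Sampling from the expanded vector with replacement should be equivalent to sampling species labels with probabilities proportional to the counts.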

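Bootstrapping a site: In R there is a way to do this that is actually very simple, the `sample` function. If you sample from the column numbers, you can supply the count data as the probabilities.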

# Test data.
data <- data.frame(a=2, b=3, c=4)

# Sampling from first row of data.
row <- 1
N_samples <- 50

# unlist() turns the one-row data frame into a plain numeric vector of weights.
samples <- sample(1:ncol(data), N_samples, replace=TRUE, prob=unlist(data[row,]))

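Converting the sample back to the format of the original table: We now have an array of samples, where each item is the column number that the sampled individual belongs to. There are many ways to convert this back to the original table format, but here is a fairly simple one using a counting loop: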

# Count the number of each entry and store in a list.
site_sample <- list()  # must exist before it is filled in the loop
for (i in 1:ncol(data)){
    site_sample[[i]] <- sum(samples==i)
}

# Unlist the data to get an array that represents the bootstrap row.
site_sample <- unlist(site_sample)
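As an alternative to the counting loop, base R's `tabulate()` counts how often each integer index occurs, and `nbins` keeps species that were never drawn as zeros. A minimal sketch with a hypothetical vector of sampled column indices:

```r
# Hypothetical sampled column indices for a table with 3 species.
samples <- c(1L, 3L, 3L, 2L, 3L, 1L, 2L, 2L, 3L)
n_species <- 3

# One count per species, in column order.
site_sample <- tabulate(samples, nbins = n_species)
```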

score 1 · Accepted Answer

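Just stumbled upon this thread. The vegan package has a function called `rrarefy` that does precisely what you are looking for (and in the same ecological context, too).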
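A minimal usage sketch, assuming the vegan package is installed; the community matrix `comm` is made up for illustration. `rrarefy()` takes a site-by-species count matrix and a subsample size, and draws that many individuals per site without replacement.

```r
# Assumes vegan is available: install.packages("vegan")
if (requireNamespace("vegan", quietly = TRUE)) {
  # Hypothetical community matrix: rows are sites, columns are species.
  comm <- matrix(c(10, 5, 0,
                    3, 8, 9),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("site1", "site2"), c("a", "b", "c")))

  # Subsample 10 individuals from each site.
  rarefied <- vegan::rrarefy(comm, sample = 10)
}
```

Calling it repeatedly (for example inside `replicate()`) gives the repeated subsamples the question asks for.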

score 0 · Accepted Answer

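This should work. It is a bit more complicated than it first appears, because each cell contains the number of individuals of a species. The solution uses the apply function to send each row of the data to the user-defined sample_species function. Inside it, we draw n random numbers between 1 and the site's total count and sort them. If there are 15 of species 1, 20 of species 2, and 20 of species 3, then random numbers between 1 and 15 denote species 1, 16 to 35 denote species 2, and 36 to 55 denote species 3.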

## Initially takes in a row of the data and the number of samples to take
sample_species <- function(counts,n) {
  num_species <- length(counts)
  total_count <- sum(counts)
  samples <- sample(1:total_count,n,replace=FALSE)
  samples <- samples[order(samples)]
  result <- array(0,num_species)
  total <- 0
  for (i in 1:num_species) {
    result[i] <- length(which(samples > total & samples <= total+counts[i]))
    total <- total+counts[i]
  }
  return(result)
}

A <- matrix(sample(0:100,10*40,replace=TRUE), ncol=10) ## mock data
B <- t(apply(A,1,sample_species,50)) ## results
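To connect this to the diversity indices mentioned in the question, here is a hedged base-R sketch that repeats the subsampling for one site and computes the Shannon index each time. The helper names `subsample_site` and `shannon` are hypothetical, the choice of the Shannon index is only an example, and returning `NA` for sites with fewer than n individuals is an assumption about how such sites should be handled.

```r
# Subsample n individuals from one site without replacement.
subsample_site <- function(counts, n) {
  # One entry per individual, labelled by species (column) index.
  individuals <- rep(seq_along(counts), times = counts)
  # Assumption: sites with too few individuals are returned as NA.
  if (length(individuals) < n) return(rep(NA_integer_, length(counts)))
  # Sample positions rather than values to avoid sample()'s
  # special handling of a length-one numeric argument.
  picked <- individuals[sample.int(length(individuals), n)]
  tabulate(picked, nbins = length(counts))
}

# Shannon diversity index from a vector of counts.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

set.seed(42)  # only for reproducibility of the sketch
site <- c(0, 3, 0, 0, 12, 8, 0, 57, 0)  # one row of the question's data (80 individuals)

# 100 subsamples of 50 individuals, one Shannon value per subsample.
h <- replicate(100, shannon(subsample_site(site, 50)))
mean_h <- mean(h)
```

The same function can be applied to every row with `apply(data, 1, subsample_site, n = 50)`; note that many sites in the example data hold fewer than 50 individuals and would come back as `NA` under the assumption above.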
