r - Store variables from a large number of summary tables as columns in a new document

Question

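I have a data frame (X) with 6 columns, named mean.x, s.x, n.x, mean.y, s.y and n.y. They represent the mean, standard deviation (s) and sample size (n) of populations x and y. I am using an R package (BSDA) that performs a t-test from these summary statistics. The problem is that I get one summary table per row, and I have 640,000 rows.

What I would like to do is create new columns holding the p-values and other parameters from all 640,000 summary tables. Is that possible?

The first 5 rows have identical values: mean.x (0.444357), s.x (0.02575427), n.x (633744), mean.y (0.4308), s.y (0.000628747), n.y (390)

This is the script that displays the summary table: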

library(BSDA)

tsum.test(mean.x = X$mean.x,
          s.x = X$s.x,
          n.x = X$n.x,
          mean.y = X$mean.y,
          s.y = X$s.y,
          n.y = X$n.y, 
          alternative = "less",
          mu = 0, # null hypothesis that there is no diff between means
          var.equal = FALSE,
          conf.level = 0.95)

Thank you very much!

score 1 · Accepted Answer

Yes, it is possible. One way is to use apply for this.

Consider a very simple example data.frame (all rows are identical in this example):

x  <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8) 
y  <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5) 
X <- data.frame(mean_x = mean(x), s.x = sd(x), n.x = 11, mean_y = mean(y), s.y = sd(y), 
                n.y = 8) 
X <- rbind(X, X, X)

#> X
#    mean_x       s.x n.x mean_y       s.y n.y
#1 7.018182 0.4643666  11 5.2625 0.7069805   8
#2 7.018182 0.4643666  11 5.2625 0.7069805   8
#3 7.018182 0.4643666  11 5.2625 0.7069805   8

Then use apply to run tsum.test on each row and extract the parameters you need. In this example I extract the p-values and the degrees of freedom:

new_cols <-
apply(X, 1, function(x) {

  # apply feeds one row at a time to this function as the vector x,
  # so make sure you are using the correct elements
  stats <- 
    #x[1] corresponds to the first column, x[2] to the second and so on
    tsum.test(mean.x = x[1],
          s.x = x[2],
          n.x = x[3],
          mean.y = x[4],
          s.y = x[5],
          n.y = x[6], 
          alternative = "less",
          mu = 0, # null hypothesis that there is no diff between means
          var.equal = FALSE,
          conf.level = 0.95)

  #output p.values and degrees of freedom on this occasion
  c(pvalue = stats$p.value, df = stats$parameters)

})

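The above returns the p-values and degrees of freedom. apply returns one column per row of X, so transpose the result with t() and bind it to your data.frame like this: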

   > cbind(X, t(new_cols))
    mean_x       s.x n.x mean_y       s.y n.y pvalue.mean_x    df.df
1 7.018182 0.4643666  11 5.2625 0.7069805   8     0.9999669 11.30292
2 7.018182 0.4643666  11 5.2625 0.7069805   8     0.9999669 11.30292
3 7.018182 0.4643666  11 5.2625 0.7069805   8     0.9999669 11.30292
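
With 640,000 rows, calling tsum.test once per row through apply may take a while. As a possible alternative, here is a minimal base R sketch that computes the Welch statistic, degrees of freedom and one-sided ("less") p-value for all rows at once. It assumes you only need the var.equal = FALSE case with alternative = "less"; welch_from_summary is a hypothetical helper name, not a BSDA function, and it uses the standard Welch-Satterthwaite formulas rather than calling tsum.test.

```r
# Hypothetical helper (not part of BSDA): vectorized Welch t-test
# from summary statistics, one-sided alternative "less".
welch_from_summary <- function(mean_x, s_x, n_x, mean_y, s_y, n_y, mu = 0) {
  vx <- s_x^2 / n_x                      # squared standard error of mean x
  vy <- s_y^2 / n_y                      # squared standard error of mean y
  t_stat <- (mean_x - mean_y - mu) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (n_x - 1) + vy^2 / (n_y - 1))
  # alternative = "less": lower-tail probability
  data.frame(t_stat = t_stat, df = df, pvalue = pt(t_stat, df))
}

# Same toy data as in the example above
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
X <- data.frame(mean_x = mean(x), s.x = sd(x), n.x = 11,
                mean_y = mean(y), s.y = sd(y), n.y = 8)
X <- rbind(X, X, X)

# Every argument is a whole column, so all rows are handled in one call
res <- welch_from_summary(X$mean_x, X$s.x, X$n.x, X$mean_y, X$s.y, X$n.y)
X_out <- cbind(X, res)
print(X_out)
```

On the toy data this should agree with the tsum.test results shown above. It is worth comparing a few rows of your real data against tsum.test before relying on it.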
