
So, I have a data frame with a large number of variables, and I want to cross-check each variable against every other variable with a t-test.

A sample of my data, called trust_news:

row mean polity2 web rsf civil_liberties freedom_of_expression vdem_gov_censorship_effort vdem_self_censorship_effort vdem_freedom_of_expression ciri_freedom_of_speech_and_press media_integrity vdem_critical_press vdem_media_perspective vdem_media_bias vdem_media_corruption vdem_media_freedom
1 2.68 8 87.2661 25.69 0.785599008 0.758906967 0.731895466 0.742219428 1 1 0.81449235 0.889046047 0.782079459 0.693825991 0.733503755 1
2 2.8 8 94.8967 22.23 0.810742702 0.832891911 0.8447733 0.831499528 1 1 0.88417386 0.868772592 0.881994928 0.835622928 0.828566864 1
3 3.22 10 89.7391 14.6 0.821268417 0.83327835 0.883343829 0.805721471 1 1 0.829951651 0.917491749 0.725950972 0.709774199 0.874261064 1
5 2.96 10 74.3872 24.98 0.813949794 0.781986225 0.844615869 0.729330399 0.666666667 0.5 0.878769429 0.872387239 0.919019442 0.841939049 0.810193322 0.5

Then I run this code on it:

trust_news_combos <- combn(trust_news, 1, t.test, simplify = TRUE)

First, is the code correct? I wasn't sure what to put for m in combn(). Anyway, that line gives me this:

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
1 c(t = 85.1670166474227) c(t = 15.9614095646055) c(t = 29.2365516170159) c(t = 11.0778062107689) c(t = 30.4673329981756) c(t = 26.8521522144486) c(t = 23.160185720972) c(t = 25.1063414199952) c(t = 17.1830959329723) c(t = 11.06502519693) c(t = 33.0841916129404) c(t = 29.3707961673045) c(t = 31.2455551028106) c(t = 39.1490231250879) c(t = 27.6089179039943) c(t = 14.0719508946058)
2 c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32)
3 2.69E-39 8.55E-17 1.18E-24 1.75E-12 3.29E-25 1.61E-23 1.46E-21 1.26E-22 1.03E-17 1.80E-12 2.55E-26 1.02E-24 1.51E-25 1.32E-28 6.88E-24 2.96E-15
4 c(3.00189912275063 3.14900996815846) c(7.56066019283154 9.77267314050179) c(73.5097801046279 84.5198259559781) c(19.628297122971 28.4729149982411) c(0.682586494865725 0.780396107679729) c(0.639468676034051 0.744449016935646) c(0.664192511270674 0.792289818305084) c(0.665160025455844 0.782621785210823) c(0.676674167771883 0.858679367682662) c(0.543941635486123 0.78939169784721) c(0.739756992152986 0.836824222392469) c(0.730937293702635 0.839876930600395) c(0.729509614919607 0.831257822777363) c(0.709894349786553 0.787820841122538) c(0.708427672557418 0.821287114048642) c(0.647915673315896 0.867235841835619)
5 c( mean of x= 3.07545454545455) c( mean of x=8.66666666666667) c( mean of x=79.014803030303) c( mean of x=24.0506060606061) c( mean of x= 0.731491301272727) c( mean of x= 0.691958846484849) c( mean of x= 0.728241164787879) c( mean of x=0.723890905333333) c( mean of x= 0.767676767727273) c( mean of x= 0.666666666666667) c( mean of x=0.788290607272727) c( mean of x= 0.785407112151515) c( mean of x=0.780383718848485) c( mean of x= 0.748857595454545) c( mean of x= 0.76485739330303) c( mean of x=0.757575757575758)
6 c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0)
7 0.036110864 0.542976272 2.702603374 2.171062176 0.024009036 0.025769214 0.031443667 0.028832991 0.044676278 0.0602499 0.023826806 0.02674109 0.024975831 0.019128385 0.027703273 0.053835873
8 two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided
9 One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test
10 x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a]

It gives me the p-values I'm looking for in row 3, but how can I tell which two columns are being compared?

Any help is appreciated and will be credited in my final code!


2 Answers


One way is to run a second combn over the column names, building a label for each pair:

nm1 <- combn(names(trust_news), 2, FUN = paste, collapse = '-', simplify = TRUE)
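Since the question asks what to put for m: m is the number of elements in each combination. The sketch below uses a hypothetical vector of three column names (an assumption standing in for names(trust_news)) to show that m = 1 yields single columns, which is why the original call ran one-sample tests, while m = 2 yields every unordered pair.

```r
# Hypothetical column names standing in for names(trust_news)
cols <- c("row", "mean", "polity2")

# m = 1: each "combination" is a single column (a 1 x 3 matrix)
combn(cols, 1)

# m = 2: every unordered pair, one pair per matrix column (a 2 x 3 matrix)
combn(cols, 2)

# With FUN = paste, each pair collapses into one label
pair_labels <- combn(cols, 2, FUN = paste, collapse = "-")
pair_labels
# "row-mean" "row-polity2" "mean-polity2"
```

The labels come out in the same order in which combn visits the pairs, which is what lets them be matched to the test results.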

Then run the tests and attach the names:

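# each call receives a two-column data frame y; pass its columns as separate vectors
trust_news_combos <- combn(trust_news, 2, function(y) t.test(y[[1]], y[[2]]), simplify = FALSE)
names(trust_news_combos) <- nm1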
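Once the list is named, every element is an htest object, so the p-values can be pulled out together with the pair they belong to. A minimal sketch on a hypothetical three-column data frame df (an assumption standing in for trust_news); the anonymous function extracts the two columns with y[[1]] and y[[2]] so that t.test runs a two-sample test on each pair rather than a one-sample test on the pooled values.

```r
# Hypothetical three-column data frame standing in for trust_news
df <- data.frame(a = c(1.2, 2.3, 3.1, 4.8),
                 b = c(2.0, 2.9, 4.1, 5.5),
                 c = c(10.1, 11.4, 9.8, 12.0))

# One "col1-col2" label per pair, in the order combn() visits the pairs
nm <- combn(names(df), 2, FUN = paste, collapse = "-")

# combn() hands each pair to FUN as a two-column data frame y;
# y[[1]] and y[[2]] extract the columns so t.test() compares them
tests <- combn(df, 2, function(y) t.test(y[[1]], y[[2]]), simplify = FALSE)
names(tests) <- nm

# Pull the p-value out of every htest object; sapply() keeps the list names
p_values <- sapply(tests, function(tt) tt$p.value)
p_values
```

Each p-value is now printed next to its "col1-col2" label, which answers the "which two columns?" part of the question directly.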

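It may also be better to get the output as a data.frame/tibble with tidy from broom (run in R 4.1.0; the native pipe |> and the \(y) lambda shorthand both require R >= 4.1.0):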

library(broom)
lst1 <- combn(trust_news, 2, \(y) t.test(y[[1]], y[[2]]) |>
                             tidy(), simplify = FALSE) |>
         setNames(nm1)

out <- Map(cbind, comparison = names(lst1), lst1) |>
       {\(x) do.call(rbind, x)}()
row.names(out) <- NULL

         

-output

head(out)
                 comparison   estimate estimate1  estimate2   statistic      p.value parameter    conf.low
1                  row-mean  -0.165000      2.75  2.9150000  -0.1914478 0.8599889461  3.112075  -2.8527609
2               row-polity2  -6.250000      2.75  9.0000000  -6.0633906 0.0014638846  5.268737  -8.8595564
3                   row-web -83.822275      2.75 86.5722750 -18.8602012 0.0002049939  3.229641 -97.4140679
4                   row-rsf -19.125000      2.75 21.8750000  -7.1441517 0.0027953086  3.671029 -26.8277783
5       row-civil_liberties   1.942110      2.75  0.8078900   2.2742727 0.1074862571  3.000494  -0.7752796
6 row-freedom_of_expression   1.948234      2.75  0.8017659   2.2809921 0.1067532047  3.002873  -0.7684766
   conf.high                  method alternative
1   2.522761 Welch Two Sample t-test   two.sided
2  -3.640444 Welch Two Sample t-test   two.sided
3 -70.230482 Welch Two Sample t-test   two.sided
4 -11.422222 Welch Two Sample t-test   two.sided
5   4.659500 Welch Two Sample t-test   two.sided
6   4.664945 Welch Two Sample t-test   two.sided
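If broom is not installed, or the R version predates 4.1.0, a base-R sketch can build a similar table. This is a swapped-in alternative, not the broom approach: the hypothetical helper pair_test() reads a handful of components straight from the htest object, and df is a hypothetical data frame standing in for trust_news.

```r
# Hypothetical data frame standing in for trust_news
df <- data.frame(a = c(1.2, 2.3, 3.1, 4.8),
                 b = c(2.0, 2.9, 4.1, 5.5),
                 c = c(10.1, 11.4, 9.8, 12.0))

# Hypothetical helper: Welch t-test on one pair of column names,
# returned as a one-row data frame (roughly mirroring tidy()'s columns)
pair_test <- function(pair, data) {
  tt <- t.test(data[[pair[1]]], data[[pair[2]]])
  data.frame(comparison = paste(pair, collapse = "-"),
             estimate1  = unname(tt$estimate[1]),
             estimate2  = unname(tt$estimate[2]),
             statistic  = unname(tt$statistic),
             parameter  = unname(tt$parameter),
             p.value    = tt$p.value,
             conf.low   = tt$conf.int[1],
             conf.high  = tt$conf.int[2],
             method     = tt$method,
             stringsAsFactors = FALSE)
}

# combn() forwards the extra argument `data` on to pair_test()
out <- do.call(rbind, combn(names(df), 2, pair_test, data = df, simplify = FALSE))
out
```

Looping over column names instead of column subsets keeps the label and the test in the same function call, so the two cannot drift out of order.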

Data

trust_news <- structure(list(row = c(1L, 2L, 3L, 5L), mean = c(2.68, 2.8, 3.22, 
2.96), polity2 = c(8L, 8L, 10L, 10L), web = c(87.2661, 94.8967, 
89.7391, 74.3872), rsf = c(25.69, 22.23, 14.6, 24.98), civil_liberties = c(0.785599008, 
0.810742702, 0.821268417, 0.813949794), freedom_of_expression = c(0.758906967, 
0.832891911, 0.83327835, 0.781986225), vdem_gov_censorship_effort = c(0.731895466, 
0.8447733, 0.883343829, 0.844615869), vdem_self_censorship_effort = c(0.742219428, 
0.831499528, 0.805721471, 0.729330399), vdem_freedom_of_expression = c(1, 
1, 1, 0.666666667), ciri_freedom_of_speech_and_press = c(1, 1, 
1, 0.5), media_integrity = c(0.81449235, 0.88417386, 0.829951651, 
0.878769429), vdem_critical_press = c(0.889046047, 0.868772592, 
0.917491749, 0.872387239), vdem_media_perspective = c(0.782079459, 
0.881994928, 0.725950972, 0.919019442), vdem_media_bias = c(0.693825991, 
0.835622928, 0.709774199, 0.841939049), vdem_media_corruption = c(0.733503755, 
0.828566864, 0.874261064, 0.810193322), vdem_media_freedom = c(1, 
1, 1, 0.5)), class = "data.frame", row.names = c(NA, -4L))
answered 2021-05-22T19:07:52.630

You should write a small function that computes exactly what you need and use it instead of calling t.test directly. For example:

# get four column names
cols <- names(mtcars)[1:4]   # use trust_news instead of mtcars, and keep all the names

# compute the pval for a pair of names
pval <- function(pair) {
  value <- t.test(mtcars[, pair[1]], mtcars[, pair[2]])$p.value
  names(value) <- paste(pair, collapse = " vs. ")
  value
}

# do it for all pairs.  Don't simplify, and it will keep the names
combn(cols, 2, pval, simplify = FALSE)
#> [[1]]
#>  mpg vs. cyl 
#> 9.507708e-15 
#> 
#> [[2]]
#> mpg vs. disp 
#> 7.978234e-11 
#> 
#> [[3]]
#>   mpg vs. hp 
#> 1.030354e-11 
#> 
#> [[4]]
#> cyl vs. disp 
#> 1.774454e-11 
#> 
#> [[5]]
#>   cyl vs. hp 
#> 8.321996e-13 
#> 
#> [[6]]
#> disp vs. hp 
#> 0.001545647

Created on 2021-05-22 by the reprex package (v2.0.0)
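The pval() function above hard-codes mtcars. A minimal sketch of the same idea with the data frame passed in as an argument (pval2 is a hypothetical name), relying on combn() forwarding extra arguments to FUN; unlist() then flattens the unsimplified list into one named vector of p-values.

```r
# Same as pval(), but the data frame is an argument instead of mtcars
pval2 <- function(pair, data) {
  value <- t.test(data[[pair[1]]], data[[pair[2]]])$p.value
  names(value) <- paste(pair, collapse = " vs. ")
  value
}

# mtcars ships with R; swap in trust_news and names(trust_news)
cols <- names(mtcars)[1:4]

# simplify = FALSE keeps each name; unlist() merges them into one named vector
p <- unlist(combn(cols, 2, pval2, data = mtcars, simplify = FALSE))
p
```

A single named vector is easier to sort or filter (for example p[p < 0.05]) than a list of length-one vectors.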

answered 2021-05-22T19:06:12.833