r - Sort data in R and how to extract values?

Question

This is my data set. I am new to R and I am trying to write a script for it.

R> head(KenTau)
  Age CapReg TrSw FeelChk CanSw
1  20      1    0       0     0
2  36      1    0       0     0
3  35      1    3       2     2
4  21      0    0       2     2
5  43      0    0       2     2
6  34      1    0       0     0

I want to compare TrSw with each of the remaining column variables, i.e.

TrSw Vs Age
TrSw Vs CapReg  
TrSw Vs FeelChk 
TrSw Vs CanSw

I run this in R with the following command:

cor.test(KenTau$Age, KenTau$TrSw, alternative="two.sided", method="kendall")

I would also like to extract Age and the p-value so that I end up with a list, because I have close to 50 variables.

dput() of the data:

KenTau <- structure(list(Age = c(20L, 36L, 35L, 21L, 43L, 34L, 37L, 62L, 
54L, 47L, 48L, 45L, 2L, 2L, 2L, 54L, 52L, 40L, 58L, 29L, 27L, 
28L, 46L, 35L, 50L, 31L, 48L, 2L, 29L, 54L, 52L, 28L, 28L, 26L, 
38L, 59L, 51L, 58L, 39L, 44L, 53L, 2L, 39L, 55L, 48L, 2L, 23L, 
51L, 50L, 26L, 28L, 40L, 38L, 61L, 52L, 33L, 2L, 59L, 27L, 45L, 
45L, 57L, 66L, 52L, 58L, 34L, 28L, 39L, 48L, 53L, 39L, 46L, 57L, 
36L, 25L, 22L, 29L, 46L, 25L, 25L, 35L, 44L, 24L, 26L, 33L, 27L, 
41L, 28L, 26L, 32L, 36L, 35L, 32L, 33L, 29L, 29L, 52L, 55L, 23L, 
29L, 45L, 26L, 48L, 54L, 50L, 35L, 27L, 39L, 41L, 30L, 30L, 31L, 
27L, 28L, 27L, 25L, 34L, 23L, 30L, 34L, 52L, 20L, 31L, 2L, 45L, 
34L, 21L, 60L, 34L, 40L, 47L, 30L, 54L, 36L, 32L, 31L, 55L, 57L, 
23L, 31L, 26L, 26L, 27L, 19L, 26L, 25L, 37L, 47L, 38L, 38L, 26L, 
25L, 41L), CapReg = c(1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 
0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 
1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 
0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 
1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 
1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 
1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 
1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
    TrSw = c(0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
    1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 
    0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 
    1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 
    1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 
    1L, 1L, 0L, 3L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 
    1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 
    1L, 1L, 0L, 1L, 1L, 1L), FeelChk = c(0L, 0L, 2L, 2L, 2L, 
    0L, 2L, 2L, 2L, 3L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 2L, 0L, 1L, 
    0L, 1L, 2L, 2L, 1L, 1L, 0L, 2L, 2L, 1L, 2L, 2L, 0L, 1L, 2L, 
    0L, 1L, 2L, 2L, 3L, 0L, 2L, 1L, 0L, 0L, 2L, 1L, 2L, 2L, 1L, 
    1L, 0L, 1L, 2L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 3L, 1L, 2L, 1L, 
    1L, 0L, 0L, 1L, 1L, 1L, 0L, 2L, 3L, 1L, 2L, 2L, 1L, 1L, 0L, 
    2L, 1L, 0L, 1L, 1L, 0L, 2L, 1L, 1L, 0L, 0L, 0L, 2L, 1L, 2L, 
    1L, 0L, 0L, 0L, 0L, 2L, 0L, 1L, 0L, 2L, 2L, 2L, 0L, 0L, 2L, 
    3L, 2L, 0L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 1L, 2L, 2L, 
    1L, 1L, 2L, 0L, 3L, 1L, 0L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 0L, 
    0L, 2L, 0L, 2L, 2L, 3L, 0L, 1L, 1L, 2L, 0L, 0L, 0L), CanSw = c(0L, 
    0L, 2L, 2L, 2L, 0L, 2L, 2L, 2L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 
    0L, 2L, 2L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 2L, 2L, 2L, 3L, 2L, 
    2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 1L, 1L, 2L, 0L, 0L, 2L, 2L, 
    3L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 0L, 1L, 0L, 2L, 1L, 3L, 1L, 
    0L, 0L, 2L, 0L, 0L, 0L, 2L, 0L, 1L, 1L, 1L, 2L, 0L, 1L, 2L, 
    2L, 1L, 1L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 
    0L, 2L, 1L, 2L, 0L, 2L, 2L, 0L, 1L, 2L, 0L, 1L, 0L, 2L, 2L, 
    2L, 0L, 0L, 2L, 3L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 
    0L, 0L, 2L, 2L, 1L, 1L, 2L, 1L, 0L, 0L, 2L, 0L, 1L, 2L, 2L, 
    1L, 1L, 0L, 0L, 2L, 2L, 0L, 2L, 2L, 3L, 1L, 1L, 0L, 2L, 0L, 
    2L, 0L)), .Names = c("Age", "CapReg", "TrSw", "FeelChk", 
"CanSw"), class = "data.frame", row.names = c(NA, -153L))

score 3 · Accepted Answer

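Whilst I am not convinced of the statistical merits of generating p-values for 50 correlations, this is easy to do with lapply() and friends.

For this I chose to iterate over the indices of those names of KenTau that are not "TrSw", since that is the variable you wish to compare with all the others. I first get those indices using which():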

R> inds <- which(names(KenTau) != "TrSw")
R> inds
[1] 1 2 4 5
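As a minimal sketch of what which() is doing here, the comparison can be split into two steps on a small made-up data frame. The object toy and its values are hypothetical stand-ins for KenTau; setdiff() is shown only as an equivalent way to select the same columns by name instead of by position.

```r
# Toy data frame standing in for KenTau (values are made up)
toy <- data.frame(Age     = c(20, 36, 35, 21),
                  CapReg  = c(1, 1, 1, 0),
                  TrSw    = c(0, 0, 3, 0),
                  FeelChk = c(0, 0, 2, 2))

# Logical vector: TRUE for every column that is not "TrSw"
keep <- names(toy) != "TrSw"

# which() turns the TRUE positions into integer column indices
inds <- which(keep)

# The same selection expressed by name rather than by position
others <- setdiff(names(toy), "TrSw")
```

Working with names instead of positions can make later output easier to label, but either form should work with lapply().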

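Next I set up a call to lapply(), in which I iterate over inds. I need an anonymous function that takes an index ind as its first argument (this is what lapply() passes to my function at each iteration), and I also need to pass in the data, which I do via the argument x. My anonymous function calls cor.test() as in your example, but note how x[, ind] is used to refer to the current column that we are correlating with TrSw. The final part of the lapply() call says to pass KenTau as x, so whenever you see x inside the anonymous function, it really refers to a copy of KenTau: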

cors <- lapply(inds,
               function(ind, x) {
                   cor.test(x[, ind], x[, "TrSw"], alternative="two.sided",
                            method="kendall")
               }, x = KenTau)
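To see the argument passing in isolation, here is a small sketch that uses the same pattern with a simpler function than cor.test(). The data frame toy, the column ref, and the sum of differences are all made up for illustration; only the way lapply() forwards x = toy to the anonymous function mirrors the call above.

```r
# Hypothetical toy data: "ref" plays the role of TrSw
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), ref = c(1, 1, 2))

# Indices of every column except the reference column
inds <- which(names(toy) != "ref")

# lapply() calls the function once per element of inds; the current
# element becomes `ind`, and the extra named argument x = toy is
# forwarded unchanged on every call
diffs <- lapply(inds,
                function(ind, x) {
                    sum(x[, ind] - x[, "ref"])
                }, x = toy)

# Label the results with the column names they came from
names(diffs) <- names(toy)[inds]
```

Any further named arguments placed after the function in the lapply() call are forwarded in the same way.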

Adding some names to the list cors will be helpful, so do that now:

names(cors) <- names(KenTau)[inds]

If we look at cors, we see that it is a list:

R> str(cors, max = 1)
List of 4
 $ Age    :List of 8
  ..- attr(*, "class")= chr "htest"
 $ CapReg :List of 8
  ..- attr(*, "class")= chr "htest"
 $ FeelChk:List of 8
  ..- attr(*, "class")= chr "htest"
 $ CanSw  :List of 8
  ..- attr(*, "class")= chr "htest"

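Each element of the list is an object of class "htest", which is what cor.test() returns. There are four such objects because there are four variables to compare with TrSw.

You wish to extract the p-value, so we need to look at where it is stored in the "htest" object: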

R> str(cors[[1]])
List of 8
 $ statistic  : Named num 1.57
  ..- attr(*, "names")= chr "z"
 $ parameter  : NULL
 $ p.value    : num 0.116
 $ estimate   : Named num 0.105
  ..- attr(*, "names")= chr "tau"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "tau"
 $ alternative: chr "two.sided"
 $ method     : chr "Kendall's rank correlation tau"
 $ data.name  : chr "x[, ind] and x[, \"TrSw\"]"
 - attr(*, "class")= chr "htest"

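The output above shows that the p-value is stored in the component p.value. To extract all 4 p-values, we effectively want to do this:

 cors[[i]][["p.value"]]

where i indexes each element of cors in turn. To do this we could use lapply() again, but sapply() will simplify the result to a vector for us, which is neater in this case. On each call sapply() passes our function cors[[i]], so we just need to apply the [[ function (yes, it may not look like one, but it very much is a function: "[["()). That function takes an argument (in this case we can use the name of the component to extract), which I pass as "p.value":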

res <- sapply(cors, `[[`, "p.value")

Because I added names to cors, sapply() returns a named vector containing the p-values of the correlations between each named variable and TrSw:

R> res
         Age       CapReg      FeelChk        CanSw 
1.157889e-01 3.920115e-01 2.189736e-04 1.578040e-06
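The use of [[ as an ordinary function can be checked on a small hand-built list. In this sketch the list results and its numbers are invented; it only shows that passing the operator to sapply() gives the same named vector as writing the anonymous function out in full.

```r
# Hand-built list mimicking the shape of a list of "htest" objects
# (the numbers are made up)
results <- list(first  = list(p.value = 0.04, estimate = 0.3),
                second = list(p.value = 0.20, estimate = 0.1))

# Pass the `[[` operator itself as the function; "p.value" becomes
# its second argument on every call
by_operator <- sapply(results, `[[`, "p.value")

# The same extraction with an explicit anonymous function
by_function <- sapply(results, function(el) el[["p.value"]])
```

Both forms return a numeric vector named after the list elements, so the choice between them is mostly a matter of taste.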

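If you want another component of the result, then replace "p.value" with the name of the component you want. Going by the str() output above, "estimate" gives Kendall's tau, and "statistic" gives the test statistic (z).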
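If both the correlation and its p-value are wanted side by side, one possible sketch is to collect them into a data frame. Going by the str() output above, tau sits in the estimate component. The data frame toy below is randomly generated for illustration, the name summary_tab is arbitrary, and exact = FALSE is my own addition to request the normal approximation, since the toy data contain ties.

```r
set.seed(1)  # make the made-up data reproducible

# Hypothetical stand-in for KenTau with fewer columns
toy <- data.frame(TrSw   = sample(0:3, 40, replace = TRUE),
                  Age    = sample(20:60, 40, replace = TRUE),
                  CapReg = sample(0:1, 40, replace = TRUE))

# Every column except TrSw, selected by name
others <- setdiff(names(toy), "TrSw")

# One Kendall test per remaining column
cors <- lapply(others, function(v) {
    cor.test(toy[[v]], toy[["TrSw"]], alternative = "two.sided",
             method = "kendall", exact = FALSE)
})
names(cors) <- others

# One row per variable: its name, tau (the "estimate" component)
# and the p-value
summary_tab <- data.frame(
    variable = others,
    tau      = unname(sapply(cors, `[[`, "estimate")),
    p.value  = unname(sapply(cors, `[[`, "p.value")),
    stringsAsFactors = FALSE
)
```

A table like this can then be sorted, for example with summary_tab[order(summary_tab$p.value), ].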

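If you are going to do this for lots of variables, read up on multiple testing and adjusting p-values, as I am not convinced your results will be useful as they stand with 50 correlations.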
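As a starting point for that adjustment, base R provides p.adjust(). The sketch below applies it to the four p-values printed above; the choice of the Holm and Benjamini-Hochberg methods is mine and not a recommendation from the answer, so pick whichever correction suits your analysis.

```r
# The four p-values reported above, keyed by variable name
res <- c(Age     = 1.157889e-01,
         CapReg  = 3.920115e-01,
         FeelChk = 2.189736e-04,
         CanSw   = 1.578040e-06)

# Holm correction: controls the family-wise error rate
res_holm <- p.adjust(res, method = "holm")

# Benjamini-Hochberg correction: controls the false discovery rate
res_bh <- p.adjust(res, method = "BH")

# Put raw and adjusted values next to each other for comparison
adjusted <- cbind(raw = res, holm = res_holm, BH = res_bh)
```

Adjusted p-values are never smaller than the raw ones, so some correlations that look significant on their own may no longer be so once all 50 tests are taken into account.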
