I have a large dataset with several grouping variables (factors with 2 to 6 levels) and several dichotomous variables (0/1).

Example data

DF <- data.frame(
  group1 = sample(x = c("A","B","C","D"), size = 100, replace = TRUE),
  group2 = sample(x = c("red","blue","green"), size = 100, replace = TRUE),
  group3 = sample(x = c("tiny","small","big","huge"), size = 100, replace = TRUE),
  var1 = sample(x = 0:1, size = 100, replace = TRUE),
  var2 = sample(x = 0:1, size = 100, replace = TRUE),
  var3 = sample(x = 0:1, size = 100, replace = TRUE),
  var4 = sample(x = 0:1, size = 100, replace = TRUE),
  var5 = sample(x = 0:1, size = 100, replace = TRUE))

I want to run a chi-square test of every grouping variable against every dichotomous variable.

library(tidyverse)
library(rstatix)

chisq_test(DF$group1, DF$var1)
chisq_test(DF$group1, DF$var2)
chisq_test(DF$group1, DF$var3)
...
etc

I managed to get it working with two nested for loops, but I'm sure there is a better solution.

groups <- c("group1","group2","group3")
vars <- c("var1","var2","var3","var4","var5")

results <- data.frame()
for(i in groups){
  for(j in vars){
    test <- chisq_test(DF[,i], DF[,j])
    test <- mutate(test, group=i, var=j)
    results <- rbind(results, test)
  }
}
results

I think I need some kind of apply function, but I can't figure it out.

2 Answers

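Here is a quick and easy dplyr solution (using tidyr's pivot_longer for the reshaping): convert the data to long format keyed by group and var, then run the chi-square test for each combination of group and var.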

DF %>%
  pivot_longer(starts_with("group"), names_to = "group", values_to = "group_val") %>%
  pivot_longer(starts_with("var"), names_to = "var", values_to = "var_val") %>%
  group_by(group, var) %>%
  summarise(chisq_test(group_val, var_val)) %>%
  ungroup()
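
To see what the two pivot_longer calls do, here is a minimal sketch on a small hypothetical data frame (`small` is made up for illustration and is not part of the question). Each original row is repeated once per group column and once per var column, so 4 rows with 2 group columns and 2 var columns should become 16 rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, used only to illustrate the reshape
small <- data.frame(
  group1 = c("A", "B", "A", "B"),
  group2 = c("red", "red", "blue", "blue"),
  var1 = c(0L, 1L, 1L, 0L),
  var2 = c(1L, 1L, 0L, 0L)
)

long <- small %>%
  pivot_longer(starts_with("group"), names_to = "group", values_to = "group_val") %>%
  pivot_longer(starts_with("var"), names_to = "var", values_to = "var_val")

# One row per (original row, group column, var column)
nrow(long)

# Each group/var combination keeps all 4 original observations
count(long, group, var)
```

After this reshape, group_by(group, var) isolates exactly the pair of columns that one test needs.
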
Answered 2020-12-16T21:10:09.747
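Here is a solution using apply. I'm sure there is a more elegant way to do this with dplyr. (Note that here I extract the p-value of each test, but you could extract something else, or the whole test result, if you prefer.)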

# Columns 1:3 are the grouping variables; columns 4:8 are var1 to var5
res <- apply(DF[,1:3], 2, function(x) {
                            apply(DF[,4:8], 2,
                              function(y) {chisq.test(x,y)$p.value})
                            })
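
If you want more than the p-value, one possible base-R variation is to list every group/var pair with expand.grid, keep the full test objects, and then pull out whichever pieces you need. This is only a sketch: the reduced DF, the seed, and the names `pairs` and `tests` are assumptions made for illustration.

```r
set.seed(42)  # seed chosen only to make this sketch reproducible
n <- 100
DF <- data.frame(
  group1 = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  group2 = sample(c("red", "blue", "green"), n, replace = TRUE),
  var1 = sample(0:1, n, replace = TRUE),
  var2 = sample(0:1, n, replace = TRUE)
)
groups <- c("group1", "group2")
vars <- c("var1", "var2")

# All group/var name pairs, one row per test to run
pairs <- expand.grid(group = groups, var = vars, stringsAsFactors = FALSE)

# Run one test per pair and keep the full htest objects
tests <- Map(function(g, v) chisq.test(DF[[g]], DF[[v]]),
             pairs$group, pairs$var)

# Extract selected pieces of each test into the table
pairs$statistic <- vapply(tests, function(tst) as.numeric(tst$statistic),
                          numeric(1), USE.NAMES = FALSE)
pairs$df <- vapply(tests, function(tst) as.numeric(tst$parameter),
                   numeric(1), USE.NAMES = FALSE)
pairs$p.value <- vapply(tests, function(tst) tst$p.value,
                        numeric(1), USE.NAMES = FALSE)
pairs
```

Keeping the `tests` list around means other components, such as the expected counts in `tests[[1]]$expected`, remain available without rerunning anything.
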

Answered 2020-12-16T18:12:33.960