I want to test a set of data for independence. A reproducible example follows:
income <- rep(paste0("q", 1:5), times = 4)
v1 <- round(runif(20, 40, 60), 2)
v2 <- round(runif(20, 10, 20), 2)
v3 <- round(runif(20, 100, 200), 2)
v4 <- round(runif(20, 0, 20), 2)
# data.frame() keeps v1-v4 numeric; as.data.frame(cbind(...)) would coerce every column to character
df <- data.frame(income, v1, v2, v3, v4)
  income    v1    v2     v3    v4
1     q1 47.78  18.7 148.75 14.15
2     q2 59.22 19.95 141.65  2.63
3     q3 58.34 14.96 169.94    20
4     q4 40.35 12.28 143.82 12.14
5     q5 59.72 17.14 191.72 10.66
6     q1 59.44 10.32 128.23     1
7     q2 47.65 13.87 187.51  5.74
...
I want to test the independence of v1, v2, v3 and v4 across the income groups (q1-q5).
The result should look like this:
income v1 v2 v3 v4 p-value
q1 mean.v1.q1 mean.v2.q1 mean.v3.q1 mean.v4.q1
q2 mean.v1.q2 mean.v2.q2 mean.v3.q2 mean.v4.q2
q3 mean.v1.q3 mean.v2.q3 mean.v3.q3 mean.v4.q3
q4 mean.v1.q4 mean.v2.q4 mean.v3.q4 mean.v4.q4
q5 mean.v1.q5 mean.v2.q5 mean.v3.q5 mean.v4.q5
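I think I should apply ANOVA to get the test result, but I'm not sure how. Can anyone help?
I came up with the script below. Is this the right approach? Is there anything to improve? Thanks!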
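# Mean of each variable within each income group
v1mean <- as.data.frame(tapply(v1, income, mean))
colnames(v1mean) <- "v1"
v2mean <- as.data.frame(tapply(v2, income, mean))
colnames(v2mean) <- "v2"
v3mean <- as.data.frame(tapply(v3, income, mean))
colnames(v3mean) <- "v3"
v4mean <- as.data.frame(tapply(v4, income, mean))
colnames(v4mean) <- "v4"
# Named "means" and "fit" so the objects do not mask base::mean() and stats::aov()
means <- cbind(income = rownames(v1mean), v1mean, v2mean, v3mean, v4mean)
library(reshape)
# Reshape to long format: one row per (income, variable) pair
means_long <- melt(means, id = "income")
# Two-way ANOVA on the 20 group means (one value per cell, no replication)
fit <- aov(value ~ variable + income, data = means_long)
summary(fit)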
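For comparison, here is a minimal sketch of one other way the table described above might be built: group means from the raw data, plus a separate one-way ANOVA of each variable on income. This is an assumption about what is wanted, not a confirmed answer. The seed and the names `group_means`, `vars` and `p_values` are hypothetical and were added only for illustration.

```r
# Example data as in the question; the seed is an assumption added for repeatability.
set.seed(42)
df <- data.frame(
  income = rep(paste0("q", 1:5), times = 4),
  v1 = round(runif(20, 40, 60), 2),
  v2 = round(runif(20, 10, 20), 2),
  v3 = round(runif(20, 100, 200), 2),
  v4 = round(runif(20, 0, 20), 2)
)

# Group means: one row per income group, one column per variable.
group_means <- aggregate(cbind(v1, v2, v3, v4) ~ income, data = df, FUN = mean)

# One-way ANOVA per variable, fitted on the raw observations (not on the means).
# The first entry of "Pr(>F)" is the p-value of the income term.
vars <- c("v1", "v2", "v3", "v4")
p_values <- sapply(vars, function(v) {
  fit <- aov(reformulate("income", response = v), data = df)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

print(group_means)
print(round(p_values, 4))
```

Note that this gives one p-value per variable (v1 to v4), so the p-values would sit as an extra row under the means rather than as a column next to each income group. Because four tests are run, an adjustment such as `p.adjust(p_values)` may be worth considering.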