I am trying to run a Kruskal-Wallis test on multiple columns of a sample data frame (df) in R, but I get the following error:
Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups), :
variable lengths differ (found for 'as.factor(Groups)')
Here is my sample data frame (df):
Groups Gene1 Gene2 Gene3 Gene4 Gene5 Gene6 Gene7 Gene8 Gene9 Gene10
Group1 120.67 69.33 1.24 2.31 0.39 6.57 2.49 383.84 415.23 NA
Group1 157 110.67 0.4 0.84 0.28 2.62 2.11 245.42 325.23 NA
Group1 113.5 66.75 1.07 4.53 0.33 2.37 2.35 421.25 352.03 73.51
Group1 131 79.67 1.13 5.03 0.72 3.36 2.24 305.32 432.81 71.11
Group1 120 79.67 0.91 3.84 0.74 3.77 1.92 298.91 382.43 66.49
Group2 125.67 83.67 2.07 1.73 0.38 3.89 2.09 233.81 377.21 72.1
Group2 103.33 68.67 1.01 4.89 0.3 4.5 1.75 231.5 381.73 53
Group2 121.33 74.67 0.54 2.39 3.95 3.7 2.46 310.66 355.97 143.61
Group2 136 83.67 1.6 1.75 0.32 5.17 2.36 410.21 389.62 170.34
Group2 143.67 71.33 0.56 1.22 0.26 4.48 2.62 294.01 491.57 96.72
Group2 134.67 69.67 0.85 1.77 0.45 3.58 2.44 236.61 441.32 69.06
Group2 158.33 98.33 0.87 3.69 0.51 2.53 2.6 257.66 396.96 41.94
Group2 147.33 88.33 NA NA NA NA NA NA NA NA
Group2 95.67 59 1.39 0.56 0.31 2.49 2.09 395.38 420.28 64.83
Group3 135 82 13.31 24.05 1.21 3.83 2.83 313.71 327.84 66.8
Group3 124.67 78 1.12 2 0.71 3.77 2.42 334.36 358.9 131.35
Group3 152 98.33 1.11 1.54 0.35 2.11 2.21 297.68 433.48 117.18
Group3 135.33 73.67 0.13 2.99 0.3 2.4 1.86 296.82 415.13 112.97
Group3 135.33 87 0.91 3.73 0.65 2.92 1.85 335.31 412.16 103.18
Group4 124.67 77.67 0.28 0.81 0.49 2.62 1.96 251.49 468.19 80.27
Group4 125.67 72.33 1.01 1.82 0.35 3.65 1.62 335.18 264.74 145.15
Group4 169 105 0.6 3.12 0.29 3.9 2.22 311.01 459.85 82.89
Group4 123.67 76.33 0.65 1.78 0.47 2.77 1.57 253.56 283.38 59.07
Group5 132.67 76.33 2.94 17.01 0.27 3.99 2.55 354.78 493.02 145.36
Group5 NA NA 1.34 1.42 0.4 4.21 2.02 243.26 345.2 43.91
Group5 144.33 75 NA NA 0.55 3.26 2.85 312.16 419.86 55.71
Group5 136.25 78.25 NA 1.32 0.65 3.63 1.52 267.13 256.18 53.49
Group5 123.67 69.33 1.81 1.52 0.67 3.89 2 303.89 346.57 112.16
Group5 116.67 66.33 0.7 1.68 0.27 3.55 2.16 284.96 407.04 102.97
Group5 136.67 76 2.68 4.3 0.33 7.36 2.26 237.28 423.29 88.65
Group6 122 63.33 0.87 4.2 0.17 3.92 2.11 159.04 300.24 60.13
Group6 130.67 82.67 0.8 1.85 1 5.26 2.46 388.61 558.51 66.76
Group6 136.33 70.33 0.54 2.26 0.35 NA NA 388.81 551.69 113.39
Group6 127.33 73 1.32 2.19 0.99 4.42 2.59 378.57 501.12 85.56
Group7 186.67 89.67 0.79 1.77 0.53 5.22 2.73 269.87 490.25 77.74
Group7 203 93 5.63 22.08 0.82 6.97 2.92 341.87 611.33 92.7
Group7 127 72.67 0.55 1.07 0.38 3.2 1.69 310.9 410.19 65.62
Group7 142 79.67 1.61 1.35 3.24 3.73 2.08 304.52 495.79 60.15
Here is my code:
kw.tests <- lapply(
  data[, -1],
  function(x) {
    kruskal.test(as.numeric(x) ~ as.factor(Groups), data = data_test, na.action = na.omit)
  }
)
Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups), :
variable lengths differ (found for 'as.factor(Groups)')
When I run each gene individually, the code works fine. For example, for Gene1:
kruskal.test(Gene1 ~ as.factor(Groups), data = data_test, na.action=na.omit)
Kruskal-Wallis rank sum test
data: Gene1 by as.factor(Groups)
Kruskal-Wallis chi-squared = 5.6607, df = 6, p-value = 0.4622
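However, when I use lapply, or even a for loop, I get this error. I have searched for this error many times, but none of the answers I found helped.
- I know this could be caused by the NAs in the file. However, I cannot avoid the NAs, because my real data frame is much larger than this one. Besides, even with the NAs, the test runs fine for each gene individually, without lapply or loops.
- The 'Groups' variable has the same length as all the other variables, so that is not the problem either.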
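One thing that may be worth checking: the lapply call takes its columns from `data`, while the formula looks up `Groups` in `data_test`. If those two objects have different numbers of rows, the same error appears. The sketch below reproduces that situation with two made-up stand-ins, `full` and `snippet` (hypothetical names; the values are copied from the Group1 and Group4 rows of Gene1 above). It is an assumption about the cause, not a confirmed diagnosis.

```r
# Hypothetical stand-ins: `full` plays the role of `data`,
# `snippet` plays the role of `data_test` (here with fewer rows).
full <- data.frame(
  Groups = rep(c("Group1", "Group4"), times = c(5, 4)),
  Gene1  = c(120.67, 157, 113.5, 131, 120, 124.67, 125.67, 169, 123.67)
)
snippet <- full[1:7, ]

# The column vectors come from `full` (9 values), but Groups is
# looked up in `snippet` (7 values), so model.frame() rejects the call.
res <- try(
  lapply(full[, -1, drop = FALSE], function(x) {
    kruskal.test(as.numeric(x) ~ as.factor(Groups), data = snippet)
  }),
  silent = TRUE
)

inherits(res, "try-error")    # TRUE when the lengths disagree
c(nrow(full), nrow(snippet))  # compare the row counts of the two objects
```

Running `nrow(data)` and `nrow(data_test)` on the real objects would show whether this applies here.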
I am posting a snippet of my data here:
> dput(data_test)
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L), .Label = c("Group1",
"Group2", "Group3", "Group4", "Group5", "Group6", "Group7"), class = "factor"),
Gene1 = c(120.67, 157, 113.5, 131, 120, 125.67, 103.33, 121.33,
136, 143.67, 134.67, 158.33, 147.33, 95.67, 135, 124.67,
152, 135.33, 135.33, 124.67, 125.67, 169, 123.67, 132.67,
NA, 144.33, 136.25, 123.67, 116.67, 136.67, 122, 130.67,
136.33, 127.33, 186.67, 203, 127, 142), Gene2 = c(69.33,
110.67, 66.75, 79.67, 79.67, 83.67, 68.67, 74.67, 83.67,
71.33, 69.67, 98.33, 88.33, 59, 82, 78, 98.33, 73.67, 87,
77.67, 72.33, 105, 76.33, 76.33, NA, 75, 78.25, 69.33, 66.33,
76, 63.33, 82.67, 70.33, 73, 89.67, 93, 72.67, 79.67), Gene3 = c(1.24,
0.4, 1.07, 1.13, 0.91, 2.07, 1.01, 0.54, 1.6, 0.56, 0.85,
0.87, NA, 1.39, 13.31, 1.12, 1.11, 0.13, 0.91, 0.28, 1.01,
0.6, 0.65, 2.94, 1.34, NA, NA, 1.81, 0.7, 2.68, 0.87, 0.8,
0.54, 1.32, 0.79, 5.63, 0.55, 1.61), Gene4 = c(2.31, 0.84,
4.53, 5.03, 3.84, 1.73, 4.89, 2.39, 1.75, 1.22, 1.77, 3.69,
NA, 0.56, 24.05, 2, 1.54, 2.99, 3.73, 0.81, 1.82, 3.12, 1.78,
17.01, 1.42, NA, 1.32, 1.52, 1.68, 4.3, 4.2, 1.85, 2.26,
2.19, 1.77, 22.08, 1.07, 1.35), Gene5 = c(0.39, 0.28, 0.33,
0.72, 0.74, 0.38, 0.3, 3.95, 0.32, 0.26, 0.45, 0.51, NA,
0.31, 1.21, 0.71, 0.35, 0.3, 0.65, 0.49, 0.35, 0.29, 0.47,
0.27, 0.4, 0.55, 0.65, 0.67, 0.27, 0.33, 0.17, 1, 0.35, 0.99,
0.53, 0.82, 0.38, 3.24), Gene6 = c(6.57, 2.62, 2.37, 3.36,
3.77, 3.89, 4.5, 3.7, 5.17, 4.48, 3.58, 2.53, NA, 2.49, 3.83,
3.77, 2.11, 2.4, 2.92, 2.62, 3.65, 3.9, 2.77, 3.99, 4.21,
3.26, 3.63, 3.89, 3.55, 7.36, 3.92, 5.26, NA, 4.42, 5.22,
6.97, 3.2, 3.73), Gene7 = c(2.49, 2.11, 2.35, 2.24, 1.92,
2.09, 1.75, 2.46, 2.36, 2.62, 2.44, 2.6, NA, 2.09, 2.83,
2.42, 2.21, 1.86, 1.85, 1.96, 1.62, 2.22, 1.57, 2.55, 2.02,
2.85, 1.52, 2, 2.16, 2.26, 2.11, 2.46, NA, 2.59, 2.73, 2.92,
1.69, 2.08), Gene8 = c(383.84, 245.42, 421.25, 305.32, 298.91,
233.81, 231.5, 310.66, 410.21, 294.01, 236.61, 257.66, NA,
395.38, 313.71, 334.36, 297.68, 296.82, 335.31, 251.49, 335.18,
311.01, 253.56, 354.78, 243.26, 312.16, 267.13, 303.89, 284.96,
237.28, 159.04, 388.61, 388.81, 378.57, 269.87, 341.87, 310.9,
304.52), Gene9 = c(415.23, 325.23, 352.03, 432.81, 382.43,
377.21, 381.73, 355.97, 389.62, 491.57, 441.32, 396.96, NA,
420.28, 327.84, 358.9, 433.48, 415.13, 412.16, 468.19, 264.74,
459.85, 283.38, 493.02, 345.2, 419.86, 256.18, 346.57, 407.04,
423.29, 300.24, 558.51, 551.69, 501.12, 490.25, 611.33, 410.19,
495.79), Gene10 = c(NA, NA, 73.51, 71.11, 66.49, 72.1, 53,
143.61, 170.34, 96.72, 69.06, 41.94, NA, 64.83, 66.8, 131.35,
117.18, 112.97, 103.18, 80.27, 145.15, 82.89, 59.07, 145.36,
43.91, 55.71, 53.49, 112.16, 102.97, 88.65, 60.13, 66.76,
113.39, 85.56, 77.74, 92.7, 65.62, 60.15)), class = "data.frame", row.names = c(NA,
-38L))
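For comparison with the single-gene call, here is a minimal sketch that loops over column names rather than column vectors, so that the response and `Groups` are both read from the same data frame. `small` is a hypothetical subset (the Group1 and Group4 rows of Gene1 and Gene10 from the table above), and building the formula with `reformulate()` is only one possible approach, not necessarily the right fix for the full data.

```r
# Hypothetical subset of data_test: Group1 and Group4 rows, two genes only.
small <- data.frame(
  Groups = factor(rep(c("Group1", "Group4"), times = c(5, 4))),
  Gene1  = c(120.67, 157, 113.5, 131, 120, 124.67, 125.67, 169, 123.67),
  Gene10 = c(NA, NA, 73.51, 71.11, 66.49, 80.27, 145.15, 82.89, 59.07)
)

# Every column except the grouping column is treated as a response.
genes <- setdiff(names(small), "Groups")

# Build "GeneN ~ Groups" for each column name. Rows with NA are dropped
# by the default na.action (na.omit unless options() was changed).
kw.tests <- lapply(genes, function(g) {
  f <- reformulate("Groups", response = g)
  kruskal.test(f, data = small)
})
names(kw.tests) <- genes

# Collect the p-values into a named numeric vector.
p.values <- sapply(kw.tests, function(t) t$p.value)
```

Each element, for example `kw.tests$Gene1`, is the same kind of `htest` object that the single-gene call returns.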
Any further help is appreciated. Thank you.