I am trying to run a Kruskal-Wallis test on multiple columns of a sample data frame (df) in R, but I get the following error:

 Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups),  : 
  variable lengths differ (found for 'as.factor(Groups)') 

Here is my sample data frame (df):

Groups  Gene1   Gene2   Gene3   Gene4   Gene5   Gene6   Gene7   Gene8   Gene9   Gene10
Group1  120.67  69.33   1.24    2.31    0.39    6.57    2.49    383.84  415.23  NA
Group1  157     110.67  0.4     0.84    0.28    2.62    2.11    245.42  325.23  NA
Group1  113.5   66.75   1.07    4.53    0.33    2.37    2.35    421.25  352.03  73.51
Group1  131     79.67   1.13    5.03    0.72    3.36    2.24    305.32  432.81  71.11
Group1  120     79.67   0.91    3.84    0.74    3.77    1.92    298.91  382.43  66.49
Group2  125.67  83.67   2.07    1.73    0.38    3.89    2.09    233.81  377.21  72.1
Group2  103.33  68.67   1.01    4.89    0.3     4.5     1.75    231.5   381.73  53
Group2  121.33  74.67   0.54    2.39    3.95    3.7     2.46    310.66  355.97  143.61
Group2  136     83.67   1.6     1.75    0.32    5.17    2.36    410.21  389.62  170.34
Group2  143.67  71.33   0.56    1.22    0.26    4.48    2.62    294.01  491.57  96.72
Group2  134.67  69.67   0.85    1.77    0.45    3.58    2.44    236.61  441.32  69.06
Group2  158.33  98.33   0.87    3.69    0.51    2.53    2.6     257.66  396.96  41.94
Group2  147.33  88.33   NA      NA      NA      NA      NA      NA      NA      NA
Group2  95.67   59      1.39    0.56    0.31    2.49    2.09    395.38  420.28  64.83
Group3  135     82      13.31   24.05   1.21    3.83    2.83    313.71  327.84  66.8
Group3  124.67  78      1.12    2       0.71    3.77    2.42    334.36  358.9   131.35
Group3  152     98.33   1.11    1.54    0.35    2.11    2.21    297.68  433.48  117.18
Group3  135.33  73.67   0.13    2.99    0.3     2.4     1.86    296.82  415.13  112.97
Group3  135.33  87      0.91    3.73    0.65    2.92    1.85    335.31  412.16  103.18
Group4  124.67  77.67   0.28    0.81    0.49    2.62    1.96    251.49  468.19  80.27
Group4  125.67  72.33   1.01    1.82    0.35    3.65    1.62    335.18  264.74  145.15
Group4  169     105     0.6     3.12    0.29    3.9     2.22    311.01  459.85  82.89
Group4  123.67  76.33   0.65    1.78    0.47    2.77    1.57    253.56  283.38  59.07
Group5  132.67  76.33   2.94    17.01   0.27    3.99    2.55    354.78  493.02  145.36
Group5  NA      NA      1.34    1.42    0.4     4.21    2.02    243.26  345.2   43.91
Group5  144.33  75      NA      NA      0.55    3.26    2.85    312.16  419.86  55.71
Group5  136.25  78.25   NA      1.32    0.65    3.63    1.52    267.13  256.18  53.49
Group5  123.67  69.33   1.81    1.52    0.67    3.89    2       303.89  346.57  112.16
Group5  116.67  66.33   0.7     1.68    0.27    3.55    2.16    284.96  407.04  102.97
Group5  136.67  76      2.68    4.3     0.33    7.36    2.26    237.28  423.29  88.65
Group6  122     63.33   0.87    4.2     0.17    3.92    2.11    159.04  300.24  60.13
Group6  130.67  82.67   0.8     1.85    1       5.26    2.46    388.61  558.51  66.76
Group6  136.33  70.33   0.54    2.26    0.35    NA      NA      388.81  551.69  113.39
Group6  127.33  73      1.32    2.19    0.99    4.42    2.59    378.57  501.12  85.56
Group7  186.67  89.67   0.79    1.77    0.53    5.22    2.73    269.87  490.25  77.74
Group7  203     93      5.63    22.08   0.82    6.97    2.92    341.87  611.33  92.7
Group7  127     72.67   0.55    1.07    0.38    3.2     1.69    310.9   410.19  65.62
Group7  142     79.67   1.61    1.35    3.24    3.73    2.08    304.52  495.79  60.15

Here is my code:

   kw.tests <- lapply(
         data[, -1],
         function(x) { kruskal.test(as.numeric(x) ~ as.factor(Groups), data = data_test, na.action=na.omit) }
   )

     Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups),  : 
      variable lengths differ (found for 'as.factor(Groups)') 

The code runs fine when I test each gene individually, for example for Gene1:

kruskal.test(Gene1 ~ as.factor(Groups), data = data_test, na.action=na.omit)

    Kruskal-Wallis rank sum test

data:  Gene1 by as.factor(Groups)
Kruskal-Wallis chi-squared = 5.6607, df = 6, p-value = 0.4622

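However, when I use lapply or even a for loop, I get this error. I have searched for this error many times, but none of the answers I found helped:

  1. I know this could be caused by the NAs in the file. However, I cannot avoid the NAs, because my real data frame is much larger than this one. Besides, even with the NAs, the test runs perfectly for each gene individually, without lapply or loops.
  2. The 'Groups' variable has the same length as all the other variables, so that is not the problem either.

I am posting a snippet of my data here: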

> dput(data_test)
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L), .Label = c("Group1", 
"Group2", "Group3", "Group4", "Group5", "Group6", "Group7"), class = "factor"), 
    Gene1 = c(120.67, 157, 113.5, 131, 120, 125.67, 103.33, 121.33, 
    136, 143.67, 134.67, 158.33, 147.33, 95.67, 135, 124.67, 
    152, 135.33, 135.33, 124.67, 125.67, 169, 123.67, 132.67, 
    NA, 144.33, 136.25, 123.67, 116.67, 136.67, 122, 130.67, 
    136.33, 127.33, 186.67, 203, 127, 142), Gene2 = c(69.33, 
    110.67, 66.75, 79.67, 79.67, 83.67, 68.67, 74.67, 83.67, 
    71.33, 69.67, 98.33, 88.33, 59, 82, 78, 98.33, 73.67, 87, 
    77.67, 72.33, 105, 76.33, 76.33, NA, 75, 78.25, 69.33, 66.33, 
    76, 63.33, 82.67, 70.33, 73, 89.67, 93, 72.67, 79.67), Gene3 = c(1.24, 
    0.4, 1.07, 1.13, 0.91, 2.07, 1.01, 0.54, 1.6, 0.56, 0.85, 
    0.87, NA, 1.39, 13.31, 1.12, 1.11, 0.13, 0.91, 0.28, 1.01, 
    0.6, 0.65, 2.94, 1.34, NA, NA, 1.81, 0.7, 2.68, 0.87, 0.8, 
    0.54, 1.32, 0.79, 5.63, 0.55, 1.61), Gene4 = c(2.31, 0.84, 
    4.53, 5.03, 3.84, 1.73, 4.89, 2.39, 1.75, 1.22, 1.77, 3.69, 
    NA, 0.56, 24.05, 2, 1.54, 2.99, 3.73, 0.81, 1.82, 3.12, 1.78, 
    17.01, 1.42, NA, 1.32, 1.52, 1.68, 4.3, 4.2, 1.85, 2.26, 
    2.19, 1.77, 22.08, 1.07, 1.35), Gene5 = c(0.39, 0.28, 0.33, 
    0.72, 0.74, 0.38, 0.3, 3.95, 0.32, 0.26, 0.45, 0.51, NA, 
    0.31, 1.21, 0.71, 0.35, 0.3, 0.65, 0.49, 0.35, 0.29, 0.47, 
    0.27, 0.4, 0.55, 0.65, 0.67, 0.27, 0.33, 0.17, 1, 0.35, 0.99, 
    0.53, 0.82, 0.38, 3.24), Gene6 = c(6.57, 2.62, 2.37, 3.36, 
    3.77, 3.89, 4.5, 3.7, 5.17, 4.48, 3.58, 2.53, NA, 2.49, 3.83, 
    3.77, 2.11, 2.4, 2.92, 2.62, 3.65, 3.9, 2.77, 3.99, 4.21, 
    3.26, 3.63, 3.89, 3.55, 7.36, 3.92, 5.26, NA, 4.42, 5.22, 
    6.97, 3.2, 3.73), Gene7 = c(2.49, 2.11, 2.35, 2.24, 1.92, 
    2.09, 1.75, 2.46, 2.36, 2.62, 2.44, 2.6, NA, 2.09, 2.83, 
    2.42, 2.21, 1.86, 1.85, 1.96, 1.62, 2.22, 1.57, 2.55, 2.02, 
    2.85, 1.52, 2, 2.16, 2.26, 2.11, 2.46, NA, 2.59, 2.73, 2.92, 
    1.69, 2.08), Gene8 = c(383.84, 245.42, 421.25, 305.32, 298.91, 
    233.81, 231.5, 310.66, 410.21, 294.01, 236.61, 257.66, NA, 
    395.38, 313.71, 334.36, 297.68, 296.82, 335.31, 251.49, 335.18, 
    311.01, 253.56, 354.78, 243.26, 312.16, 267.13, 303.89, 284.96, 
    237.28, 159.04, 388.61, 388.81, 378.57, 269.87, 341.87, 310.9, 
    304.52), Gene9 = c(415.23, 325.23, 352.03, 432.81, 382.43, 
    377.21, 381.73, 355.97, 389.62, 491.57, 441.32, 396.96, NA, 
    420.28, 327.84, 358.9, 433.48, 415.13, 412.16, 468.19, 264.74, 
    459.85, 283.38, 493.02, 345.2, 419.86, 256.18, 346.57, 407.04, 
    423.29, 300.24, 558.51, 551.69, 501.12, 490.25, 611.33, 410.19, 
    495.79), Gene10 = c(NA, NA, 73.51, 71.11, 66.49, 72.1, 53, 
    143.61, 170.34, 96.72, 69.06, 41.94, NA, 64.83, 66.8, 131.35, 
    117.18, 112.97, 103.18, 80.27, 145.15, 82.89, 59.07, 145.36, 
    43.91, 55.71, 53.49, 112.16, 102.97, 88.65, 60.13, 66.76, 
    113.39, 85.56, 77.74, 92.7, 65.62, 60.15)), class = "data.frame", row.names = c(NA, 
-38L))

Any further help is appreciated. Thank you.

1 Answer

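You are using the wrong dataset name in your lapply / apply call: it iterates over the columns of `data`, while the formula takes `Groups` from `data_test`. If those two objects have different numbers of rows, `x` and `Groups` end up with different lengths, which is what the error reports. Using `data_test` in both places: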

apply(data_test[,-1],2,function(x){kruskal.test(as.numeric(x)~as.factor(data_test$Groups))})

This works for me.
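For completeness, here is a minimal sketch of the same fix written with lapply, as in the question. To keep it short and runnable, `data_test` below is only a subset of the posted data (the Group1, Group3 and Group4 rows, columns Gene1, Gene3 and Gene10); `gene_cols` and `p.values` are names introduced here for illustration. It uses the default method `kruskal.test(x, g)` rather than the formula interface, so both vectors visibly come from the same data frame. Observations with an NA are dropped for that column only.

```r
# Subset of the data posted in the question (Group1, Group3, Group4 rows;
# Gene1, Gene3 and Gene10 columns only), so the example runs on its own.
data_test <- data.frame(
  Groups = factor(rep(c("Group1", "Group3", "Group4"), times = c(5, 5, 4))),
  Gene1  = c(120.67, 157, 113.5, 131, 120,
             135, 124.67, 152, 135.33, 135.33,
             124.67, 125.67, 169, 123.67),
  Gene3  = c(1.24, 0.4, 1.07, 1.13, 0.91,
             13.31, 1.12, 1.11, 0.13, 0.91,
             0.28, 1.01, 0.6, 0.65),
  Gene10 = c(NA, NA, 73.51, 71.11, 66.49,
             66.8, 131.35, 117.18, 112.97, 103.18,
             80.27, 145.15, 82.89, 59.07)
)

# Every column except the grouping variable is treated as a gene.
gene_cols <- setdiff(names(data_test), "Groups")

# Iterate over the columns of data_test itself, and take the groups from
# the same data frame, so x and the grouping factor always have equal length.
kw.tests <- lapply(
  data_test[gene_cols],
  function(x) kruskal.test(x, data_test$Groups)
)

# Collect the p-values into a named numeric vector, one entry per gene.
p.values <- vapply(kw.tests, function(res) res$p.value, numeric(1))
print(p.values)
```

Unlike apply, which first coerces the data frame to a matrix, lapply passes each column through unchanged, so the `as.numeric()` call is not needed as long as the gene columns are already numeric.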

Answered 2019-07-12T12:48:47.103