
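I'm trying to write a custom function to pass to do() in dplyr. The end goal is to use it together with group_by() so that my custom function runs on each separate chunk of data.

Here's what my dataset looks like: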

    > head(data,4)
      subject  ps polarity       rs   log_rs
    1  Danesh 1.0  regular 216.0000 5.375278
    2  Danesh 0.9  regular 285.7143 5.654992
    3  Danesh 0.8  regular 186.3354 5.227548
    4  Danesh 0.7  regular 218.1818 5.385329

And here is the code that generates it:

    data <- structure(list(subject = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ChristinaP", 
    "Danesh", "Elizabeth", "Ina", "JaclynT", "JessicaS", "Rhea", 
    "Samuel", "Tyler", "Vinodh"), class = "factor"), ps = c(1, 0.9, 
    0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 1, 0.9, 0.8, 0.7, 0.6, 
    0.5, 0.4, 0.3, 0.2, 0.1), polarity = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L), .Label = c("regular", "reverse"), class = "factor"), rs = c(216, 
    285.714285714286, 186.335403726708, 218.181818181818, 183.673469387755, 
    194.174757281553, 202.020202020202, 184.615384615385, 153.452685421995, 
    191.693290734824, 216, 285.714285714286, 186.335403726708, 218.181818181818, 
    183.673469387755, 194.174757281553, 202.020202020202, 184.615384615385, 
    153.452685421995, 191.693290734824), log_rs = c(5.37527840768417, 
    5.65499231048677, 5.22754829565983, 5.38532874353767, 5.21315955820773, 
    5.26875856430649, 5.30836770240154, 5.2182746588745, 5.03339228121887, 
    5.25589665066408, 5.37527840768417, 5.65499231048677, 5.22754829565983, 
    5.38532874353767, 5.21315955820773, 5.26875856430649, 5.30836770240154, 
    5.2182746588745, 5.03339228121887, 5.25589665066408)), class = "data.frame", 
    row.names = c(NA, -20L), .Names = c("subject", "ps", "polarity", "rs", "log_rs"))

The final call would look like this:

    temp_df <- data %>%
      group_by(subject, polarity) %>%
      do(customFun(.$ps, .$rs))

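My custom function does a lot of things (which I skip here for simplicity), one of which is computing max(rs) over a subset of rows chosen by the value of the variable ps. Specifically, I keep only the rows whose ps is lower than the ps of row 2 or greater than the ps of row 5, and take the maximum rs of those selected rows, as in the dummy example below:

    customFun <- function(df, ps, rs) {
      omax = df %>%
        filter(ps < ps[2] | ps > ps[5]) %>%
        summarise(max(rs))
    }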
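To make the selection rule concrete, here is a small base-R sketch that applies the same condition to the ps and rs values of the first (Danesh, regular) chunk from the data above. The names keep and omax are only illustrative.

```r
# ps and rs for the (Danesh, regular) chunk, copied from the data above
ps <- c(1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)
rs <- c(216, 285.714285714286, 186.335403726708, 218.181818181818,
        183.673469387755, 194.174757281553, 202.020202020202,
        184.615384615385, 153.452685421995, 191.693290734824)

# Logical index of the rows the filter() condition keeps:
# ps below the 2nd row's ps, or ps above the 5th row's ps
keep <- ps < ps[2] | ps > ps[5]

# Maximum rs over the kept rows only
omax <- max(rs[keep])
print(omax)
```

With these particular values ps[2] is 0.9 and ps[5] is 0.6, so every row satisfies at least one side of the | and omax is simply the chunk's overall maximum (about 285.71).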

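The problem is that I want this function to run inside the group_by() sub-data frames, so I can't refer to the data frame inside the function by a specific name. Instead, I want the function to know that it should automatically work on the current chunk of data. I tried things like this:

    omax = . %>%
      filter(ps < ps[2] | ps > ps[5]) %>%
      summarise(max(rs))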

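and many other variations, but nothing seems to work... I found a few similar questions online, like this one, but still couldn't figure it out. Any help or hints on how to solve this? Thanks!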
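A likely reason the `. %>%` attempt does nothing useful: in magrittr (whose %>% dplyr re-exports), a pipeline that starts with `.` is not evaluated on the spot; it builds a function (a "functional sequence") that still has to be called on some data. The sketch below uses a hypothetical name, max_rs, and a made-up five-row chunk to show that such a pipeline works once it is called explicitly; inside do() it would presumably be used as do(max_rs(.)).

```r
library(dplyr)

# A pipeline starting with `.` is turned into a function by magrittr;
# nothing is filtered or summarised at this point.
max_rs <- . %>%
  filter(ps < ps[2] | ps > ps[5]) %>%
  summarise(omax = max(rs))

# Made-up chunk standing in for one group's rows
chunk <- data.frame(ps = c(1, 0.9, 0.8, 0.7, 0.6),
                    rs = c(216, 285.7, 186.3, 218.2, 183.7))

# Calling the functional sequence on the chunk runs the pipeline
res <- max_rs(chunk)
print(res)
```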


1 Answer


I found the answer to my question here.

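Custom function:

    library(dplyr)

    customFun <- function(df, ps, rs) {
      # Inside filter() and summarise(), ps and rs resolve to the columns of
      # df (the current group's chunk), so the ps and rs arguments are never used
      omax <- df %>%
        filter(ps < ps[2] | ps > ps[5]) %>%
        summarise(max(rs))
      omax
    }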

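Final call:

    # `.` is the current group's chunk; passing it as the first argument
    # gives customFun a data frame to work on
    temp_df <- data %>%
      group_by(subject, polarity) %>%
      do(customFun(., .$ps, .$rs))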
answered 2017-07-22T04:49:18.447
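For readers on a newer dplyr: since dplyr 1.0.0, do() is marked as superseded, and group_modify() (available since dplyr 0.8.0) covers the same "run my function on each chunk" pattern. The sketch below is an alternative to the answer above, not what the answer used. customFun2 and demo_data are hypothetical names; demo_data is a reduced copy of the question's data with subject and polarity stored as plain character columns.

```r
library(dplyr)

# Hypothetical variant of customFun: it takes only the chunk, and ps and rs
# are simply columns of that chunk.
customFun2 <- function(chunk) {
  chunk %>%
    filter(ps < ps[2] | ps > ps[5]) %>%
    summarise(omax = max(rs))
}

rs_vals <- c(216, 285.714285714286, 186.335403726708, 218.181818181818,
             183.673469387755, 194.174757281553, 202.020202020202,
             184.615384615385, 153.452685421995, 191.693290734824)

demo_data <- data.frame(
  subject  = rep("Danesh", 20),
  polarity = rep(c("regular", "reverse"), each = 10),
  ps       = rep(c(1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1), 2),
  rs       = rep(rs_vals, 2)
)

# group_modify() calls the function once per group on that group's rows
# (grouping columns dropped) and binds the results back to the group keys.
temp_df <- demo_data %>%
  group_by(subject, polarity) %>%
  group_modify(~ customFun2(.x)) %>%
  ungroup()

print(temp_df)
```

The lambda ~ customFun2(.x) is used because group_modify() passes both the chunk (.x) and the group keys (.y) to its function; customFun2 only accepts the chunk.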