r - purrr: map t.test onto a split data frame

Question

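I'm new to purrr, Hadley's promising functional programming library for R. I'm trying to take a grouped and split data frame and run t-tests on a variable. An example using a sample dataset might look like this: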

mtcars %>% 
  dplyr::select(cyl, mpg) %>% 
  group_by(as.character(cyl)) %>% 
  split(.$cyl) %>% 
  map(~ t.test(.$`4`$mpg, .$`6`$mpg))

This results in the following error:

Error in var(x) : 'x' is NULL
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In mean.default(x) : argument is not numeric or logical: returning NA

Am I just misunderstanding how map works? Or is there a better way to think about this problem?

score 10 · Accepted Answer

I don't fully understand the intended result, but this might be a starting point for an answer. map() from purrr uses .x in the formula argument.
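To see why the question's code ends in "'x' is NULL": inside map(), the formula's .x (or .) is bound to one element of the split list at a time, that is, a single group's data frame, not the whole list. The sketch below uses base R's lapply() as a stand-in for map() so it runs without purrr; the variable names are only for illustration.

```r
# Split mtcars into one data frame per cylinder count; list names are "4", "6", "8".
groups <- split(mtcars[, c("cyl", "mpg")], mtcars$cyl)
print(names(groups))

# lapply(), like purrr::map(), calls the function once per list element,
# so `df` is one group's data frame with columns cyl and mpg only.
cols_seen <- lapply(groups, function(df) names(df))
print(cols_seen)

# A single group has no column named `4`, so this is NULL; t.test(NULL, NULL)
# produces the same "'x' is NULL" error and warnings shown in the question.
print(is.null(groups[["4"]]$`4`))

# The groups are elements of the list itself, so a two-sample test
# indexes the list rather than one of its elements.
two_sample <- t.test(groups[["4"]]$mpg, groups[["6"]]$mpg)
print(two_sample$p.value)
```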

Here is one way to do what I think you are trying to do, using just purrr.

mtcars %>%
  split(as.character(.$cyl)) %>%
  map(~t.test(.x$mpg))

However, purrr::by_slice() pairs nicely with dplyr::group_by().

library(purrr)
library(dplyr)

mtcars %>% 
  dplyr::select(cyl, mpg) %>% 
  group_by(as.character(cyl)) %>%
  by_slice(~ t.test(.x$mpg))

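Or you can skip purrr entirely and use dplyr::summarise().

library(dplyr)

mtcars %>% 
  dplyr::select(cyl, mpg) %>% 
  group_by(as.character(cyl)) %>%
  # use mpg directly: inside summarise() it is the current group's column,
  # whereas .$mpg would be the whole mpg column of the piped-in data
  summarise(t_test = list(t.test(mpg)))

If the nested list-column is confusing, broom can help us get a simple data.frame summary of the results.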

purrr + broom + tidyr

library(broom)
library(tidyr)
mtcars %>%
  group_by(as.character(cyl)) %>%
  by_slice(~tidy(t.test(.x$mpg))) %>%
  unnest()

dplyr + broom

library(broom)

mtcars %>% 
  dplyr::select(cyl, mpg) %>% 
  group_by(as.character(cyl)) %>%
  do(tidy(t.test(.$mpg)))
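If broom is not available, the same flattening can be sketched in base R. htest_row() below is a hypothetical helper, not part of any package: it copies a few fields of the "htest" object that t.test() returns into a one-row data frame, roughly in the spirit of broom::tidy() for this case.

```r
# Hypothetical helper: flatten an "htest" object into a one-row data frame.
htest_row <- function(tt) {
  data.frame(
    estimate  = unname(tt$estimate[1]),
    statistic = unname(tt$statistic),
    p.value   = tt$p.value,
    parameter = unname(tt$parameter),
    conf.low  = tt$conf.int[1],
    conf.high = tt$conf.int[2]
  )
}

# One-sample t-test of mpg within each cylinder group, as in the answers above.
by_cyl <- split(mtcars$mpg, mtcars$cyl)
rows <- lapply(by_cyl, function(x) htest_row(t.test(x)))

# Stack the rows and keep the group label as an ordinary column.
summary_df <- data.frame(cyl = names(rows), do.call(rbind, rows),
                         stringsAsFactors = FALSE, row.names = NULL)
print(summary_df)
```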

Edit: response to comments

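With pipes, we can get carried away very quickly. I think Walter's answer does a good job, but I wanted to make sure I provided a purrr-ty answer. I hope using pipeR isn't too confusing.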

library(purrr)
library(dplyr)
library(broom)
library(tidyr)
library(pipeR)

mtcars %>>%
  (split(.,.$cyl)) %>>%
  (split_cyl~
    names(split_cyl) %>>%
     (
       cross_d(
         list(against=.,tested=.),
         .filter = `==`
       )
     ) %>>%
     by_row(
       ~tidy(t.test(split_cyl[[.x$tested]]$mpg,split_cyl[[.x$against]]$mpg))
     )
  ) %>>%
  unnest()
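The cross_d(list(against = ., tested = .), .filter = `==`) step builds every ordered pair of cylinder groups and drops the combinations where the predicate `==` is TRUE, i.e. a group tested against itself. A rough base-R equivalent of that grid, using expand.grid() in place of cross_d() so it runs without purrr or pipeR, is sketched below; grid and p_values are illustrative names.

```r
# Cylinder groups, as they appear in names(split(mtcars, mtcars$cyl)).
cyl_levels <- names(split(mtcars, mtcars$cyl))

# Every ordered (against, tested) combination of groups...
grid <- expand.grid(against = cyl_levels, tested = cyl_levels,
                    stringsAsFactors = FALSE)
# ...minus the rows where both sides are the same group (mirrors .filter = `==`).
grid <- grid[grid$against != grid$tested, ]
rownames(grid) <- NULL
print(grid)

# Each row then drives one two-sample test, tested vs. against, as by_row() does.
by_cyl <- split(mtcars$mpg, mtcars$cyl)
p_values <- mapply(function(a, t) t.test(by_cyl[[t]], by_cyl[[a]])$p.value,
                   grid$against, grid$tested)
print(p_values)
```

Because the grid is ordered, each pair appears twice (for example 4 vs 6 and 6 vs 4), and the two directions give the same Welch p-value.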

score 6 · Accepted Answer

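Especially when dealing with pipelines that need multiple inputs (we don't have Haskell's Arrows here), I find it easier to reason about types/signatures first, then encapsulate the logic in functions (which you can unit test), and only then write a concise chain.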

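In this case you want to compare all possible pairs of vectors, so I'll set the goal of writing a function that takes a pair of vectors (i.e. a list of 2) and returns their two-sample t.test.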

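Once that's done, all you need is some glue. So the plan is:

1. Write a function that takes a list of vectors and performs a two-sample t-test.
2. Write a function/pipe that gets the vectors from mtcars (easy).
3. Map the above over a list of pairs.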

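It's important to make this plan before writing any code. Things are somewhat obscured by the fact that R is not strongly typed, but this way you reason about the "types" first and the implementation second.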

Step 1

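t.test takes dots, so we use purrr::lift to make it take a list instead. Since we don't want to match on the names of the list elements, we use .unnamed = TRUE. We also make it explicit that we're using a t.test function of arity 2 (although the code doesn't need this extra step to work).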

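library(purrr)  # lift()

# t.test with exactly two arguments, x and y
t.test2 <- function(x, y) t.test(x, y)
# takes a list of two vectors; .unnamed = TRUE matches them to x and y by position
liftedTT <- lift(t.test2, .unnamed = TRUE)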
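As a rough illustration of what lift(..., .unnamed = TRUE) provides here: it turns a function of separate arguments into one that takes a list and spreads that list across the arguments by position. Base R's do.call() does the same spreading, so a dependency-free stand-in can be sketched and checked against a direct call. lifted_tt is a hypothetical name used only for this sketch.

```r
# Two-argument wrapper, as in Step 1.
t.test2 <- function(x, y) t.test(x, y)

# Hypothetical stand-in for lift(t.test2, .unnamed = TRUE): drop the list's
# names so its elements bind to x and y by position, then spread the list.
lifted_tt <- function(pair) do.call(t.test2, unname(pair))

by_cyl <- split(mtcars$mpg, mtcars$cyl)   # named list: "4", "6", "8"

via_list <- lifted_tt(by_cyl[c("4", "6")])
direct   <- t.test(by_cyl[["4"]], by_cyl[["6"]])
print(c(via_list$statistic, direct$statistic))

# Without unname(), the names "4" and "6" are passed as argument names,
# which t.test2(x, y) does not accept, so the call fails.
named_call <- try(do.call(t.test2, by_cyl[c("4", "6")]), silent = TRUE)
print(inherits(named_call, "try-error"))
```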

Step 2

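Wrap the function from Step 1 in a chain of functions that takes a simple pair (here I use indices; using the cyl factor levels should be easy, but I didn't have time to work it out).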

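library(purrr)     # map()
library(dplyr)     # select()
library(magrittr)  # extract(), an alias for `[`

doTT <- function(pair) {
  mtcars %>%
    split(as.character(.$cyl)) %>%  # one data frame per cyl level
    map(~ select(., mpg)) %>%       # keep only the mpg column
    magrittr::extract(pair) %>%     # pick the two groups to compare
    liftedTT %>%                    # two-sample t.test on that pair
    broom::tidy()
}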

Step 3

Now that all the Lego bricks are in place, composing them is trivial.

1:length(unique(mtcars$cyl)) %>% 
  combn(2) %>% 
  as.data.frame %>% 
  as.list %>% 
  map(~ doTT(.))

$V1
  estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
1 6.920779  26.66364  19.74286  4.719059 0.0004048495  12.95598 3.751376  10.09018

$V2
  estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
1 11.56364  26.66364      15.1  7.596664 1.641348e-06  14.96675 8.318518  14.80876

$V3
  estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
1 4.642857  19.74286      15.1  5.291135 4.540355e-05  18.50248 2.802925  6.482789

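There's plenty to clean up here, mainly using the factor levels and keeping them in the output (rather than relying on a global variable in the second function), but I think the core of what you want is here. In my experience, the trick to not getting lost is to work from the inside out.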
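A sketch of that cleanup, assuming base R is enough: it iterates over pairs of cyl levels (the names of the split list) instead of indices, and keeps those levels as columns in the output. It uses the same default (Welch) t.test(), so the estimates and statistics should agree with the $V1, $V2 and $V3 output shown for Step 3. by_cyl, pairs and results are illustrative names.

```r
# Split mpg by cylinder count; the list names are the cyl levels "4", "6", "8".
by_cyl <- split(mtcars$mpg, mtcars$cyl)

# All unordered pairs of levels, e.g. c("4", "6"), as a list.
pairs <- combn(names(by_cyl), 2, simplify = FALSE)

# One Welch two-sample t-test per pair, flattened to a one-row data frame
# that records which levels were compared.
rows <- lapply(pairs, function(p) {
  tt <- t.test(by_cyl[[p[1]]], by_cyl[[p[2]]])
  data.frame(group1    = p[1],
             group2    = p[2],
             estimate  = unname(tt$estimate[1] - tt$estimate[2]),
             statistic = unname(tt$statistic),
             p.value   = tt$p.value,
             stringsAsFactors = FALSE)
})
results <- do.call(rbind, rows)
print(results)
```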

score 2 · Accepted Answer

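To perform a two-sample t-test, you have to create the combinations of cylinder counts. I don't see a way to create combinations with purrr functions. However, an approach using only purrr and base R functions is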

library(purrr)
t_test2 <- mtcars %>% split(.$cyl) %>%
          transpose() %>%
          .[["mpg"]] %>%
          (function(x) combn(names(x), m=2, function(y) t.test(flatten_dbl(x[y[1]]), flatten_dbl(x[y[2]])) , simplify=FALSE))

This looks a bit contrived, though.

A similar approach using only base R functions, with chaining, is

library(magrittr)  # provides %>%; everything else below is base R

t_test <- mtcars %>% split(.$cyl) %>%
  (function(x) combn(names(x), m=2, function(y) x[y], simplify=FALSE)) %>%
  lapply(function(x) t.test(x[[1]]$mpg, x[[2]]$mpg))
