r - Using purrr to select and apply the correct model from a different data frame

Question

In my data I have related measurements (diet and liver) for more than 50 different compounds (simplified here).

library(tidyverse)
Sigma <- matrix(.7, nrow=6, ncol=6) + diag(6)*.3
vars_tr <- data.frame(MASS::mvrnorm(n=10, mu=c(2:7), Sigma=Sigma))

tr<-tibble(
  compound=c(rep("A", 10), rep("B", 10), rep("C",10)),
  diet=c(vars_tr$X1, vars_tr$X2, vars_tr$X3),
  liver=c(vars_tr$X4, vars_tr$X5, vars_tr$X6))

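Following the guidance on fitting regressions for many models, I created a nested data frame and stored the output (learning this approach was a lifesaver this week!).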

model<-function(df){lm(data=df, liver~diet)}

mods<- tr %>%
  group_by(compound) %>%
  nest() %>%
  mutate(model=map(data, model))
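
To check that each row of the nested data frame really holds its own fit, the coefficients can be pulled out of the model list column. This is a minimal sketch on made-up data; the seed, the toy `tr` values and the name `coefs` are assumptions for illustration, not part of the original post.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(123)  # assumption: fixed seed so the toy data is reproducible

# Toy training data: liver is roughly 1 + 2 * diet for every compound
tr <- tibble(
  compound = rep(c("A", "B", "C"), each = 10),
  diet     = rnorm(30, mean = 4),
  liver    = 1 + 2 * diet + rnorm(30, sd = 0.2)
)

# One lm per compound, stored in a list column
mods <- tr %>%
  group_by(compound) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(liver ~ diet, data = .x)))

# Extract intercept and slope from each stored model
coefs <- mods %>%
  mutate(
    intercept = map_dbl(model, ~ coef(.x)[["(Intercept)"]]),
    slope     = map_dbl(model, ~ coef(.x)[["diet"]])
  ) %>%
  ungroup() %>%
  select(compound, intercept, slope)

print(coefs)
```

Each compound gets one row with its own intercept and slope, which confirms the models were fitted per group rather than on the pooled data.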

Now I have new "diet" data for which no "liver" data exists.

new<-tibble(
  compound=c(rep("A", 10), rep("B", 10), rep("C",10)),
  diet=c(rnorm(10, 4), rnorm(10, 5), rnorm(10,6)))

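What I want to do is use purrr to generate a liver concentration for each diet concentration, using the model for the correct compound. My best attempt is below: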

preds<-function(c, x){    
  add_predictions(tibble(diet=x), filter(mods, compound==c)$model[[1]], 'liver')$liver
}

new%>%
  mutate(liver=map2(compound, diet, preds))

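This returns an error.

I would appreciate any help!

Edit, June 4, 2020:

Thanks to the helpful comments from Bruno and Ronak Shah below, I have made some progress but have not yet found a solution. Both suggested joining the models onto the existing table, which makes much more sense than what I was doing.

Based on that, the following is relatively straightforward: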

new_mods<-
  new%>%
  group_by(compound)%>%
  nest()%>%
  left_join(select(mods, compound, model), by='compound')%>%
  mutate(predicts = map2(data, model, modelr::add_predictions))%>%
  unnest(predicts)

score 1 · Accepted Answer

You can use a join operation and keep working with tibbles:

library(tidyverse)
# MASS is called as MASS::mvrnorm rather than attached, because library(MASS) masks dplyr::select

Sigma <- matrix(.7, nrow=6, ncol=6) + diag(6)*.3
vars_tr <- data.frame(MASS::mvrnorm(n=10, mu=c(2:7), Sigma=Sigma))

tr<-tibble(
  compound=c(rep("A", 10), rep("B", 10), rep("C",10)),
  diet=c(vars_tr$X1, vars_tr$X2, vars_tr$X3),
  liver=c(vars_tr$X4, vars_tr$X5, vars_tr$X6))

model<-function(df){lm(data=df, liver~diet)}

mods<- tr %>%
  nest_by(compound) %>% 
  mutate(model = list(model(data)))

new<-tibble(
  compound=c(rep("A", 10), rep("B", 10), rep("C",10)),
  diet=c(rnorm(10, 4), rnorm(10, 5), rnorm(10,6)))

new_nest <- new %>% 
  nest_by(compound)

results <- mods %>% 
  left_join(new_nest,by = "compound") %>% 
  mutate(predicts = list(predict(model,data.y)))
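
The joined result keeps the predictions inside a list column. One way to flatten it into one row per new diet value is sketched below, assuming toy data in place of the `mvrnorm` draw; the seed and the names `flat` and `new_data` are hypothetical and used only for illustration.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Toy training data: liver depends linearly on diet within each compound
tr <- tibble(
  compound = rep(c("A", "B", "C"), each = 10),
  diet     = rnorm(30, mean = 4),
  liver    = 2 + 0.5 * diet + rnorm(30, sd = 0.1)
)

# New diet values with no liver measurements
new <- tibble(
  compound = rep(c("A", "B", "C"), each = 10),
  diet     = rnorm(30, mean = 5)
)

# One model per compound in a rowwise tibble, as in the answer
mods <- tr %>%
  nest_by(compound) %>%
  mutate(model = list(lm(liver ~ diet, data = data)))

# Join the nested new data, predict row by row, then flatten.
# After the join, data.x is the training data and data.y is the new data.
flat <- mods %>%
  left_join(nest_by(new, compound), by = "compound") %>%
  mutate(predicts = list(unname(predict(model, data.y)))) %>%
  ungroup() %>%
  select(compound, new_data = data.y, liver = predicts) %>%
  unnest(c(new_data, liver))

print(flat)
```

The result has the columns `compound`, `diet` and `liver`, with each prediction coming from the model fitted to the same compound.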

score 0 · Accepted Answer

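You can create a function for the predictions:

preds<-function(data, mod){   
   # add_predictions() names its new column "pred" by default
   modelr::add_predictions(data, mod)$pred
}

Nest the data frame for each compound, join it with mods, and apply the corresponding model to each group's data. After the join, data.x holds the new diet values and data.y holds the training data from mods, so the predictions are made on data.x.

library(dplyr)
new %>%
   tidyr::nest(data = diet) %>%
   left_join(mods, by = 'compound') %>%
   mutate(liver = purrr::map2(data.x, model, preds))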


# A tibble: 3 x 5
#  compound data.x            data.y            model  liver     
#  <chr>    <list>            <list>            <list> <list>    
#1 A        <tibble [10 × 1]> <tibble [10 × 2]> <lm>   <dbl [10]>
#2 B        <tibble [10 × 1]> <tibble [10 × 2]> <lm>   <dbl [10]>
#3 C        <tibble [10 × 1]> <tibble [10 × 2]> <lm>   <dbl [10]>

You can then select the relevant columns and unnest the results as needed.
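
The select-and-unnest step could look like the sketch below. It assumes the modelr package is installed, uses made-up data, and joins only the `model` column so that no `data.x`/`data.y` suffixes appear; the seed and the name `new_preds` are assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Toy training data and new diet values (stand-ins for the real data)
tr <- tibble(
  compound = rep(c("A", "B", "C"), each = 10),
  diet     = rnorm(30, mean = 4),
  liver    = 2 + 0.5 * diet + rnorm(30, sd = 0.1)
)
new <- tibble(
  compound = rep(c("A", "B", "C"), each = 10),
  diet     = rnorm(30, mean = 5)
)

# One model per compound, as in the question
mods <- tr %>%
  group_by(compound) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(liver ~ diet, data = .x)))

# Nest the new data, attach the matching model, predict, then flatten.
# var = "liver" makes add_predictions() name the new column "liver".
new_preds <- new %>%
  nest(data = diet) %>%
  left_join(select(mods, compound, model), by = "compound") %>%
  mutate(data = map2(data, model, modelr::add_predictions, var = "liver")) %>%
  select(compound, data) %>%
  unnest(data)

print(new_preds)
```

Dropping the training data before the join keeps the nested table small and leaves a single `data` column to unnest.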
