R - Maximum value of variables when comparing between levels of variable 1, grouped by variable 2

Question

Consider the following data:

set.seed(123)

example.df <- data.frame( 
gene = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
treated = sample(c("Yes", "No"), 100, replace = TRUE), 
resp=rnorm(100, 10,5), effect = rnorm (100, 25, 5))

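I am trying to get the maximum value of every variable when the data are compared between pairs of gene levels and grouped by treatment. I can create the gene combinations like this: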

combn(sort(unique(example.df$gene)), 2, simplify = T)

#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] A    A    A    B    B    C   
#[2,] B    C    D    C    D    D   
#Levels: A B C D

Edit: the output I am looking for is a data frame like this:

comparison   group    max.resp    max.effect
A-B          no       value1      value2
....
C-D          no       valueX      valueY
A-B          yes      value3      value4 
.... 
C-D          yes      valueXX     valueYY

I am able to get the maximum values for each individual gene level grouped by treatment:

max.df <- example.df %>% 
           group_by(treated, gene) %>% 
           nest() %>% 
           mutate(mod = map(data, ~summarise_if(.x, is.numeric, max, na.rm = TRUE))) %>% 
           select(treated, gene, mod) %>% 
           unnest(mod) %>% 
           arrange(treated, gene)

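Despite trying to solve this for more than a day, I cannot figure out how to do the same for each two-level gene comparison (A vs. B, A vs. C, A vs. D, B vs. C, B vs. D, and C vs. D), grouped by treatment.

Any help is appreciated. Thank you.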

score 1 · Accepted Answer

I found a solution. It may be a bit messy, and I will update it with a cleaner version, but it runs almost instantly.

library(tidyverse)

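First, I generate a data frame with two columns, Gen1 and Gen2, holding the possible comparisons. This is very similar to your use of combn, but it creates a data.frame and contains each pair in both orders (A-B as well as B-A):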

GeneComp <- expand.grid(Gen1 = unique(example.df$gene), Gen2 = unique(example.df$gene)) %>% filter(Gen1 != Gen2) %>% arrange(Gen1)
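The difference between the two ways of enumerating pairs can be checked with a small sketch in base R. The names genes, ordered_pairs, and unordered_pairs are made up for this illustration and are not part of the answer's code.

```r
# Hypothetical stand-in for unique(example.df$gene)
genes <- c("A", "B", "C", "D")

# expand.grid builds every ordered pair; dropping the self-pairs leaves 4 * 3 = 12 rows
ordered_pairs <- expand.grid(Gen1 = genes, Gen2 = genes, stringsAsFactors = FALSE)
ordered_pairs <- ordered_pairs[ordered_pairs$Gen1 != ordered_pairs$Gen2, ]

# combn builds every unordered pair; transposed, that is choose(4, 2) = 6 rows
unordered_pairs <- t(combn(genes, 2))

nrow(ordered_pairs)   # 12
nrow(unordered_pairs) # 6
```

Because max does not depend on the order of the two genes, A-B and B-A give the same summary, so half of the 12 ordered comparisons are redundant.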

Then I loop over its rows, grouping by treatment within each comparison:

Comps <- list()
for (i in seq_len(nrow(GeneComp))) {
  Comps[[i]] <- example.df %>%
    filter(gene == GeneComp[i, ]$Gen1 | gene == GeneComp[i, ]$Gen2) %>% # keep only the rows whose gene is in the ith pair
    group_by(treated) %>% # then group by treated
    summarise_if(is.numeric, max) %>% # then take the max of every numeric column
    mutate(Comparison = paste(GeneComp[i, ]$Gen1, GeneComp[i, ]$Gen2, sep = "-")) # and add the comparison label
}

Comps <- bind_rows(Comps) # and finally join in a data frame

Let me know if this covers everything you need.

Addition: getting each comparison only once

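It is important that your genes are character strings and not factors, so you may have to do the following (in R 4.0.0 and later, data.frame() already defaults to stringsAsFactors = FALSE):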

options(stringsAsFactors = FALSE)

example.df <- data.frame( 
  gene = c(sample(c("A", "B", "C", "D"), 100, replace = TRUE)),
  treated = sample(c("Yes", "No"), 100, replace = TRUE), 
  resp=rnorm(100, 10,5), effect = rnorm (100, 25, 5))

Then call expand.grid again, this time adding the stringsAsFactors = F argument:

GeneComp <- expand.grid(Gen1 = unique(example.df$gene), Gen2 = unique(example.df$gene), stringsAsFactors = F) %>% filter(Gen1 != Gen2) %>% arrange(Gen1)

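Because the genes are now strings, you can sort the two inputs inside the loop when pasting the Comparison variable, so that B-A is labelled A-B. The rows will then be duplicated, but applying distinct at the end leaves the data the way you want it: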

Comps <- list()
for (i in seq_len(nrow(GeneComp))) {
  pair <- sort(c(GeneComp[i, ]$Gen1, GeneComp[i, ]$Gen2)) # sorted so that B-A and A-B get the same label
  Comps[[i]] <- example.df %>%
    filter(gene %in% pair) %>% # keep only the rows whose gene is in the ith pair
    group_by(treated) %>% # then group by treated
    summarise_if(is.numeric, max) %>% # then take the max of every numeric column
    mutate(Comparison = paste(pair, collapse = "-")) # and add the sorted comparison label
}

Comps <- bind_rows(Comps) %>% distinct() # and finally join in a data frame
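As a possible alternative to building both orders and then removing duplicates, the sketch below iterates over the unordered pairs from combn directly, as in the question. It is only a sketch under stated assumptions: the helper max_for_pair, the result name comps, and the column names max.resp and max.effect (taken from the desired output in the question) are illustrative choices, and the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)

set.seed(123)
example.df <- data.frame(
  gene = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  treated = sample(c("Yes", "No"), 100, replace = TRUE),
  resp = rnorm(100, 10, 5),
  effect = rnorm(100, 25, 5),
  stringsAsFactors = FALSE
)

# Each column of `pairs` is one unordered gene pair (A-B, A-C, ..., C-D)
pairs <- combn(sort(unique(example.df$gene)), 2)

# Hypothetical helper: per-treatment maxima over the rows belonging to one pair
max_for_pair <- function(pair) {
  example.df %>%
    filter(gene %in% pair) %>%
    group_by(treated) %>%
    summarise(max.resp = max(resp), max.effect = max(effect), .groups = "drop") %>%
    mutate(comparison = paste(pair, collapse = "-"))
}

# One summary per pair, stacked into a single data frame
comps <- bind_rows(lapply(seq_len(ncol(pairs)), function(j) max_for_pair(pairs[, j])))
```

Since every pair is visited exactly once, no distinct step is needed; the result has one row per comparison and treatment level.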
