r - Extract the list of intersections from an upset object

Question

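I'm running some comparisons with UpSetR, and I'd like to save the list of elements that belong to each intersection. Is this possible? I can't find it anywhere...

Doing it manually would be tedious (there are a lot of lists), and it's frustrating that I can't save them when they have already been computed.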

score 2 · Accepted Answer

There is no ready-made UpSetR function for this (yet). However, the information can be extracted:

library(UpSetR)

# Example input as list, expected output is 1 and 5:
listInput <- list(one = c(1, 2, 3, 5, 7, 8, 11, 12, 13), 
                  two = c(1, 2, 4, 5, 10),
                  three = c(1, 5, 6, 7, 8, 9, 10, 12, 13))

When the result of upset() is assigned to a variable, the returned object also includes the data:

x <- upset(fromList(listInput))
x$New_data
#    one two three
# 1    1   1     1
# 2    1   1     0
# 3    1   0     0
# 4    1   1     1
# 5    1   0     1
# 6    1   0     1
# 7    1   0     0
# 8    1   0     1
# 9    1   0     1
# 10   0   1     0
# 11   0   1     1
# 12   0   0     1
# 13   0   0     1

From this we can see that rows 1 and 4 appear in all three sets. The rows follow the order in which the items first appear in the list; see:

x1 <- unlist(listInput, use.names = FALSE)
x1 <- x1[ !duplicated(x1) ]
x1
# [1]  1  2  3  5  7  8 11 12 13  4 10  6  9
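As a side note, base R's unique() also keeps the first occurrence of each value in order of appearance, so it should give the same ordering as the duplicated() filter above. A minimal sketch; element_order is just an illustrative name:

```r
listInput <- list(one = c(1, 2, 3, 5, 7, 8, 11, 12, 13),
                  two = c(1, 2, 4, 5, 10),
                  three = c(1, 5, 6, 7, 8, 9, 10, 12, 13))

# unique() keeps the first occurrence of each value, in order of appearance,
# which is the same ordering as x1[ !duplicated(x1) ]
element_order <- unique(unlist(listInput, use.names = FALSE))
element_order
```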

Now we know which element each row number in "New_data" refers to. Since we have 3 columns, we keep the rows whose sum is 3:

x1[ rowSums(x$New_data) == 3 ]
# [1] 1 5
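The same row-pattern idea can be generalized to every intersection, not only the all-sets one. The sketch below builds the membership matrix directly in base R instead of reading x$New_data, assuming rows follow first-appearance order as shown above; membership, labels and intersections are hypothetical names. Each element is assigned to the single combination of sets it belongs to exactly (an exclusive intersection), which I believe is what UpSetR's bars count by default, but check it against your plot.

```r
listInput <- list(one = c(1, 2, 3, 5, 7, 8, 11, 12, 13),
                  two = c(1, 2, 4, 5, 10),
                  three = c(1, 5, 6, 7, 8, 9, 10, 12, 13))

# Unique elements in order of first appearance
elements <- unique(unlist(listInput, use.names = FALSE))

# Binary membership matrix: one row per element, one column per set
membership <- sapply(listInput, function(s) as.integer(elements %in% s))

# Label each row with the sets it belongs to, e.g. "one&three"
labels <- apply(membership, 1, function(r) {
  paste(colnames(membership)[r == 1], collapse = "&")
})

# Group the elements by their exclusive intersection
intersections <- split(elements, labels)
intersections
```

split() orders the groups alphabetically by label, and an element that is in all three sets appears only under "one&two&three", not under "one&two".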

Or we could simply use Reduce:

Reduce(intersect, listInput)
# [1] 1 5
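Note that Reduce(intersect, ...) gives an inclusive intersection: the elements shared by the chosen sets, whether or not they also belong to other sets. If that is what you need for every combination of sets, a base-R sketch using combn() could look like this (combos and inclusive are illustrative names):

```r
listInput <- list(one = c(1, 2, 3, 5, 7, 8, 11, 12, 13),
                  two = c(1, 2, 4, 5, 10),
                  three = c(1, 5, 6, 7, 8, 9, 10, 12, 13))

set_names <- names(listInput)

# All non-empty combinations of set names, from single sets up to all sets
combos <- unlist(
  lapply(seq_along(set_names), function(k) combn(set_names, k, simplify = FALSE)),
  recursive = FALSE
)

# Elements shared by every set in the combination (other sets are ignored)
inclusive <- lapply(combos, function(cmb) Reduce(intersect, listInput[cmb]))
names(inclusive) <- vapply(combos, paste, character(1), collapse = "&")
inclusive
```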

score 1 · Accepted Answer

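Here is my approach for extracting the different intersections and the list of elements in each of them.

The main idea is to paste together all the 0s and 1s in the binary table to create a unique identifier for each intersection, and then use the dplyr::group_by function to extract the information.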

library(UpSetR)
library(dplyr)
library(tidyr)

data <- data.frame(
  entry = paste0("Entry.", 1:10),
  "A" = c(0,0,0,0,1,0,1,1,0,0),
  "B" = c(1,0,0,0,1,1,1,1,1,0),
  "C" = c(1,1,1,1,0,0,1,0,1,1)
)

# NOT REQUIRED. Only to confirm that upset works with these data
upset(data)

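Then you can identify the intersections by pasting all the binary columns together. For this I used the convenient unite function from tidyr:

Note: you may have to adapt this depending on whether your element names are stored as row names or in a named column (here, entry).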

data_with_intersection <- data %>%
  unite(col = "intersection", -c("entry"), sep = "")
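If you would rather not depend on tidyr, a base-R sketch of the same key construction pastes the set columns row-wise with do.call(paste0, ...). This assumes, as in the example, that every column except entry is a 0/1 set column; set_cols and key are hypothetical names:

```r
data <- data.frame(
  entry = paste0("Entry.", 1:10),
  A = c(0,0,0,0,1,0,1,1,0,0),
  B = c(1,0,0,0,1,1,1,1,1,0),
  C = c(1,1,1,1,0,0,1,0,1,1),
  stringsAsFactors = FALSE
)

# Every column except the element names is treated as a 0/1 set column
set_cols <- setdiff(names(data), "entry")

# paste0 is vectorised, so passing the columns as separate arguments builds
# one string per row, e.g. "011" for a row that is in B and C but not in A
key <- do.call(paste0, data[set_cols])
key
```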

From here, you can easily compute the size of each intersection:

# Table of intersections and the number of entries
data_with_intersection %>%
  group_by(intersection) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

Or even extract the list of entries/elements in each intersection:

# List of intersections and their entries
data_with_intersection %>%
  group_by(intersection) %>%
  summarise(list = list(entry)) %>%
  mutate(list = setNames(list, intersection)) %>%
  pull(list)
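For comparison, a dependency-free sketch of both summaries uses table() for the sizes and split() for the element lists, keyed by the same pasted 0/1 identifier. Same assumption as before (all columns except entry are set columns); sizes and entries_by_key are hypothetical names:

```r
data <- data.frame(
  entry = paste0("Entry.", 1:10),
  A = c(0,0,0,0,1,0,1,1,0,0),
  B = c(1,0,0,0,1,1,1,1,1,0),
  C = c(1,1,1,1,0,0,1,0,1,1),
  stringsAsFactors = FALSE
)

# One 0/1 identifier per row, e.g. "011"
key <- do.call(paste0, data[setdiff(names(data), "entry")])

# Number of entries per intersection, largest first
sizes <- sort(table(key), decreasing = TRUE)

# Named list: intersection identifier -> entries in that intersection
entries_by_key <- split(data$entry, key)

sizes
entries_by_key
```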
