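I have a very large (millions of points) connected graph and many candidate segmentation algorithms for determining group membership. Is there an existing implementation, in sets or a similar R package, that computes the consensus set among the candidate sets?
An example:
Suppose I have 10 points in total and three algorithms for choosing groups and their members.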
> algorithm1<-list(c(1,2,3),c(4,5,6),c(7,8,9,10))
> algorithm2<-list(c(1,2,3),c(4,6),c(5,7,8,9,10))
> algorithm3<-list(c(1,2,3),c(4,6),c(5,7,8),c(9,10))
> algorithm1
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5 6
[[3]]
[1] 7 8 9 10
> algorithm2
[[1]]
[1] 1 2 3
[[2]]
[1] 4 6
[[3]]
[1] 5 7 8 9 10
> algorithm3
[[1]]
[1] 1 2 3
[[2]]
[1] 4 6
[[3]]
[1] 5 7 8
[[4]]
[1] 9 10
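All three algorithms agree that 1, 2, and 3 belong together, but the remaining groups need a majority-rule algorithm to determine the smallest number of groups that minimizes the loss relative to the input groupings. This feels like a permutation/combinatorics problem that has probably already been solved. This isn't my field, and I need a push in the right direction.
One incomplete idea I've considered is to generate pairwise links between members, with link strength equal to the number of times a pair of points appears together in a set.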
> library(reshape2)
> library(dplyr) #needed for bind_rows() inside pairwise_count()
>
> pairwise_count<-function(x){
+
+ #For each group, get all pairwise combination of members
+ m<-lapply(x,function(y){
+ as.data.frame(t(combn(y,2)))
+ })
+
+ #Bind groups into one data frame and name the two pair columns
+ df<-bind_rows(m)
+ colnames(df)<-c("Point1","Point2")
+ return(df)
+ }
>
> #Example
> pairwise_count(algorithm1)
Point1 Point2
1 1 2
2 1 3
3 2 3
4 4 5
5 4 6
6 5 6
7 7 8
8 7 9
9 7 10
10 8 9
11 8 10
12 9 10
> #Compute for all algorithms
> alldf<-list(algorithm1=pairwise_count(algorithm1),algorithm2=pairwise_count(algorithm2),algorithm3=pairwise_count(algorithm3))
> alldf<-melt(alldf,id.vars=c("Point1","Point2"))
>
> #Get consensus probability that a pair are in the same set.
> library(dplyr)
> alldf %>% group_by(Point1,Point2) %>% summarize(n=n()/3)
# A tibble: 16 x 3
# Groups: Point1 [?]
Point1 Point2 n
<dbl> <dbl> <dbl>
1 1. 2. 1.00
2 1. 3. 1.00
3 2. 3. 1.00
4 4. 5. 0.333
5 4. 6. 1.00
6 5. 6. 0.333
7 5. 7. 0.667
8 5. 8. 0.667
9 5. 9. 0.333
10 5. 10. 0.333
11 7. 8. 1.00
12 7. 9. 0.667
13 7. 10. 0.667
14 8. 9. 0.667
15 8. 10. 0.667
16 9. 10. 1.00
>
> # How to choose final sets?
Edit #1: The code below reproduces the function above.
library(reshape2)
library(dplyr)
algorithm1<-list(c(1,2,3),c(4,5,6),c(7,8,9,10))
algorithm2<-list(c(1,2,3),c(4,6),c(5,7,8,9,10))
algorithm3<-list(c(1,2,3),c(4,6),c(5,7,8),c(9,10))
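pairwise_count<-function(x){
#Groups with fewer than two members contain no pairs; dropping them also
#avoids combn(5,2) treating a lone 5 as the vector 1:5
x<-Filter(function(y) length(y)>=2,x)
#For each group, get all pairwise combination of members
m<-lapply(x,function(y){
as.data.frame(t(combn(y,2)))
})
#Bind groups into one data frame and name the two pair columns
df<-bind_rows(m)
colnames(df)<-c("Point1","Point2")
return(df)
}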
#Example
pairwise_count(algorithm1)
#Compute for all algorithms
alldf<-list(algorithm1=pairwise_count(algorithm1),algorithm2=pairwise_count(algorithm2),algorithm3=pairwise_count(algorithm3))
alldf<-melt(alldf,id.vars=c("Point1","Point2"))
#Get consensus probability that a pair are in the same set.
alldf %>% group_by(Point1,Point2) %>% summarize(n=n()/3)
# How to choose final sets?
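One possible way to turn the pairwise probabilities into final sets, offered only as a sketch and not as a known standard solution: keep a link when more than half of the algorithms put the pair together, then take the connected components of the remaining links as the consensus groups. The helper names `to_labels` and `consensus_groups` are hypothetical, the 0.5 cutoff is an assumption, and the code assumes every point appears in exactly one group per algorithm. It builds a dense n-by-n matrix, so it only illustrates the idea on the toy example; millions of points would need a sparse edge list and a graph library instead.

```r
# Toy input from the question: each algorithm is a list of groups.
algorithm1 <- list(c(1,2,3), c(4,5,6), c(7,8,9,10))
algorithm2 <- list(c(1,2,3), c(4,6), c(5,7,8,9,10))
algorithm3 <- list(c(1,2,3), c(4,6), c(5,7,8), c(9,10))
algorithms <- list(algorithm1, algorithm2, algorithm3)
n_points <- 10

# Hypothetical helper: turn a list of groups into a label vector,
# where labels[i] is the index of the group containing point i.
# Assumes each point belongs to exactly one group.
to_labels <- function(groups, n) {
  labels <- integer(n)
  for (g in seq_along(groups)) labels[groups[[g]]] <- g
  labels
}

# Co-association matrix: fraction of algorithms that place i and j together.
coassoc <- Reduce(`+`, lapply(algorithms, function(a) {
  lab <- to_labels(a, n_points)
  outer(lab, lab, "==") * 1
})) / length(algorithms)

# Majority rule (assumed cutoff): keep a link only if more than half agree.
adjacent <- coassoc > 0.5

# Hypothetical helper: connected components of the thresholded graph,
# found by a breadth-first search from each unassigned point.
consensus_groups <- function(adj) {
  n <- nrow(adj)
  comp <- integer(n)  # 0 means "not yet assigned"
  current <- 0L
  for (start in seq_len(n)) {
    if (comp[start] != 0L) next
    current <- current + 1L
    frontier <- start
    while (length(frontier) > 0) {
      comp[frontier] <- current
      reachable <- which(colSums(adj[frontier, , drop = FALSE]) > 0)
      frontier <- reachable[comp[reachable] == 0L]
    }
  }
  unname(split(seq_len(n), comp))
}

consensus <- consensus_groups(adjacent)
print(consensus)
```

On the example this gives three groups, (1,2,3), (4,6) and (5,7,8,9,10), which happens to match algorithm2. One caveat with connected components is chaining: if A-B and B-C each pass the majority cutoff, A, B, and C end up in one group even when A-C does not, so a stricter rule may be needed if that matters.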