I want to create a co-authorship network using igraph.

My data is organized in a data.frame that looks like this:

DF1 <- cbind(Papers =  paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))

I want to create an edge list that looks like this:

   Vertex1 Vertex2
        A       B
        D       C
        C       F
        C       F
        A       C
        D       E
        C       D
        B       C
        C       E
        F       D

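Is there a way to do this in R (for example with igraph)?

The following function does the job, but it takes too long to run on large datasets (more than 5,000 papers):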

Fun_DFtoEdgeList <- function (Inputdataframe)
{

  ## This function creates an edge list for building a network
  ## Input : Dataframe with UNIQUE VALUES !!!!

  ResEdgeList <- data.frame(Vertex1 = c('--'), Vertex2 = c('--'))


  for (i in 1 : (ncol(Inputdataframe)-1))
  {
    for (j in 2: (ncol(Inputdataframe)))
    {
      if (i !=j)     
      {
        #print(paste(i, j, sep ='--'))

        ToAppend <- data.frame(cbind(Inputdataframe[,i], Inputdataframe[,j]))
        names(ToAppend) <- names(ResEdgeList)
        #print(ToAppend)

        ResEdgeList <- rbind(ResEdgeList, ToAppend)
      }
    }

  }

  ResEdgeList <- data.frame(ResEdgeList[-1,], stringsAsFactors = FALSE)
  ResEdgeList<- subset(ResEdgeList, (is.na(Vertex1) == FALSE ) & (is.na(Vertex2) == FALSE ))  
  ResEdgeList
}


Fun_DFtoEdgeList (DF1[,-1])

Any help is appreciated. (I previously posted this question under a different title, but was told it was not clear enough.)

2 Answers

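Your code does not produce the output you show, because it is looping over the "Paper" column. It will also prove slow: every time you append to an existing object, R has to make another copy of the whole object, and when you do that iteratively, things get slow. Looking at your output, I think this is what you want: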

#First, create all combinations of the columns you want. I don't think you want to include the "Paper" column?

x <- combn(2:4,2)
#-----
     [,1] [,2] [,3]
[1,]    2    2    3
[2,]    3    4    4

#next use apply to go through each pair:
apply(x, 2, function(z) data.frame(Vertex1 = DF1[, z[1]], Vertex2 = DF1[, z[2]]))
#-----
[[1]]
  Vertex1 Vertex2
1       A       B
2       D       C
3       C       F
4       C    <NA>
5       C       F
....
#So use do.call to rbind them together:

out <- do.call("rbind", 
        apply(x, 2, function(z) data.frame(Vertex1 = DF1[, z[1]], Vertex2 = DF1[, z[2]])))

#Finally, filter out the rows with NA:
out[complete.cases(out),]
#-----
   Vertex1 Vertex2
1        A       B
2        D       C
3        C       F
5        C       F
6        A       C
7        D       E
10       C       D
11       B       C
12       C       E
15       F       D
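
The steps above can also be wrapped into a single reusable function. The following is only a sketch under stated assumptions: the helper name `edge_list_from_authors` is hypothetical and not part of the original answer, and it assumes the input holds only the author columns (the "Papers" column already dropped).

```r
# Author matrix from the question (a character matrix built with cbind)
DF1 <- cbind(Papers  = paste0("Paper", 1:5),
             Author1 = c("A", "D", "C", "C", "C"),
             Author2 = c("B", "C", "F", NA, "F"),
             Author3 = c("C", "E", NA, NA, "D"))

# Hypothetical helper (name is an assumption): pair up every two author
# columns, stack the pairs, and drop rows that contain an NA.
edge_list_from_authors <- function(authors) {
  # One column of `pairs` per combination of two column indices
  pairs <- combn(ncol(authors), 2)
  pieces <- lapply(seq_len(ncol(pairs)), function(k) {
    data.frame(Vertex1 = authors[, pairs[1, k]],
               Vertex2 = authors[, pairs[2, k]],
               stringsAsFactors = FALSE)
  })
  # Bind once at the end instead of growing the result inside a loop
  out <- do.call(rbind, pieces)
  out[complete.cases(out), ]
}

edges <- edge_list_from_authors(DF1[, -1])
print(edges)
```

Using `lapply` over the column pairs rather than `apply` is just a stylistic choice here; both build the full list of pieces before a single `rbind`.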

Finally, see how this scales to a larger problem:

#Just over a million entries, i.e. 333,334 papers with 3 authors each
zz <- matrix(sample(letters, 1000002, TRUE), ncol = 3)
x <- combn(1:3, 2)
system.time(do.call("rbind", 
                    apply(x, 2, function(z) data.frame(Vertex1 = zz[, z[1]], Vertex2 = zz[, z[2]]))))
#-----
user  system elapsed 
  1.332   0.144   1.482

1.5 seconds seems pretty reasonable to me?
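
To make the earlier point about copying concrete, here is a small sketch contrasting the two patterns. The toy data (500 two-author papers in a matrix called `authors`) is an assumption made up for illustration. Both patterns give the same edge list, but the first calls `rbind` once per row and rebuilds the growing result each time.

```r
# Hypothetical toy data: 500 two-author "papers"
set.seed(42)
authors <- matrix(sample(LETTERS, 1000, replace = TRUE), ncol = 2)

# Pattern to avoid: grow the data frame one row at a time.
# Each rbind() builds a new, larger copy of everything collected so far.
grown <- data.frame(Vertex1 = character(0), Vertex2 = character(0),
                    stringsAsFactors = FALSE)
for (i in seq_len(nrow(authors))) {
  grown <- rbind(grown, data.frame(Vertex1 = authors[i, 1],
                                   Vertex2 = authors[i, 2],
                                   stringsAsFactors = FALSE))
}

# Preferred pattern: collect the pieces in a list, then bind them once.
pieces <- lapply(seq_len(nrow(authors)), function(i) {
  data.frame(Vertex1 = authors[i, 1], Vertex2 = authors[i, 2],
             stringsAsFactors = FALSE)
})
bound <- do.call(rbind, pieces)
```

In practice you would go one step further and build whole columns at once, as in the `combn` approach above, rather than one row per piece.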

answered 2012-06-30T15:30:56.980

There may be a better way to do this, but try combn, which produces all unique combinations:

DF1 <- cbind(Papers =  paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))

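library(igraph)
# For each paper (row), build all author pairs and drop pairs containing NA
l <- apply(DF1[, -1], MARGIN = 1, function(x) na.omit(data.frame(t(combn(x, m = 2)))))
# Stack the per-paper edge lists into one data frame
df <- do.call(rbind, l)
# Build an undirected graph from the edge list and plot it
g <- graph.data.frame(df, directed = FALSE)
plot(g)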
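
To see what combn contributes in the code above, here is a minimal sketch on single rows of the author data. The variable names `paper1`, `paper3`, `pairs`, and `kept` are made up for this illustration.

```r
# Authors of Paper1 from the question
paper1 <- c("A", "B", "C")

# combn() returns one column per unordered pair; t() turns each pair into a row
pairs <- t(combn(paper1, m = 2))
print(pairs)

# Authors of Paper3, which has one missing author
paper3 <- c("C", "F", NA)

# Pairs that contain NA are dropped by na.omit(), leaving only the C-F edge
kept <- na.omit(data.frame(t(combn(paper3, m = 2))))
print(kept)
```

Because the pairs are built within each paper, this approach never pairs authors from different papers, and a paper with a single author contributes no edges.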
answered 2012-06-30T15:37:09.523