I have read a CSV file into R that contains, among other information, co-author data. The file's Authors column holds co-author information like this:

Miyazaki T., Akisawa A., Saha B.B., El-Sharkawy I.I., Chakraborty A.
Saha B.B., Chakraborty A., Koyama S., Aristov Y.I.
Ali S.M., Chakraborty A.
...

I want to convert this information into an edge list of the following form:

Miyazaki T. Akisawa A.
Miyazaki T. Saha B.B.
Miyazaki T. El-Sharkawy I.I.
Miyazaki T. Chakraborty A.
Akisawa A. Saha B.B.
Akisawa A. El-Sharkawy I.I.
Akisawa A. Chakraborty A.
Saha B.B. El-Sharkawy I.I.
Saha B.B. Chakraborty A.
El-Sharkawy I.I. Chakraborty A.
Saha B.B. Chakraborty A.
Saha B.B. Koyama S.
....

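Basically, the network is an undirected graph. Any help or starter code would be much appreciated. Also, is there a way to keep a count/frequency of collaborations (for example, Saha has published with Chakraborty twice in the sample)?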

My code so far:

data <- read.csv(file="Citations.csv", header=TRUE)
split_authors <- strsplit(as.character(data$Authors), ',')
head(split_authors,5)

[[1]]
[1] "Miyazaki T."       " Akisawa A."       " Saha B.B."        " El-Sharkawy I.I." " Chakraborty A."

[[2]]
[1] "Saha B.B."       " Chakraborty A." " Koyama S."      " Aristov Y.I."  

[[3]]
[1] "Ali S.M."        " Chakraborty A."

[[4]]
[1] "Myat A."         " Thu K."         " Kim Y.-D."      " Chakraborty A." " Chun W.G."      " Ng K.C."       

[[5]]
[1] "Baran S.B."       " Kandadai S."     " Anutosh C."      " Khairul H."      " Ibrahim E.-S.I." " Shigeru K."

1 Answer

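Assuming your input data (dat in my example) is padded with NA wherever a paper has fewer than the maximum number of authors, you can use the following R code: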

# data: one row per paper, padded with NA up to the maximum number of authors
dat <- rbind(c("Miyazaki T.", "Akisawa A.", "Saha B.B.", "El-Sharkawy I.I.", "Chakraborty A."),
             c("Saha B.B.", "Chakraborty A.", "Koyama S.", "Aristov Y.I.", NA),
             c("Ali S.M.", "Chakraborty A.", NA, NA, NA))

# loop through all rows of dat (all papers, I presume)
transformed.dat <- lapply(seq_len(nrow(dat)), function(row.num) {

  # authors of this paper, with the NA padding dropped
  authors <- dat[row.num, ]
  authors <- authors[!is.na(authors)]

  # a paper with fewer than two authors contributes no edges
  # (combn() would raise an error here)
  if (length(authors) < 2) return(NULL)

  # 2 x k matrix in which each column is one unordered pair of co-authors
  # (try combn(4, 2) at the console to see what it does)
  pairings <- combn(authors, 2)

  # one row per pair, with columns named aut1 and aut2
  data.frame(aut1 = pairings[1, ],
             aut2 = pairings[2, ],
             stringsAsFactors = FALSE)
})

# use data.table's rbindlist to bind the list of combinations together
final.dat <- data.table::rbindlist(transformed.dat)

final.dat
#                 aut1             aut2
#  1:      Miyazaki T.       Akisawa A.
#  2:      Miyazaki T.        Saha B.B.
#  3:      Miyazaki T. El-Sharkawy I.I.
#  4:      Miyazaki T.   Chakraborty A.
#  5:       Akisawa A.        Saha B.B.
#  6:       Akisawa A. El-Sharkawy I.I.
#  7:       Akisawa A.   Chakraborty A.
#  8:        Saha B.B. El-Sharkawy I.I.
#  9:        Saha B.B.   Chakraborty A.
# 10: El-Sharkawy I.I.   Chakraborty A.
# 11:        Saha B.B.   Chakraborty A.
# 12:        Saha B.B.        Koyama S.
# 13:        Saha B.B.     Aristov Y.I.
# 14:   Chakraborty A.        Koyama S.
# 15:   Chakraborty A.     Aristov Y.I.
# 16:        Koyama S.     Aristov Y.I.
# 17:         Ali S.M.   Chakraborty A.
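
The question also asks how to keep a count of collaborations. The following is only a sketch of one way to do that in base R, not part of the original answer. It assumes an edge list with columns aut1 and aut2 like the one above; the names edges, canon and edge.counts are made up for illustration, and only four of the rows shown above are reproduced to keep it short. Because the graph is undirected, each pair is first put into a fixed (alphabetical) order so that A-B and B-A are counted as the same edge.

```r
# A few rows copied from the edge list above (illustrative subset only)
edges <- data.frame(
  aut1 = c("Saha B.B.", "Saha B.B.", "Saha B.B.", "Ali S.M."),
  aut2 = c("Chakraborty A.", "Chakraborty A.", "Koyama S.", "Chakraborty A."),
  stringsAsFactors = FALSE
)

# Undirected graph: order the two names within each row so that
# (A, B) and (B, A) end up as the same pair
canon <- data.frame(
  aut1 = pmin(edges$aut1, edges$aut2),
  aut2 = pmax(edges$aut1, edges$aut2),
  stringsAsFactors = FALSE
)

# Give every row a weight of 1 and sum the weights per pair
canon$weight <- 1L
edge.counts <- aggregate(weight ~ aut1 + aut2, data = canon, FUN = sum)

edge.counts
```

In this subset the Chakraborty A. / Saha B.B. pair gets a weight of 2, matching the example in the question. If igraph is used afterwards, a three-column data frame like this can probably be handed to igraph::graph_from_data_frame(edge.counts, directed = FALSE), which keeps the extra column as an edge attribute.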

Does that answer your question? The key is the combn function, which creates all the possible combinations.
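
Since combn also accepts the vector of names itself, the list returned by strsplit in the question can be used directly, without padding it into a matrix with NA. The following is a rough sketch under that assumption; authors.raw stands in for data$Authors, and split.authors, pairs and edge.list are names chosen here for illustration. trimws removes the leading space that splitting on a bare comma leaves behind.

```r
# Author strings as they appear in the question's Authors column
authors.raw <- c(
  "Miyazaki T., Akisawa A., Saha B.B., El-Sharkawy I.I., Chakraborty A.",
  "Saha B.B., Chakraborty A., Koyama S., Aristov Y.I.",
  "Ali S.M., Chakraborty A."
)

# Split on commas, then strip the whitespace around each name
split.authors <- lapply(strsplit(authors.raw, ",", fixed = TRUE), trimws)

# combn() on a character vector returns a 2 x k matrix of name pairs
combn(split.authors[[2]], 2)

# Build the pairs for every paper; papers with a single author are skipped
pairs <- lapply(split.authors, function(a) {
  if (length(a) < 2) return(NULL)
  t(combn(a, 2))  # transpose so that each row is one pair
})

# Stack everything into one two-column character matrix
edge.list <- do.call(rbind, pairs)
colnames(edge.list) <- c("aut1", "aut2")

nrow(edge.list)
```

For the three sample papers this gives 10 + 6 + 1 = 17 pairs, the same number of rows as in the output above.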

answered 2015-11-05T10:27:37.943