
I have a file containing thousands of lower triangular matrices (one below the other):

1|Gene1_PRT1                
2|Gene2_PRT1    0           
2|Gene3_PRT1    0   0       
1|Gene7_PRT1    1.4287  1.4287  1.5293  
2|Gene9_PRT1    1.4428  1.4428  1.5293  0

2|Gene90_PRT1       
1|Gene60_PRT1   1.6242  
2|Gene26454_PRT1    -1  -1

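I need a list/table with the pair of (gene) names on the left and the value next to it (with the diagonal, i.e. the 0 from comparing a gene with itself, removed). For example: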

2|Gene68760_PRT1    1|Gene32540_PRT1    0
2|Gene99122_PRT1    1|Gene32540_PRT1    0
1|Gene2362_PRT1     1|Gene32540_PRT1    1.4287
2|Gene63993_PRT1    1|Gene32540_PRT1    1.4428
2|Gene99122_PRT1    2|Gene68760_PRT1    0
1|Gene2362_PRT1     2|Gene68760_PRT1    1.4287
2|Gene63993_PRT1    2|Gene68760_PRT1    1.4428
1|Gene2362_PRT1     2|Gene99122_PRT1    1.5293
2|Gene63993_PRT1    2|Gene99122_PRT1    1.5293
2|Gene63993_PRT1    1|Gene2362_PRT1     0

I tried some simple tools such as grep and got a list of values, but without the pair names on the left. I am new to (bio)informatics and trying to learn...


1 Answer


I don't know whether this is fast enough:

#read the data
dat <- readLines(textConnection("1|Gene1_PRT1                
2|Gene2_PRT1    0           
2|Gene3_PRT1    0   0       
1|Gene7_PRT1    1.4287  1.4287  1.5293  
2|Gene9_PRT1    1.4428  1.4428  1.5293  0

2|Gene90_PRT1       
1|Gene60_PRT1   1.6242  
2|Gene26454_PRT1    -1  -1"))

#split the data using the fact that there are empty rows
dat <- split(dat[dat!=""],cumsum(dat=="")[dat!=""])

#split the rows
dat <- lapply(dat,strsplit,split=" +")

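#build a square matrix per block, fill in the values, and melt it to long format
library(reshape2)    
dat <- lapply(dat,function(x) {
  mat <- matrix(ncol=length(x),nrow=length(x))
  #the first field of each row is the gene name
  nam <- do.call(c,lapply(x,function(y) y[1]))
  rownames(mat) <- nam
  colnames(mat) <- nam

  #the file lists the lower triangle row by row, but R fills column by column,
  #so fill the upper triangle and transpose to get the lower triangle
  mat[upper.tri(mat)] <- do.call(c,lapply(x,function(y) as.numeric(y[-1])))
  #melt to (Var1, Var2, value) and drop the empty (NA) cells
  na.omit(melt(t(mat)))
})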

#rbind everything together
do.call(rbind,dat)


#                  Var1          Var2   value
# 0.2      2|Gene2_PRT1  1|Gene1_PRT1  0.0000
# 0.3      2|Gene3_PRT1  1|Gene1_PRT1  0.0000
# 0.4      1|Gene7_PRT1  1|Gene1_PRT1  1.4287
# 0.5      2|Gene9_PRT1  1|Gene1_PRT1  1.4428
# 0.8      2|Gene3_PRT1  2|Gene2_PRT1  0.0000
# 0.9      1|Gene7_PRT1  2|Gene2_PRT1  1.4287
# 0.10     2|Gene9_PRT1  2|Gene2_PRT1  1.4428
# 0.14     1|Gene7_PRT1  2|Gene3_PRT1  1.5293
# 0.15     2|Gene9_PRT1  2|Gene3_PRT1  1.5293
# 0.20     2|Gene9_PRT1  1|Gene7_PRT1  0.0000
# 1.2     1|Gene60_PRT1 2|Gene90_PRT1  1.6242
# 1.3  2|Gene26454_PRT1 2|Gene90_PRT1 -1.0000
# 1.6  2|Gene26454_PRT1 1|Gene60_PRT1 -1.0000
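
If building a full square matrix per block turns out to be slow or memory-hungry, the pairs can also be read off each row directly, since the values in row i belong to genes 1..(i-1) in that order. The following is only a minimal base-R sketch of that idea, not part of the approach above: `tri_to_pairs` is a hypothetical helper name, and it assumes fields are separated by spaces or tabs and blocks by blank lines. The rows come out in row-by-row order rather than the column-by-column order shown above.

```r
# Example input; for a real file one might use readLines("matrices.txt")
# (hypothetical file name).
dat <- c("1|Gene1_PRT1                ",
         "2|Gene2_PRT1    0           ",
         "2|Gene3_PRT1    0   0       ",
         "1|Gene7_PRT1    1.4287  1.4287  1.5293  ",
         "2|Gene9_PRT1    1.4428  1.4428  1.5293  0",
         "",
         "2|Gene90_PRT1       ",
         "1|Gene60_PRT1   1.6242  ",
         "2|Gene26454_PRT1    -1  -1")

# Hypothetical helper: turn one block of lines into a long table of pairs.
tri_to_pairs <- function(lines) {
  fields <- strsplit(lines, "[ \t]+")
  nam <- vapply(fields, `[`, character(1), 1)   # first field = gene name
  rows <- lapply(seq_along(fields), function(i) {
    vals <- as.numeric(fields[[i]][-1])         # values for genes 1..(i-1)
    if (length(vals) == 0) return(NULL)         # first row has no pairs
    data.frame(Var1 = nam[i], Var2 = nam[seq_along(vals)], value = vals,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, Filter(Negate(is.null), rows))
}

dat <- trimws(dat)                              # drop trailing padding
keep <- dat != ""
blocks <- split(dat[keep], cumsum(dat == "")[keep])  # one element per matrix
res <- do.call(rbind, lapply(blocks, tri_to_pairs))
rownames(res) <- NULL
print(res)
```

Because no square matrix is allocated and `reshape2` is not needed, this may scale better to thousands of blocks, but that has not been measured here.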
answered 2013-07-10T08:58:18.713