
I have a list in R that maps gene IDs to their sequences. The same gene ID can appear more than once:

$`2435`
[1] "ATGCGGGCGGGGGTCGTCGA"

$`2435`
[1] "ATGCGGCGCGCGCGCTATATACGC"

$`2435`
[1] "ATGCGGCGCCTCTCATCGCGGGGG"

I want to combine the sequences that share the same gene ID into a single list element, like this:

$`2435`
[1] "ATGCGGGCGGGGGTCGTCGAATGCGGCGCGCGCGCTATATACGCATGCGGCGCCTCTCATCGCGGGGG"

3 Answers

l <- list("A" = "ABC", "B" = "XYX", "A" = "DEF", "C" = "YZY", "A" = "GHI")
tapply(l, names(l), paste, collapse = "", simplify = FALSE)
# $A
# [1] "ABCDEFGHI"
# 
# $B
# [1] "XYX"
# 
# $C
# [1] "YZY"
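
As a hedged sketch, the same grouping idea can be applied to data shaped like the question's by using split() and lapply() instead of tapply(). The names seqs and combined are hypothetical, and the code assumes every list element holds exactly one string. Note that split() orders the groups by sorted name, not by first appearance.

```r
# Hypothetical input mirroring the question: three sequences under one gene ID
seqs <- list("2435" = "ATGCGGGCGGGGGTCGTCGA",
             "2435" = "ATGCGGCGCGCGCGCTATATACGC",
             "2435" = "ATGCGGCGCCTCTCATCGCGGGGG")

# Flatten to a character vector, group the values by name,
# then concatenate the strings within each group
combined <- lapply(split(unlist(seqs), names(seqs)), paste, collapse = "")
combined
```

The result is a plain named list with one element per gene ID, which may be easier to index than the list-mode array that tapply() returns.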
answered 2013-09-03T17:33:17.067

Bonus:

For data frame output, use:

aggregate(unlist(A), by=list(id=names(A)), paste, collapse="")

where A is your list.

Using @Ananda's A, I get this:

  id                                       x
1 10                        FFFFGGGGHHHHIIII
2 12 AAAABBBBCCCCDDDDXXXXXXXXXXXXXXXXXXXXXXX
3 34                                    GGGG
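
A self-contained version of the call above might look like the sketch below. It copies the sample list A from @Ananda's answer; the variable res and the column name sequence are hypothetical additions that replace the default column name x.

```r
# Sample data copied from @Ananda's answer
A <- list("12" = "AAAABBBBCCCCDDDD",
          "34" = "GGGG",
          "12" = "XXXXXXXXXXXXXXXXXXXXXXX",
          "10" = "FFFFGGGG",
          "10" = "HHHHIIII")

# One row per ID; values with the same ID are pasted together
res <- aggregate(unlist(A), by = list(id = names(A)), FUN = paste, collapse = "")

# Hypothetical, more descriptive name for the default "x" column
names(res)[2] <- "sequence"
res
```

The rows come back sorted by id, and because list names are strings the IDs are compared as text rather than as numbers.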
answered 2013-09-03T17:40:28.060

Use lapply over the unique names of the list, collecting the elements that match each name. Here is some sample data:

A <- list("12" = "AAAABBBBCCCCDDDD",
          "34" = "GGGG",
          "12" = "XXXXXXXXXXXXXXXXXXXXXXX",
          "10" = "FFFFGGGG",
          "10" = "HHHHIIII")
A
# $`12`
# [1] "AAAABBBBCCCCDDDD"
# 
# $`34`
# [1] "GGGG"
# 
# $`12`
# [1] "XXXXXXXXXXXXXXXXXXXXXXX"
# 
# $`10`
# [1] "FFFFGGGG"
# 
# $`10`
# [1] "HHHHIIII"

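Subset the list on each name and paste the matching elements together. Note that the result is an unnamed list, ordered by the first appearance of each name.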

lapply(unique(names(A)), function(x) paste(A[names(A) %in% x], collapse = ""))
# [[1]]
# [1] "AAAABBBBCCCCDDDDXXXXXXXXXXXXXXXXXXXXXXX"
# 
# [[2]]
# [1] "GGGG"
# 
# [[3]]
# [1] "FFFFGGGGHHHHIIII"
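
If the IDs need to be kept as names on the result, one possible variation is sapply() with simplify = FALSE, which labels each element with the name it was built from. This is only a sketch: ids and combined are hypothetical variable names, and unlist() is added so that paste() receives a plain character vector.

```r
# Sample data from above
A <- list("12" = "AAAABBBBCCCCDDDD",
          "34" = "GGGG",
          "12" = "XXXXXXXXXXXXXXXXXXXXXXX",
          "10" = "FFFFGGGG",
          "10" = "HHHHIIII")

ids <- unique(names(A))  # IDs in order of first appearance

# simplify = FALSE keeps a list; USE.NAMES = TRUE names it by the IDs
combined <- sapply(ids,
                   function(id) paste(unlist(A[names(A) == id]), collapse = ""),
                   simplify = FALSE, USE.NAMES = TRUE)
combined
```

Individual sequences can then be looked up by ID, for example combined[["12"]].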
answered 2013-09-03T17:28:29.937