r - Trying to return a specified number of characters from a gene sequence in R

Question

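I have a DNA sequence such as: cgtcgctgtttgtcaaagtcg....

It can be 1000+ letters long.

However, I only want to look at, for example, letters 5 to 200, and define this subset of the string as a new object.

I tried looking at the nchar function, but didn't find anything that does this.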

score 9 · Accepted Answer


Try

substr("cgtcgctgtttgtcaa[...]", 5, 200)

See substr().
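A minimal sketch of the use case from the question, storing the extracted piece as a new object. The sequence here is a made-up stand-in built by repetition, and the names dna and sub_seq are illustrative only. Positions in substr() are 1-based and both ends are inclusive, so 5 to 200 yields 196 characters.

```r
# Hypothetical stand-in sequence: a 16-letter motif repeated 70 times (1120 letters)
dna <- paste(rep("cgtcgctgtttgtcaa", 70), collapse = "")
nchar(dna)      # total length of the sequence

# Characters 5 through 200, stored as a new object
sub_seq <- substr(dna, 5, 200)
nchar(sub_seq)  # 200 - 5 + 1 characters
```

nchar() only reports the length of a string; substr() is what does the extraction.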

Answered 2009-09-28T23:15:16.643

score 6 · Accepted Answer

Use the substring function, substr():

> tmp.string <- paste(LETTERS, collapse="")
> tmp.string <- substr(tmp.string, 4, 10)
> tmp.string
[1] "DEFGHIJ"
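If several windows are needed at once, the closely related base R function substring() may be more convenient, since it recycles its start and end arguments. A small sketch, where starts, ends and pieces are illustrative names:

```r
tmp.string <- paste(LETTERS, collapse = "")

# substring() recycles its arguments, so several start/end pairs
# can be extracted from the same string in one call
starts <- c(1, 4, 20)
ends   <- c(3, 10, 26)
pieces <- substring(tmp.string, starts, ends)
pieces
```

The result is a character vector with one element per start/end pair.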

score 3 · Accepted Answer

See also the Bioconductor package Biostrings, which is a good choice if you need to handle large biological sequences or sets of sequences.

#source("http://bioconductor.org/biocLite.R");biocLite("Biostrings") 
library(Biostrings)
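s <- paste(rep("gtcgctgtttgtcaac", 20), collapse = "")  # 320-letter test sequence
d <- DNAString(s)
d[5:200]                # positions 5 to 200, still a DNAString
as.character(d[5:200])  # the same subsequence as a plain character string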
