string - Get specific substrings of a sequence

Question

I have created the following matrix in R:

positions = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4))

I also have the following string (referred to as `input` in the code below):

"SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

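I am trying to use an apply function to build a list of substring(mystring, start.position, end.position) results, where the first index comes from positions[, 1] and the second from positions[, 2]. I can do this easily with a for loop, but I assume apply would be faster.

I can get it working as follows, but I wonder whether there is a cleaner way: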

# Define the helper before calling apply().
# cbind() with a string coerces the whole matrix to character,
# hence the as.numeric() conversions.
get.AA.seqres <- function(items){
  start.position = as.numeric(items[1])
  end.position = as.numeric(items[2])
  string = items[3]
  return(substr(string, start.position, end.position))
}

parse.me = cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4), input)
apply(parse.me, MARGIN = 1, get.AA.seqres)

score 3 · Accepted Answer

Try this:

> substring(input, positions[, 1], positions[, 2])
 [1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"
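
This works without a loop because `substring()` recycles its arguments to the length of the longest one, whereas `substr()` returns a result only as long as its first argument. A minimal sketch of the difference, using the data from the question (the names `first_only` and `all_pieces` are purely illustrative):

```r
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "
positions <- cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4))

# substr() returns one result per element of its first argument,
# so with a single string only the first start/stop pair is used.
first_only <- substr(input, positions[, 1], positions[, 2])
length(first_only)   # 1

# substring() recycles all arguments to the longest one,
# giving one piece per row of positions.
all_pieces <- substring(input, positions[, 1], positions[, 2])
length(all_pieces)   # 13
```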

score 0 · Answer

I like Andrie's practical suggestion, but if you need to go down this road for other reasons, your problem sounds like it could be solved with Vectorize():

#Your data
positions = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4))
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "

#Vectorize the function substr()
vsubstr <- Vectorize(substr, USE.NAMES = FALSE)
vsubstr(input, positions[,1], positions[,2])
#-----
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"

#Or, read the help page on ?substr about the bit for recycling in the first paragraph of details

substr(rep(input, nrow(positions)), positions[,1], positions[,2])
#-----
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"
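
For reference, `Vectorize()` builds its wrapper around `mapply()`, so the same result can also be obtained by calling `mapply()` directly. A rough sketch, assuming the same `input` and `positions` as in the question (`pieces` is just an illustrative name):

```r
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO          "
positions <- cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4))

# Call substr() once per (start, stop) pair; the single string is
# held fixed for every call by passing it through MoreArgs.
pieces <- mapply(substr,
                 start = positions[, 1],
                 stop = positions[, 2],
                 MoreArgs = list(x = input),
                 USE.NAMES = FALSE)
pieces
```

This is mainly useful when the function being applied has no vectorized counterpart; for plain substring extraction, `substring()` remains the shorter option.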
