You can combine regexpr and substr:
TEXT <- c("tedSTXDXIKsslker","janetlkajsdfSTXDXIKalkse","maggiesdfes","sdfjkSTXDXIKryan")
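## [A-Za-z] rather than [A-z]: in ASCII the range A-z also covers the punctuation between Z and a ([ \ ] ^ _ `)
r <- regexpr("ST[A-Za-z]D[A-Za-z]IK", TEXT)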
s <- substr(TEXT, r, r+attr(r, "match.length")-1)
s
# [1] "STXDXIK" "STXDXIK" "" "STXDXIK"
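To see why the non-matching element comes back as an empty string, it can help to look at what regexpr actually returns. The following is only a small sketch; the names x, m, first, len and last are illustrative and not part of the answer above.

```r
x <- c("tedSTXDXIKsslker", "maggiesdfes")
m <- regexpr("ST[A-Za-z]D[A-Za-z]IK", x)

first <- as.integer(m)           # start of the match: 4, or -1 when there is no match
len <- attr(m, "match.length")   # length of the match: 7, or -1 when there is no match
last <- first + len - 1          # end of the match: 10, or -3 when there is no match

## when last < first, substr returns an empty string
substr(x, first, last)
# [1] "STXDXIK" ""
```

So the "" entries are simply the elements for which regexpr reported -1.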
If you want to filter out the empty strings (""), you can use:
s <- s[nchar(s)>0]
# [1] "STXDXIK" "STXDXIK" "STXDXIK"
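As a possible alternative (not part of the original answer), base R's regmatches can do the extraction and the filtering in one step: given a regexpr result, it returns only the substrings that matched and drops the elements without a match. A minimal sketch, where the variable name m is illustrative:

```r
TEXT <- c("tedSTXDXIKsslker", "janetlkajsdfSTXDXIKalkse", "maggiesdfes", "sdfjkSTXDXIKryan")
m <- regexpr("ST[A-Za-z]D[A-Za-z]IK", TEXT)

## regmatches keeps only the matched substrings; "maggiesdfes" contributes nothing
regmatches(TEXT, m)
# [1] "STXDXIK" "STXDXIK" "STXDXIK"
```

Note that the result is then shorter than TEXT, so use the substr version if you need to keep results aligned with the input.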
Edit: added a gregexpr example
TEXT <- c("tedSTXDXIKsslker","janetlkajsdfSTXDXIKalkse","maggiesdfes","sdfjkSTXDXIKryan",
"sdfjkSTXDXIKryansdfjkSTXDXIKryan")
## use gregexpr instead of regexpr
r <- gregexpr("ST[A-Za-z]D[A-Za-z]IK", TEXT)
## gregexpr returns a list (one element per input string), so we iterate with mapply (or a for loop)
## note: substring is used instead of substr because substr returns exactly one result per element of its
## first argument, whereas substring recycles over the start/stop vectors and so returns every match
mapply(FUN = function(str, rx) substring(str, rx, rx + attr(rx, "match.length") - 1), str = TEXT, rx = r)
# $tedSTXDXIKsslker
# [1] "STXDXIK"
#
# $janetlkajsdfSTXDXIKalkse
# [1] "STXDXIK"
#
# $maggiesdfes
# [1] ""
#
# $sdfjkSTXDXIKryan
# [1] "STXDXIK"
#
# $sdfjkSTXDXIKryansdfjkSTXDXIKryan
# [1] "STXDXIK" "STXDXIK"
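If the list shape is not important, a shorter variant (again just a sketch, not the original answer's method) is to pass the gregexpr result to base R's regmatches. For strings without a match it returns character(0) rather than "", so unlist gives a flat vector of all matches without a separate filtering step. The names m and hits are illustrative.

```r
TEXT <- c("tedSTXDXIKsslker", "janetlkajsdfSTXDXIKalkse", "maggiesdfes", "sdfjkSTXDXIKryan",
          "sdfjkSTXDXIKryansdfjkSTXDXIKryan")
m <- gregexpr("ST[A-Za-z]D[A-Za-z]IK", TEXT)

## one list element per input string; non-matching strings give character(0)
hits <- regmatches(TEXT, m)

## number of matches found in each input string (lengths() needs R >= 3.2.0)
n_hits <- lengths(hits)

## flatten to a single character vector containing every match
all_hits <- unlist(hits)
```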