91

I am working with NCBI Reference Sequence accession numbers, such as those in variable a:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")  

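To get information from the biomaRt package, I need to remove the .1, .2, etc. after the accession numbers. I usually do this with the following code: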

b <- sub("..*", "", a)

# [1] "" "" "" "" "" ""

But as you can see, this is not the correct approach for this variable. Can anyone help me with this?


4 Answers

130

You just need to escape the period:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")

gsub("\\..*","",a)
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
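
The question's pattern returns empty strings because an unescaped . matches any character, so "..*" consumes the whole string from the first character on. A minimal sketch of the difference follows. The anchored third pattern is an assumption on my part about what a version suffix looks like (a period followed only by digits at the end of the string); it is not part of the answer above.

```r
a <- c("NM_020506.1", "NM_001030297.2")

# Unescaped: "." matches any character, so the entire string is removed.
unescaped <- sub("..*", "", a)

# Escaped: "\\." matches a literal period; everything from the first period on is removed.
escaped <- sub("\\..*", "", a)

# Stricter variant (an assumption): strip only a trailing ".<digits>" suffix.
anchored <- sub("\\.[0-9]+$", "", a)

print(unescaped)
print(escaped)
print(anchored)
```

The anchored form may be preferable if some identifiers could contain a period elsewhere, since it leaves anything that is not a trailing numeric suffix untouched.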
answered 2012-05-16T14:43:27.017
12

We can pretend they are file names and remove the extension:

tools::file_path_sans_ext(a)
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
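
As a small illustration of how this behaves on mixed input, the sketch below applies the same function to one ID with a version suffix and one without. The vector name ids is only for illustration. tools ships with base R, so no extra installation should be needed.

```r
# One versioned and one unversioned accession number (illustrative input).
ids <- c("NM_020506.1", "NM_020506")

# Only a trailing ".<alphanumerics>" part is treated as an extension;
# strings without a period are returned unchanged.
stripped <- tools::file_path_sans_ext(ids)
print(stripped)
```

Note that only the last extension is removed, so this approach fits identifiers that carry a single version suffix.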
answered 2017-06-14T14:07:10.547
9

You can do this:

sub("\\.[0-9]+$", "", a)

or

library(stringr)
str_sub(a, start=1, end=-3)
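
Trimming a fixed number of characters works for the IDs in the question because every version suffix there is exactly two characters long. A hedged sketch of what might happen otherwise, using base R's substr and nchar as a stand-in for str_sub(a, start=1, end=-3); the second ID is hypothetical and exists only to show a two-digit version.

```r
# "NM_000001.12" is a made-up ID with a two-digit version, for illustration only.
ids <- c("NM_020506.1", "NM_000001.12")

# Fixed-width trimming: always drops the last two characters.
fixed_width <- substr(ids, 1, nchar(ids) - 2)

# Pattern-based trimming: drops a trailing period plus any number of digits.
pattern_based <- sub("\\.[0-9]+$", "", ids)

print(fixed_width)    # the second element keeps a stray trailing period
print(pattern_based)
```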
answered 2012-05-16T11:44:58.870
8

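If the strings were supposed to be of fixed length, substr from base R could be used directly. Otherwise, we can get the position of the . with regexpr and use that in substr: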

substr(a, 1, regexpr("\\.", a)-1)
#[1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"   
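
One thing to watch with this approach: regexpr returns -1 when there is no period, in which case substr would return an empty string. A minimal sketch of a guard for that case; strip_version is a hypothetical helper name, not something from the answer above.

```r
# Hypothetical helper: strip everything from the first period on,
# leaving strings without a period unchanged.
strip_version <- function(x) {
  # Position of the first literal period, or -1 if there is none.
  # as.integer() drops the extra attributes that regexpr attaches.
  pos <- as.integer(regexpr(".", x, fixed = TRUE))
  ifelse(pos > 0, substr(x, 1, pos - 1), x)
}

result <- strip_version(c("NM_020506.1", "NM_020506"))
print(result)
```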
answered 2019-04-24T13:10:38.440