regex - Which regular expression should I use with stri_regex in R to extract the right information?

Question

In R, I'm trying to extract the name that follows the word gdac.broadinstitute.org_ in this string:

element <- "<li><a href=\"gdac.broadinstitute.org_BRCA.miRseq_Preprocess.mage-tab.2015020400.0.0.tar.gz.md5\"> gdac.broadinstitute.org_BRCA.miRseq_Preprocess.mage-tab.2015020400.0.0.tar.gz.md5</a></li>"

I'm using stri_extract from the stringi package, but it looks like I don't understand regular expressions very well. I tried something like this:

stri_extract(element, regex = "gdac.broadinstitute.org_")

Can anyone help?

score 2 · Accepted Answer

Try this:

library(stringi)
stri_extract_first_regex(element, "(?<=gdac\\.broadinstitute\\.org_)[\\w.-]+")

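In general, the regex (?<=start)[set]+ matches a run of characters from set that comes right after the expression start. The lookbehind (?<=...) requires start to be present but leaves it out of the extracted text. To get all matches instead of only the first, use stri_extract_all_regex. More on ICU regular expressions: http://userguide.icu-project.org/strings/regexp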
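As a hedged alternative that needs no extra package, the same (?<=start)[set]+ idea can be tried in base R with gregexpr(..., perl = TRUE) and regmatches(), which return every match (similar to what stri_extract_all_regex would give). This swaps ICU for PCRE, assuming both treat this fixed-length lookbehind the same way; the names start_pattern, name_chars and all_names are illustrative only.

```r
# Base R sketch of "(?<=start)[set]+" using PCRE instead of stringi/ICU
element <- "<li><a href=\"gdac.broadinstitute.org_BRCA.miRseq_Preprocess.mage-tab.2015020400.0.0.tar.gz.md5\"> gdac.broadinstitute.org_BRCA.miRseq_Preprocess.mage-tab.2015020400.0.0.tar.gz.md5</a></li>"

start_pattern <- "gdac\\.broadinstitute\\.org_"  # the "start" part, dots escaped
name_chars <- "[\\w.-]+"                          # the "[set]+" part: word chars, dots, hyphens
pattern <- paste0("(?<=", start_pattern, ")", name_chars)

# gregexpr() locates every match; regmatches() pulls out the matched text
all_names <- regmatches(element, gregexpr(pattern, element, perl = TRUE))[[1]]
print(all_names)
```

Here the name occurs twice (in the href attribute and in the link text), so all_names holds two identical strings.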

score 1 · Accepted Answer

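I'm not familiar with stringi, but this is easy with gsub. To decide where the name ends, I assume the name is everything after the underscore up to the closing double quote (").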

gsub(".*gdac\\.broadinstitute\\.org_(.*)\".*", "\\1", element)
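A minimal variant of the same capture-group approach, assuming the name itself never contains a double quote: [^"]+ stops at the first quote instead of relying on greedy backtracking to land on the right one, and sub() is enough because only one replacement is needed. The variable name is illustrative.

```r
element <- "<li><a href=\"gdac.broadinstitute.org_BRCA.miRseq_Preprocess.mage-tab.2015020400.0.0.tar.gz.md5\"> gdac.broadinstitute.org_BRCA.miRseq_Preprocess.mage-tab.2015020400.0.0.tar.gz.md5</a></li>"

# The pattern matches the whole string; sub() replaces it with capture group 1.
# .*? (lazy, needs perl = TRUE) stops at the first "gdac.broadinstitute.org_",
# and [^"]+ captures everything up to the next double quote.
name <- sub('.*?gdac\\.broadinstitute\\.org_([^"]+)".*', "\\1", element, perl = TRUE)
print(name)
```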
