regex - Find a word before one of two possible delimiters

Question

word:12335
anotherword:2323434
totallydifferentword/455
word/32

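For strings like the ones above, I need to get the part before the : or / using only base R functions. I could use stringr, but I don't want to add another dependency to my package just for this. The words can have a variable number of characters, but they always end at (one of) the delimiters. I don't need to keep what comes after.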

score 3 · Accepted Answer

Maybe try:

x <- c("word:12335", "anotherword:2323434", "totallydifferentword/455", "word/32")
# split each element on ":" or "/" and keep only the first piece
lapply(strsplit(x, ":|/"), function(z) z[[1]]) # as a list
sapply(strsplit(x, ":|/"), function(z) z[[1]]) # as a character vector

There are regex solutions using gsub that would also work, but in my experience with similar problems, strsplit is less elegant but faster.

I guess this regex would work as well:

gsub("([a-z]+)([/:])([0-9]+)", "\\1", x)

In this case gsub turns out to be faster:

Unit: microseconds
        expr    min     lq median     uq     max
1     GSUB() 19.127 21.460 22.392 23.792 106.362
2 STRSPLIT() 46.650 50.849 53.182 54.581 854.162
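
The definitions of GSUB() and STRSPLIT() are not shown in the answer, so the following is only a sketch of how a comparison like this might be reproduced with base R. The two wrapper bodies and the repetition count n are assumptions, and system.time is used here in place of whatever benchmarking tool produced the table above.

```r
x <- c("word:12335", "anotherword:2323434", "totallydifferentword/455", "word/32")

# Assumed wrappers: the answer only shows these names in the timing table
GSUB <- function() gsub("([a-z]+)([/:])([0-9]+)", "\\1", x)
STRSPLIT <- function() sapply(strsplit(x, ":|/"), function(z) z[[1]])

# Sanity check: both approaches should return the same character vector
stopifnot(identical(GSUB(), STRSPLIT()))

# Time many repetitions of each call (n is an arbitrary choice)
n <- 10000
system.time(for (i in seq_len(n)) GSUB())
system.time(for (i in seq_len(n)) STRSPLIT())
```

Absolute timings will vary by machine and R version, so only the relative difference is meaningful.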

score 2 · Accepted Answer

Something like this works in Ruby: http://rubular.com/r/PzVQVIpKPq

^(\w+)(?:[:\/])

Starting at the front of the string, grab any word characters and capture them until you reach the non-capturing / or :
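
The same idea can be carried over to base R. The snippet below is a minimal sketch rather than part of the answer; the variable names res_sub and res_match are made up for illustration. It shows two variants: deleting everything from the first delimiter onward, and extracting the leading run of word characters.

```r
x <- c("word:12335", "anotherword:2323434", "totallydifferentword/455", "word/32")

# Variant 1: remove the first ":" or "/" and everything after it
res_sub <- sub("[:/].*$", "", x)

# Variant 2: extract the leading word characters directly
# (assumes every element starts with at least one word character)
res_match <- regmatches(x, regexpr("^\\w+", x))

print(res_sub)
print(res_match)
```

Note that regmatches silently drops elements with no match, so the first variant is the safer choice if some inputs might not start with a word character.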

score 0 · Accepted Answer

This regex seems to work. Can you use it in R?

Answered 2012-10-02T16:21:10.813

score 0 · Accepted Answer

You can use the package unglue:

library(unglue)
x <- c("word:12335", "anotherword:2323434", "totallydifferentword/455", "word/32")
unglue_vec(x, "{res}{=[:/].*?}")
#> [1] "word"                 "anotherword"          "totallydifferentword"
#> [4] "word"

Created on 2019-10-08 by the reprex package (v0.3.0)

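{res} matches anything and will be returned; it is equivalent to {res=.*?}
{=[:/].*?} matches anything starting with : or / and won't be returned, because there is no lhs before the =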
