
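I'm really taking the time to learn regex, and I'm playing with different toy scenarios. One setup I can't get to work is grabbing from the beginning of a string up to the n-th occurrence of a character, where n > 1.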

Here I can grab from the beginning of the string up to the first underscore, but I can't generalize this to the second or third underscore.

x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")

gsub("_.*$", "", x)
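To see why this pattern only reaches the first underscore, here is a small illustrative sketch: the engine reports the leftmost match, and `_.*$` can already match starting at the first `_`, so everything from there to the end is deleted.

```r
x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
# regexpr() reports where the leftmost match of "_.*$" starts:
# at the first underscore, which is position 2 in every element here.
start <- regexpr("_.*$", x)
print(as.integer(start))
# Deleting that match therefore keeps only the text before the first "_".
print(gsub("_.*$", "", x))
```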

Here's what I'm trying to achieve with a regex (`sub`/`gsub`):

## > sapply(lapply(strsplit(x, "_"), "[", 1:2), paste, collapse="_")
## [1] "a_b" "1_2" "<_?"

#or

## > sapply(lapply(strsplit(x, "_"), "[", 1:3), paste, collapse="_")
## [1] "a_b_c" "1_2_3" "<_?_."

Related: regex from the first character to the end of the string


5 Answers


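Here's a start. To make it safe for general use, it would also need to properly escape regex special characters, which this version does not do: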

x <- c("a_b_c_d", "1_2_3_4", "<_?_._:", "", "abcd", "____abcd")

matchToNth <- function(char, n) {
    others <- paste0("[^", char, "]*") ## matches "[^_]*" if char is "_"
    mainPat <- paste0(c(rep(c(others, char), n-1), others), collapse="")
    paste0("(^", mainPat, ")", "(.*$)")
}

gsub(matchToNth("_", 2), "\\1", x)
# [1] "a_b"  "1_2"  "<_?"  ""     "abcd" "_" 

gsub(matchToNth("_", 3), "\\1", x)
# [1] "a_b_c" "1_2_3" "<_?_." ""      "abcd"  "__"   
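A hedged sketch of the escaping mentioned above, assuming `char` is a single character and switching to PCRE (`perl = TRUE`), where a backslash before any non-alphanumeric character makes it literal both inside and outside a bracket expression. `escape_pcre` and `matchToNthSafe` are hypothetical names, and the counted group `{n-1}` stands in for the `rep()` construction:

```r
# Hypothetical helper: backslash-escape every non-alphanumeric character.
# In PCRE an escaped non-alphanumeric is always literal, even inside [...].
escape_pcre <- function(s) gsub("([^[:alnum:]])", "\\\\\\1", s)

# Hypothetical variant of matchToNth(): same idea, but the delimiter is
# escaped and (?:...){n-1} replaces the rep()/paste0() construction.
# Assumes `char` is a single character and n >= 1.
matchToNthSafe <- function(char, n) {
  e <- escape_pcre(char)
  others <- paste0("[^", e, "]*")
  paste0("^((?:", others, e, "){", n - 1, "}", others, ").*$")
}

x <- c("a_b_c_d", "1_2_3_4", "<_?_._:", "", "abcd", "____abcd")
sub(matchToNthSafe("_", 2), "\\1", x, perl = TRUE)
# Delimiters that are regex metacharacters:
sub(matchToNthSafe("|", 2), "\\1", "a|b|c", perl = TRUE)
sub(matchToNthSafe("(", 2), "\\1", "a(b(c", perl = TRUE)
```

Strings with fewer than `n - 1` delimiters do not match and come back unchanged, as with the original function.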
answered 2013-04-09T18:48:37.410

How about:

gsub('^(.+_.+?).*$', '\\1', x)
# [1] "a_b" "1_2" "<_?"

Or you can use `{}` to indicate the number of repetitions...

sub('((.+_){1}.+?).*$', '\\1', x)  # {0} gives "a", {1} gives "a_b", {2} gives "a_b_c", and so on

...so if you want to match up to the n-th one, you don't have to repeat yourself.
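As an alternative that does not depend on how R's default engine treats a lazy `.+?`, here is a sketch using PCRE (`perl = TRUE`) with a counted group of lazy matches. `upto_nth` is a hypothetical helper, and strings with fewer than `n` underscores are returned unchanged:

```r
# Hypothetical helper: keep the text before the n-th "_".
# PCRE tries the shortest ".*?" first, so each "(?:.*?_)" skips one field
# plus its underscore; the final ".*?" is the n-th field, which must be
# followed by an underscore for the match to succeed.
upto_nth <- function(x, n) {
  pat <- sprintf("^((?:.*?_){%d}.*?)_.*$", n - 1)
  sub(pat, "\\1", x, perl = TRUE)
}

x <- c("a_b_c_d", "1_2_3_4", "<_?_._:", "ab_cd_ef_gh")
upto_nth(x, 2)
upto_nth(x, 3)
```

Unlike a single-character `.+?`, this also handles fields longer than one character, such as `"ab_cd_ef_gh"`.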

answered 2013-04-09T18:31:09.590

Up to the second underscore, in a Perl-style regex:

/^(.*?_.*?_)/

Up to the third:

/^(.*?_.*?_.*?_)/
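To use Perl-style patterns like these from R, one option (a sketch) is `sub()` with `perl = TRUE`. The `/.../` delimiters are Perl syntax and are dropped, and the result keeps the n-th underscore itself:

```r
x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
# perl = TRUE selects PCRE, so the lazy ".*?" behaves as it does in Perl.
second <- sub("^(.*?_.*?_).*$", "\\1", x, perl = TRUE)
third  <- sub("^(.*?_.*?_.*?_).*$", "\\1", x, perl = TRUE)
second  # c("a_b_", "1_2_", "<_?_")
third   # c("a_b_c_", "1_2_3_", "<_?_._")
# Drop the trailing underscore to get "a_b", "1_2", "<_?":
sub("_$", "", second)
```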
answered 2013-04-09T18:29:30.417

Maybe something like this:

x
## [1] "a_b_c_d" "1_2_3_4" "<_?_._:"

gsub("(.*)_", "\\1", regmatches(x, regexpr("([^_]*_){1}", x)))
## [1] "a" "1" "<"

gsub("(.*)_", "\\1", regmatches(x, regexpr("([^_]*_){2}", x)))
## [1] "a_b" "1_2" "<_?"

gsub("(.*)_", "\\1", regmatches(x, regexpr("([^_]*_){3}", x)))
## [1] "a_b_c" "1_2_3" "<_?_."
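One thing to watch with this approach: `regmatches()` returns only the elements that matched, so an input with too few underscores silently disappears and the result no longer lines up with `x`. A sketch that keeps the alignment, filling non-matches with `NA` (an assumed choice, not part of the answer above):

```r
x <- c("a_b_c_d", "1_2_3_4", "<_?_._:", "abcd")
m <- regexpr("([^_]*_){2}", x)       # -1 where there is no match
hits <- regmatches(x, m)             # only 3 elements: "abcd" is dropped
out <- rep(NA_character_, length(x))
out[m != -1] <- sub("_$", "", hits)  # strip the trailing underscore
out
```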
answered 2013-04-09T18:52:00.983

Using Justin's approach, here's what I came up with:

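beg2char <- function(text, char = " ", noc = 1, include = FALSE) {
    # escape regex metacharacters first, so include = TRUE also uses the escaped form
    specchar <- c(".", "|", "(", ")", "[", "{", "^", "$", "*", "+", "?", "\\")
    if (char %in% specchar) {
        char <- paste0("\\", char)
    }
    # include = FALSE: "?" makes the last ".+" lazy; include = TRUE: keep the delimiter
    inc <- if (include) char else "?"
    ins <- paste(rep(paste0(char, ".+"), noc - 1), collapse = "")
    pat <- paste0("^(.+", ins, inc, ").*$")
    gsub(pat, "\\1", text)
}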

x <- c("a_b_c_d", "1_2_3_4", "<_?_._:")
beg2char(x, "_", 1)
beg2char(x, "_", 2)
beg2char(x, "_", 3)
beg2char(x, "_", 4)
beg2char(x, "_", 3, include=TRUE)
answered 2013-04-09T19:26:50.490