regex - Regular expression to match everything except a 4-digit number

Question

I match and replace 4-digit numbers that are preceded and followed by whitespace:

str12 <- "coihr 1234 &/()= jngm 34 ljd"
sub("\\s\\d{4}\\s", "", str12)
[1] "coihr&/()= jngm 34 ljd"

However, every attempt to invert this and extract the number instead fails. I want:

[1] 1234

Does anyone have a clue?

P.S.: I know how to do this with {stringr}, but I'd like to know whether it is possible with {base} alone.

require(stringr)
gsub("\\s", "", str_extract(str12, "\\s\\d{4}\\s"))
[1] "1234"

score 6 · Accepted Answer

regmatches(), available only since R-2.14.0, allows you to "extract or replace matched substrings from match data obtained by regexpr, gregexpr or regexec".
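Since the quote above also mentions regexec(), here is a minimal sketch of that route (an addition, not part of the original answer): regexec() records the whole match plus each parenthesised capture group, so the digits can be pulled out without lookarounds or perl = TRUE. The names m and parts are illustrative.

```r
x <- "coihr 1234 &/()= jngm 34 ljd"

# regexec() returns match data for the full match and for each capture group
m <- regexec("\\s(\\d{4})\\s", x)

# regmatches() turns that into a list: one character vector per input string,
# holding the full match first and then the captured group(s)
parts <- regmatches(x, m)

# element 2 is the capture group, i.e. the digits without the surrounding spaces
as.numeric(parts[[1]][2])  # 1234
```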

Here are examples of how to use regmatches() to extract either the first whitespace-buffered 4-digit substring of an input string, or all such substrings.

## Example strings and pattern
x <- "coihr 1234 &/()= jngm 34 ljd"          # string with 1 matching substring
xx <- "coihr 1234 &/()= jngm 3444  6789 ljd" # string with >1 matching substring
pat <- "(?<=\\s)(\\d{4})(?=\\s)"

## Use regexpr() to extract *1st* matching substring
as.numeric(regmatches(x, regexpr(pat, x, perl=TRUE)))
# [1] 1234
as.numeric(regmatches(xx, regexpr(pat, xx, perl=TRUE)))
# [1] 1234


## Use gregexpr() to extract *all* matching substrings
as.numeric(regmatches(xx, gregexpr(pat, xx, perl=TRUE))[[1]])
# [1] 1234 3444 6789

(Note that this returns numeric(0) for strings that contain no substring matching your criteria.)
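A short sketch of how this plays out when several strings are processed at once (strs, res and first are illustrative names, and the third string is made up): gregexpr() yields one list element per input string, so non-matching strings show up as numeric(0), whereas regexpr() combined with regmatches() silently drops non-matching strings from the result.

```r
pat <- "(?<=\\s)(\\d{4})(?=\\s)"
strs <- c("coihr 1234 &/()= jngm 34 ljd",
          "coihr 1234 &/()= jngm 3444  6789 ljd",
          "no four-digit number here 34")

# gregexpr(): one element per input string; no match gives numeric(0)
res <- lapply(regmatches(strs, gregexpr(pat, strs, perl = TRUE)), as.numeric)
res    # list(1234, c(1234, 3444, 6789), numeric(0))

# regexpr(): first match only, and non-matching strings are dropped,
# so the result has length 2 here, not 3
first <- regmatches(strs, regexpr(pat, strs, perl = TRUE))
first  # "1234" "1234"
```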

score 4 · Accepted Answer

You can capture groups in a regular expression with (). Taking the same example:

str12 <- "coihr 1234 &/()= jngm 34 ljd"
gsub(".*\\s(\\d{4})\\s.*", "\\1", str12)
[1] "1234"
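One caveat worth checking with this approach: when nothing matches, sub()/gsub() return the input string unchanged rather than an empty result, and because the leading .* is greedy, the last 4-digit group wins when there are several. A minimal guard, using a hypothetical helper extract4(), might look like this:

```r
# hypothetical helper: return the captured 4-digit group, or NA if there is none
extract4 <- function(s) {
  pat <- ".*\\s(\\d{4})\\s.*"
  # sub() would return non-matching strings unchanged, so mask them with NA
  ifelse(grepl(pat, s), sub(pat, "\\1", s), NA_character_)
}

extract4(c("coihr 1234 &/()= jngm 34 ljd", "no match 34 here"))  # "1234" NA

# greedy .* consumes as much as possible, so the last match is captured
extract4("coihr 1234 &/()= jngm 3444  6789 ljd")  # "6789"
```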

score 0 · Accepted Answer

I'm fairly naive about regular expressions in general, but here is an ugly way of doing it in base R:

# if it's always in the same spot as in your example
unlist(strsplit(str12, split = " "))[2]

# or if it can occur in various places
str13 <- unlist(strsplit(str12, split = " "))
str13[!is.na(as.integer(str13)) & nchar(str13) == 4] # issues a warning (NAs introduced by coercion)
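A possible warning-free variant of the split approach (a sketch, not the answerer's code): test each token against the pattern ^[0-9]{4}$ instead of coercing with as.integer(). This also avoids picking up a token such as "-123", which is four characters long and coerces to a valid integer. The names tokens and four are illustrative.

```r
str12 <- "coihr 1234 &/()= jngm 34 ljd"

# split on single spaces; fixed = TRUE treats " " literally rather than as a regex
tokens <- strsplit(str12, split = " ", fixed = TRUE)[[1]]

# keep only tokens made of exactly four digits; no coercion, so no warning
four <- tokens[grepl("^[0-9]{4}$", tokens)]
four  # "1234"
```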
