11

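I have an odd request about regular expressions in R. I have vectors of character strings, some of which have multiple trailing periods. I want to replace each of those periods with a space. The example and desired outcome should make clear what I am after (maybe I need to attack it through what I pass as the replacement argument of gsub rather than the pattern argument):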

Example and attempts:

x <- c("good", "little.bad", "really.ugly......")
gsub("\\.$", " ", x)
  #produces this
  #[1] "good"              "little.bad"        "really.ugly..... "
gsub("\\.+$", " ", x)
  #produces this
  #[1] "good"         "little.bad"   "really.ugly "

Desired outcome:

[1] "good"              "little.bad"        "really.ugly      "

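The last string of the original vector (x) ends in 6 periods, so I want 6 spaces there, without touching the period between "really" and "ugly". I know the $ anchors the match at the end of the string, but I can't get past that.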

3 Answers

17

Try this:

gsub("\\.(?=\\.*$)", " ", mystring, perl=TRUE)

Explanation:

\.   # Match a dot
(?=  # only if followed by
 \.* # zero or more dots
 $   # until the end of the string
)    # End of lookahead assertion.
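
A quick check of that pattern on the vector x from the question (mystring above is just a placeholder name). The nchar comparison is an added sanity check, not part of the original answer, to confirm that each trailing period became exactly one space:

```r
x <- c("good", "little.bad", "really.ugly......")

# perl = TRUE is required: the default regex engine does not support lookahead
res <- gsub("\\.(?=\\.*$)", " ", x, perl = TRUE)
res
#[1] "good"              "little.bad"        "really.ugly      "

# Every string keeps its original length
nchar(res) == nchar(x)
#[1] TRUE TRUE TRUE
```
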
answered 2012-08-31T21:33:25.777
2

Tim's solution is clearly better, but I figured I would try an alternative. Liberal use of regmatches helps us out here:

x <- c("good", "little.bad", "really.ugly......")
# Get an object with 'match data' to feed into regmatches
# Here we match on any number of periods at the end of a string
out <- regexpr("\\.*$", x)

# On the right hand side we extract the pieces of the strings
# that match our pattern with regmatches and then replace
# all the periods with spaces.  Then we use assignment
# to store that into the spots in our strings that match the
# regular expression.
regmatches(x, out) <- gsub("\\.", " ", regmatches(x, out))
x
#[1] "good"              "little.bad"        "really.ugly      "

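So it is not quite as clean as a single regular expression. But I never really got around to learning those "lookaheads" in perl regular expressions.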
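
A possible variant of the same regmatches idea, sketched under the assumption that strrep is available (base R 3.3.0 or later, so newer than this answer): match only the strings that really end in periods with \\.+$ and build the run of spaces directly. The name pad_trailing_dots is made up for illustration.

```r
pad_trailing_dots <- function(x) {
  # Match data for one or more periods at the end; -1 where there are none
  m <- regexpr("\\.+$", x)
  # One space per matched period; only the matched elements are replaced
  regmatches(x, m) <- strrep(" ", nchar(regmatches(x, m)))
  x
}

x <- c("good", "little.bad", "really.ugly......")
pad_trailing_dots(x)
#[1] "good"              "little.bad"        "really.ugly      "
```
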

answered 2012-08-31T22:04:23.283
2

While I waited for a regex solution that makes sense, I decided to come up with a nonsensical way to solve this:

messy.sol <- function(x) {
paste(unlist(list(gsub("\\.+$", "", x), 
    rep(" ", nchar(x) -  nchar(gsub("\\.+$", "", x))))),collapse="")
}

sapply(x, messy.sol, USE.NAMES = FALSE)

I'd say Tim's is a bit prettier :)
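
The same idea (strip the trailing periods, then pad back to the original length) can also be written in vectorized form, so no sapply is needed. This is only a sketch, assuming strrep from base R 3.3.0 or later; pad.sol is a hypothetical name.

```r
pad.sol <- function(x) {
  stripped <- sub("\\.+$", "", x)            # drop the trailing periods
  n_removed <- nchar(x) - nchar(stripped)    # how many periods were dropped
  paste0(stripped, strrep(" ", n_removed))   # strrep is vectorized over times
}

x <- c("good", "little.bad", "really.ugly......")
pad.sol(x)
#[1] "good"              "little.bad"        "really.ugly      "
```
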

answered 2012-08-31T21:41:44.837