regex - How to trim and replace a string

Question

string<-c("       this is a string  ")

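Is it possible to trim the whitespace on both sides of a string (or on just one side, as needed) and replace it with a desired character in R, like this? The number of spaces differs on each side of the string and must be preserved in the replacement.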

"~~~~~~~this is a string~~"

score 6 · Accepted Answer

Use gsub:

gsub(" ", "~", "    this is a string  ")
[1] "~~~~this~is~a~string~~"

This function uses a regular expression to replace (i.e. sub) all occurrences of a pattern in a string.
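
To make the "all occurrences" point concrete, here is a small comparison with sub, which stops after the first match. The variable name x is only for illustration.

```r
x <- "    this is a string  "

# sub() replaces only the first match of the pattern
first_only <- sub(" ", "~", x)
first_only
# [1] "~   this is a string  "

# gsub() replaces every match
every_match <- gsub(" ", "~", x)
every_match
# [1] "~~~~this~is~a~string~~"
```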

In your case, you have to express the pattern in a special way:

gsub("(^ *)|( *$)", "~~~", "    this is a string  ")
[1] "~~~this is a string~~~"

The pattern means:

(^ *): match zero or more spaces at the start of the string
( *$): match zero or more spaces at the end of the string
|: the OR operator
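
Since the two halves of the pattern are independent, each can also be used alone when only one side matters. A minimal sketch, with x as an illustrative variable:

```r
x <- "    this is a string  "

# Start anchor only: removes the leading spaces
no_lead <- sub("^ *", "", x)
no_lead
# [1] "this is a string  "

# End anchor only: removes the trailing spaces
no_trail <- sub(" *$", "", x)
no_trail
# [1] "    this is a string"
```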

Now you can use this approach to solve the problem of replacing each space with a new character:

txt <- "    this is a string  "
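foo <- function(x, new="~"){
  # leading run of spaces (capture group 1)
  lead <- gsub("(^ *).*", "\\1", x)
  # trailing run of spaces (the lazy .*? leaves them for group 1)
  last <- gsub(".*?( *$)", "\\1", x)
  # text with both end runs removed
  mid  <- gsub("(^ *)|( *$)", "", x)
  # swap spaces for `new` in the end runs only, then reassemble
  paste0(
    gsub(" ", new, lead),
    mid,
    gsub(" ", new, last)
  )
}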

> foo("    this is a string  ")
[1] "~~~~this is a string~~"

> foo(" And another one        ")
[1] "~And another one~~~~~~~~"
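
The question also asks about handling only one side. Below is a rough sketch of one way to do that. The helper replace_ends and its side argument are hypothetical, not part of the answer above. It counts the end spaces instead of capturing them, uses strrep (R 3.3.0 or later), and assumes each string contains at least one non-space character.

```r
# Hypothetical helper, not part of the original answer
replace_ends <- function(x, new = "~", side = c("both", "left", "right")) {
  side <- match.arg(side)
  core    <- sub(" +$", "", sub("^ +", "", x))    # text without the end spaces
  n_lead  <- nchar(x) - nchar(sub("^ +", "", x))  # number of leading spaces
  n_trail <- nchar(x) - nchar(sub(" +$", "", x))  # number of trailing spaces
  # Use `new` on the requested side(s) and keep plain spaces on the other
  left  <- strrep(if (side == "right") " " else new, n_lead)
  right <- strrep(if (side == "left") " " else new, n_trail)
  paste0(left, core, right)
}

replace_ends("    this is a string  ")
# [1] "~~~~this is a string~~"
replace_ends("    this is a string  ", side = "left")
# [1] "~~~~this is a string  "
replace_ends(" ab   ", new = "*", side = "right")
# [1] " ab***"
```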

For more information, see ?gsub or ?regexp.

score 6 · Accepted Answer

This seems like an inefficient way to go about it, but perhaps you should be looking in the direction of gregexpr and regmatches instead of gsub:

x <- "    this is a string  "
pattern <- "^ +?\\b|\\b? +$"
startstop <- gsub(" ", "~", regmatches(x, gregexpr(pattern, x))[[1]])
text <- paste(regmatches(x, gregexpr(pattern, x), invert=TRUE)[[1]], collapse="")
paste0(startstop[1], text, startstop[2])
# [1] "~~~~this is a string~~"
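
To see what the two regmatches calls contribute, it can help to look at the pieces separately. The sketch below uses a simplified pattern ("^ +| +$") rather than the one above, purely for illustration; the names m, ends and middle are made up here.

```r
x <- "    this is a string  "
m <- gregexpr("^ +| +$", x)   # simplified pattern, for illustration only

# The matched pieces: the leading and the trailing run of spaces
ends <- regmatches(x, m)[[1]]
ends
# [1] "    " "  "

# invert = TRUE returns the unmatched pieces; pasted together they
# give the text between the two end runs
middle <- paste(regmatches(x, m, invert = TRUE)[[1]], collapse = "")
middle
# [1] "this is a string"
```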

And, for fun, as a function and as a "vectorized" function:

## The function
replaceEnds <- function(string) {
  pattern <- "^ +?\\b|\\b? +$"
  startstop <- gsub(" ", "~", regmatches(string, gregexpr(pattern, string))[[1]])
  text <- paste(regmatches(string, gregexpr(pattern, string), invert = TRUE)[[1]],
                collapse = "")
  paste0(startstop[1], text, startstop[2])
}

## use Vectorize here if you want to apply over a vector
vReplaceEnds <- Vectorize(replaceEnds)

Some sample data:

myStrings <- c("    Four at the start, 2 at the end  ", 
               "   three at the start, one at the end ")

vReplaceEnds(myStrings)
#        Four at the start, 2 at the end        three at the start, one at the end  
#  "~~~~Four at the start, 2 at the end~~" "~~~three at the start, one at the end~"
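
As a possible alternative to Vectorize, the replacement form of regmatches can rewrite the matched runs in place across a whole vector. This is only a sketch: the function name tildeEnds and the simplified pattern "^ +| +$" are assumptions made here, not part of the answer above.

```r
# Hypothetical helper using `regmatches<-`; already works on a vector
tildeEnds <- function(x, new = "~") {
  m <- gregexpr("^ +| +$", x)   # simplified pattern, for illustration only
  # Replace spaces inside each matched end run, then write the runs back
  regmatches(x, m) <- lapply(regmatches(x, m),
                             function(s) gsub(" ", new, s, fixed = TRUE))
  x
}

res <- tildeEnds(c("    Four at the start, 2 at the end  ",
                   "   three at the start, one at the end "))
res
# [1] "~~~~Four at the start, 2 at the end~~"
# [2] "~~~three at the start, one at the end~"
```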

score 6 · Accepted Answer

Or use more complex pattern matching with gsub...

gsub("\\s(?!\\b)|(?<=\\s)\\s(?=\\b)", "~", "    this is a string  " , perl = TRUE )
#[1] "~~~~this is a string~~"

Or with @AnandaMahto's data:

gsub("\\s(?!\\b)|(?<=\\s)\\s(?=\\b)", "~", myStrings , perl = TRUE )
#[1] "~~~~Four at the start, 2 at the end~~" 
#[2] "~~~three at the start, one at the end~"

Explanation

This uses positive and negative lookahead and lookbehind assertions:

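\\s(?!\\b) - match a space, \\s, that is not followed by a word boundary, (?!\\b). This works for everything except the last space before the first word, i.e. on its own we would get "~~~ this is a string~~". So we need another pattern...
(?<=\\s)\\s(?=\\b) - match a space, \\s, that is preceded by another space, (?<=\\s), and followed by a word boundary, (?=\\b).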

And because it is gsub, it tries to make as many matches as it can.
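
Running each alternative on its own shows how the two halves divide the work. The last call is an observation added here rather than something from the answer: a single leading space seems to match neither alternative, since it is followed by a word boundary and has no space before it.

```r
x <- "    this is a string  "

# First alternative only: the space right before "this" is left alone
a <- gsub("\\s(?!\\b)", "~", x, perl = TRUE)
a
# [1] "~~~ this is a string~~"

# Second alternative only: just that one remaining space is replaced
b <- gsub("(?<=\\s)\\s(?=\\b)", "~", x, perl = TRUE)
b
# [1] "   ~this is a string  "

# Caveat: a lone leading space is not replaced by the combined pattern
lone <- gsub("\\s(?!\\b)|(?<=\\s)\\s(?=\\b)", "~", " one ", perl = TRUE)
lone
# [1] " one~"
```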
