211

I've seen this example:

hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//[0-9]/}

It follows this syntax: ${variable//pattern/replacement}

Unfortunately, the pattern field doesn't seem to support full regex syntax (if I use . or \s, for example, it tries to match the literal characters).

How can I search/replace a string using full regex syntax?

9 Answers

203

Use sed:

MYVAR=ho02123ware38384you443d34o3434ingtod38384day
echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g'
# prints XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

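Note that the successive -e expressions are processed in order. Also, the g flag on an expression makes it replace every match in the input, not just the first one.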
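To see both points in isolation, here is a minimal sketch; the sample strings a1b2 and ab are made up for illustration:

```shell
#!/bin/bash
# Without the g flag only the first match on each line is replaced.
first_only=$(printf '%s\n' 'a1b2' | sed -e 's/[0-9]/N/')
every_match=$(printf '%s\n' 'a1b2' | sed -e 's/[0-9]/N/g')
echo "$first_only"    # aNb2
echo "$every_match"   # aNbN

# Each -e expression sees the output of the one before it, so order matters.
a_then_b=$(printf '%s\n' 'ab' | sed -e 's/a/b/g' -e 's/b/c/g')
b_then_a=$(printf '%s\n' 'ab' | sed -e 's/b/c/g' -e 's/a/b/g')
echo "$a_then_b"      # cc
echo "$b_then_a"      # bc
```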

You can use the same approach with your favorite tool, e.g. perl or awk:

echo "$MYVAR" | perl -pe 's/[a-zA-Z]/X/g and s/[0-9]/N/g'

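Using "and" may allow you to do more creative matching. In the snippet above, for example, the digit replacement is not performed unless the first expression matched (because "and" short-circuits). And of course, you have the full language support of Perl to do your bidding.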

answered 2012-10-24T05:16:53.663
155

This can actually be done in pure bash:

hello=ho02123ware38384you443d34o3434ingtod38384day
re='(.*)[0-9]+(.*)'
while [[ $hello =~ $re ]]; do
  hello=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
done
echo "$hello"

...yields...

howareyoudoingtodday
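If you want to replace rather than delete, a variation on the same idea is to consume the string from the left and build up the result as you go. A rough sketch follows; replace_digit_runs is a hypothetical helper name, and it swaps each run of digits for a replacement string:

```shell
#!/bin/bash
# Hypothetical helper: replace every run of digits in $1 with $2.
replace_digit_runs() {
    local rest=$1 repl=$2 out=
    # Group 1: leading non-digits; then one run of digits; group 2: the remainder.
    local re='^([^0-9]*)[0-9]+(.*)$'
    while [[ $rest =~ $re ]]; do
        out+=${BASH_REMATCH[1]}$repl   # keep the prefix, append the replacement
        rest=${BASH_REMATCH[2]}        # continue with what follows the digits
    done
    printf '%s\n' "$out$rest"
}

replace_digit_runs ho02123ware38384you -   # ho-ware-you
```

Because the replacement is appended to the output and never rescanned, the loop still terminates when the replacement itself contains digits.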
answered 2014-03-07T21:55:27.707
121

These examples also work in bash, with no need for sed:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[a-zA-Z]/X} 
echo ${MYVAR//[0-9]/N}

You can also use the character class bracket expressions:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[[:alpha:]]/X} 
echo ${MYVAR//[[:digit:]]/N}

Output:

XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

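However, what @Lanaru wanted to know, if I understand the question correctly, is why the "full" or PCRE extensions \s \S \w \W \d \D etc. don't work the way they do in PHP, Ruby, Python and so on. These extensions come from Perl-compatible regular expressions (PCRE) and may not be compatible with other forms of shell-based regular expressions.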
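For the most common shorthands there are POSIX bracket-expression equivalents that do work in bash parameter expansion: roughly, [[:digit:]] for \d, [[:space:]] for \s, [[:alnum:]_] for \w and [![:alnum:]_] for \W. A small sketch, with a made-up sample string:

```shell
#!/bin/bash
text='a1 b2-c_3'

no_digits=${text//[[:digit:]]/}        # like removing \d
no_space=${text//[[:space:]]/}         # like removing \s
no_word=${text//[[:alnum:]_]/}         # like removing \w
only_word=${text//[![:alnum:]_]/}      # like removing \W

echo "[$no_digits]"   # [a b-c_]
echo "[$no_space]"    # [a1b2-c_3]
echo "[$no_word]"     # [ -]
echo "[$only_word]"   # [a1b2c_3]
```

Keep in mind these are still glob patterns matching one character at a time, not regular expressions, so quantifiers such as + are not available here.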

These don't work:

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//\d/}


#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | sed 's/\d//g'

Output, with all literal "d" characters removed:

ho02123ware38384you44334o3434ingto38384ay

But the following does work as expected:

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | perl -pe 's/\d//g'

Output:

howareyoudoingtodday

Hope that clarifies things a bit more, but if you are not confused yet, why not try this on Mac OS X, which has the REG_ENHANCED flag enabled:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day;
echo $MYVAR | grep -o -E '\d'

On most flavours of *nix you will only see the following output:

d
d
d

Enjoy!
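If you need the digits regardless of platform, a portable sketch is to spell the class out with a POSIX bracket expression instead of \d:

```shell
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day

# [[:digit:]] is understood by POSIX ERE, so this does not depend on REG_ENHANCED.
# With -o, grep prints each run of digits on its own line.
digit_runs=$(printf '%s\n' "$MYVAR" | grep -o -E '[[:digit:]]+')
printf '%s\n' "$digit_runs"
```

For the sample string this prints six lines, one per run of digits, starting with 02123.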

answered 2014-03-07T21:48:36.263
14

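If you are making repeated calls and care about performance, this test shows that the bash method is about 15x faster than forking to sed, and likely to any other external process.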

hello=123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X

P1=$(date +%s)

for i in {1..10000}
do
   echo $hello | sed s/X//g > /dev/null
done

P2=$(date +%s)
echo $((P2 - P1))

for i in {1..10000}
do
   echo ${hello//X/} > /dev/null
done

P3=$(date +%s)
echo $((P3 - P2))
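Since date +%s only has one-second resolution, a finer-grained variant of the same comparison can use bash's time keyword. This is a sketch only: the iteration count and the shortened test string are assumptions, and the absolute numbers will differ from machine to machine:

```shell
#!/bin/bash
hello=123456789X123456789X123456789X
runs=200                     # assumed iteration count; raise it for steadier numbers

TIMEFORMAT='%R seconds'      # bash-specific: output format of the `time` keyword

echo "sed (forks an external process on every iteration):"
time for ((i = 0; i < runs; i++)); do
    printf '%s\n' "$hello" | sed 's/X//g' > /dev/null
done

echo "parameter expansion (stays inside the shell):"
time for ((i = 0; i < runs; i++)); do
    printf '%s\n' "${hello//X/}" > /dev/null
done
```

The time keyword writes its report to stderr, so redirect that if you want to capture the figures.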
answered 2017-01-05T21:32:18.623
10

Use [[:digit:]] (note the double brackets) as the pattern:

$ hello=ho02123ware38384you443d34o3434ingtod38384day
$ echo ${hello//[[:digit:]]/}
howareyoudoingtodday

Just wanted to summarize the answers (especially @nickl-'s https://stackoverflow.com/a/22261334/2916086).
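To illustrate why the double brackets matter: with single brackets, [:digit:] is just a set of the literal characters ':', 'd', 'i', 'g' and 't'. A small sketch with a made-up string:

```shell
#!/bin/bash
s='dig1:t2'

single=${s//[:digit:]/}      # removes the characters : d i g t, keeps the digits
double=${s//[[:digit:]]/}    # removes the digits, as intended

echo "$single"   # 12
echo "$double"   # dig:t
```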

answered 2016-08-30T02:25:26.547
6

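I know this is an ancient thread, but it was my first hit on Google, and I wanted to share the following resub that I put together, which adds support for multiple $1, $2, etc. backreferences...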

#!/usr/bin/env bash

############################################
###  resub - regex substitution in bash  ###
############################################

resub() {
    local match="$1" subst="$2" tmp

    if [[ -z $match ]]; then
        echo "Usage: echo \"some text\" | resub '(.*) (.*)' '\$2 me \${1}time'" >&2
        return 1
    fi

    ### First, convert "$1" to "$BASH_REMATCH[1]" and 'single-quote' for later eval-ing...

    ### Utility function to 'single-quote' a list of strings
    squot() { local a=(); for i in "$@"; do a+=( $(echo \'${i//\'/\'\"\'\"\'}\' )); done; echo "${a[@]}"; }

    tmp=""
    while [[ $subst =~ (.*)\${([0-9]+)}(.*) ]] || [[ $subst =~ (.*)\$([0-9]+)(.*) ]]; do
        tmp="\${BASH_REMATCH[${BASH_REMATCH[2]}]}$(squot "${BASH_REMATCH[3]}")${tmp}"
        subst="${BASH_REMATCH[1]}"
    done
    subst="$(squot "${subst}")${tmp}"

    ### Now start (globally) substituting

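    while IFS= read -r line; do
        # Reset the accumulator for every input line; otherwise output from
        # earlier lines leaks into later ones.
        tmp=""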
        while [[ $line =~ $match(.*) ]]; do
            eval tmp='"${tmp}${line%${BASH_REMATCH[0]}}"'"${subst}"
            line="${BASH_REMATCH[$(( ${#BASH_REMATCH[@]} - 1 ))]}"
        done
        echo "${tmp}${line}"
    done
}

resub "$@"

##################
###  EXAMPLES  ###
##################

###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub quick slow
###    The slow brown fox jumps slowly over the lazy dog

###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub 'quick ([^ ]+) fox' 'slow $1 sheep'
###    The slow brown sheep jumps quickly over the lazy dog

###  % animal="sheep"
###  % echo "The quick brown fox 'jumps' quickly over the \"lazy\" \$dog" | resub 'quick ([^ ]+) fox' "\"\$low\" \${1} '$animal'"
###    The "$low" brown 'sheep' 'jumps' quickly over the "lazy" $dog

###  % echo "one two three four five" | resub "one ([^ ]+) three ([^ ]+) five" 'one $2 three $1 five'
###    one four three two five

###  % echo "one two one four five" | resub "one ([^ ]+) " 'XXX $1 '
###    XXX two XXX four five

###  % echo "one two three four five one six three seven eight" | resub "one ([^ ]+) three ([^ ]+) " 'XXX $1 YYY $2 '
###    XXX two YYY four five XXX six YYY seven eight

H/T to @Charles Duffy re: (.*)$match(.*)

answered 2020-07-24T01:29:52.273
1

Set a variable:

hello=ho02123ware38384you443d34o3434ingtod38384day

Then echo the variable with the pattern replacement applied:

echo ${hello//[[:digit:]]/}

This will print:

howareyoudoingtodday

Extra: if you want the opposite (to get only the digit characters):

echo ${hello//[![:digit:]]/}

This will print:

021233838444334343438384
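Bash also accepts ^ in place of ! for negation inside a bracket expression (POSIX only specifies ! for shell patterns). A short sketch with a made-up string:

```shell
#!/bin/bash
s=a1b22c

digits_bang=${s//[![:digit:]]/}    # POSIX-style negation
digits_caret=${s//[^[:digit:]]/}   # regex-style negation, also accepted by bash
letters=${s//[[:digit:]]/}         # the non-negated class, for comparison

echo "$digits_bang" "$digits_caret" "$letters"   # 122 122 abc
```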
answered 2021-08-12T14:01:35.363
0

This example searches the input hello ugly world for the regex bad|ugly and replaces it with nice.

#!/bin/bash

# THIS FUNCTION NEEDS THREE PARAMETERS
# arg1 = input              Example:  hello ugly world
# arg2 = search regex       Example:  bad|ugly
# arg3 = replace            Example:  nice
function regex_replace()
{
  # $1 = hello ugly world
  # $2 = bad|ugly
  # $3 = nice

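  # REGEX
  # bash uses POSIX ERE, which has no lazy quantifiers such as .*?,
  # so the leading group is greedy and the last match in the input is replaced.
  re="(.*)($2)(.*)"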

  if [[ $1 =~ $re ]]; then
    # if there is a match
    
    # ${BASH_REMATCH[0]} = hello ugly world
    # ${BASH_REMATCH[1]} = hello 
    # ${BASH_REMATCH[2]} = ugly
    # ${BASH_REMATCH[3]} = world    

    # hello + nice + world
    echo "${BASH_REMATCH[1]}$3${BASH_REMATCH[3]}"
  else    
    # if no match return original input  hello ugly world
    echo "$1"
  fi    
}

# prints 'hello nice world'
regex_replace 'hello ugly world' 'bad|ugly' 'nice'

# to save output to a variable
x=$(regex_replace 'hello ugly world' 'bad|ugly' 'nice')
echo "output of replacement is: $x"
exit
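Note that the function above replaces a single match. A possible way to replace every match is sketched below; regex_replace_all is a hypothetical name, and the approach assumes the pattern is not anchored with ^ and that an empty match should stop the loop:

```shell
#!/bin/bash
# Hypothetical helper: replace every match of ERE $2 in string $1 with $3.
regex_replace_all() {
    local rest=$1 re="($2)(.*)" repl=$3 out= last
    while [[ -n $rest && $rest =~ $re ]]; do
        # An empty match would never shrink $rest, so stop instead of looping forever.
        [[ -n ${BASH_REMATCH[1]} ]] || break
        # The final capture group is always the text after the match,
        # even when the caller's pattern contains groups of its own.
        last=$(( ${#BASH_REMATCH[@]} - 1 ))
        # BASH_REMATCH[0] runs to the end of $rest, so whatever precedes it is the unmatched prefix.
        out+=${rest:0:${#rest}-${#BASH_REMATCH[0]}}$repl
        rest=${BASH_REMATCH[last]}
    done
    printf '%s\n' "$out$rest"
}

regex_replace_all 'bad ugly bad world' 'bad|ugly' 'nice'   # nice nice nice world
```

The replacement text is appended to the output and never rescanned, so a replacement that itself matches the pattern does not cause an endless loop.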
answered 2020-11-14T16:43:03.093
0

You can use Python. It is not efficient, but it gets the job done with a more flexible syntax.

Applying it to a file

The following Python script replaces "FROM" with "TO" (but not "notFROM").

regex_replace.py

import sys
import re

for line in sys.stdin:
    line = re.sub(r'(?<!not)FROM', 'TO', line)
    sys.stdout.write(line)

You can apply it to a text file, like this:

$ cat test.txt
bla notFROM
FROM FROM
bla bla
FROM bla

bla  notFROM FROM

bla FROM
bla bla


$ cat test.txt | python regex_replace.py
bla notFROM
TO TO
bla bla
TO bla

bla  notFROM TO

bla TO
bla bla

Applying it to a variable

#!/bin/bash

hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello

PYTHON_CODE=$(cat <<END
import sys
import re

for line in sys.stdin:
    line = re.sub(r'[0-9]', '', line)
    sys.stdout.write(line)
END
)
echo $hello | python -c "$PYTHON_CODE"

Output:

ho02123ware38384you443d34o3434ingtod38384day
howareyoudoingtodday
answered 2021-10-27T09:11:01.240