regex - Using sed to replace text with a string of the same length

Question

Is there a way to use sed to replace a pattern with something else of the same length (e.g. dots, zeros, etc.)? Something like this:

maci:/ san$ echo "She sells sea shells by the sea shore" | sed 's/\(sh[a-z]*\)/../gI'
.. sells sea .. by the sea ..

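(The "I" flag, which ignores case, requires a newer version of sed.)
This part is easy: words starting with "sh" are replaced by two dots (..). But how do I make the output look like this instead: ... sells sea ...... by the sea .....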

Any ideas? Cheers!

score 7 · Accepted Answer

My suspicion is that you can't do it in standard sed, but you can do it with Perl or another tool with more powerful regex handling.

$ echo "She sells sea shells by the sea shore" |
> perl -pe 's/(sh[a-z]*)/"." x length($1)/gei'
... sells sea ...... by the sea .....
$

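The e modifier means that the replacement is executable Perl code; in this case, it repeats the character . as many times as there are characters in the matched text. The g modifier repeats the substitution across the whole line; the i modifier makes the match case-insensitive. Perl's -p option prints each line after it has been processed by the script given to the -e option (here, the substitute command).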

score 5 · Accepted Answer

Does this awk one-liner do the job for you?

awk '{for(i=1;i<=NF;i++)if($i~/^[Ss]h/)gsub(/./,".",$i)}1' file

Tested with your data:

kent$  echo "She sells sea shells by the sea shore"|awk '{for(i=1;i<=NF;i++)if($i~/^[Ss]h/)gsub(/./,".",$i)}1'
... sells sea ...... by the sea .....

score 5 · Accepted Answer

An old question, but I found a nice and relatively short one-line sed solution:

sed ':a;s/\([Ss]h\.*\)[^\. ]/\1./;ta;s/[Ss]h/../g'

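It works by replacing one character at a time in a loop.

:a; starts a loop

s/\([Ss]h\.*\)[^\. ] searches for an sh followed by any number of .s (the work we have completed so far), followed by a character that is neither a dot nor a space (what we are about to replace)

/\1./; replaces it with the work completed so far plus another .

ta; loops if we made a substitution, otherwise...

s/[Ss]h/../g replaces each sh with two .s and calls it a day.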
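To watch the loop at work, the sketch below runs the same two substitutions from the shell, one pass at a time, and prints the line after every pass. The function name `trace_mask` is hypothetical; the sed expressions are the ones from the answer, split apart.

```shell
# trace_mask LINE: apply the looped substitution one pass at a time,
# printing each intermediate state, then apply the final sh -> .. step.
# trace_mask is an illustrative name, not part of the original answer.
trace_mask() {
    line=$1
    while :; do
        # One pass of the loop body: turn the next character after sh... into a dot.
        next=$(printf '%s\n' "$line" | sed 's/\([Ss]h\.*\)[^\. ]/\1./')
        # Nothing changed: this is the point where "ta" would stop looping.
        [ "$next" = "$line" ] && break
        line=$next
        printf '%s\n' "$line"
    done
    # Final step: replace the remaining sh/Sh prefixes with two dots.
    printf '%s\n' "$line" | sed 's/[Ss]h/../g'
}

trace_mask 'She sells sea shells by the sea shore'
```

The first line printed should be `Sh. sells sea shells by the sea shore` and the last `... sells sea ...... by the sea .....`. Note that the pattern is not anchored to the start of a word, so an "sh" inside a word (for example "fish") would also be turned into dots.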

score 4 · Accepted Answer

$ echo "She sells sea shells by the sea shore" |
awk '{
   head = ""
   tail = $0
   while ( match(tolower(tail),/sh[a-z]*/) ) {
      dots = sprintf("%*s",RLENGTH,"")
      gsub(/ /,".",dots)
      head = head substr(tail,1,RSTART-1) dots
      tail = substr(tail,RSTART+RLENGTH)
   }
   print head tail
}'
... sells sea ...... by the sea .....

score 3 · Accepted Answer

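As others have pointed out, sed is not well suited to this task. It is certainly possible, though; here is an example that works on a single line with space-separated words: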

echo "She sells sea shells by the sea shore" |

sed 's/ /\n/g' | sed '/^[Ss]h/ s/[^[:punct:]]/./g' | sed ':a;N;$!ba;s/\n/ /g'

Output:

... sells sea ...... by the sea .....

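The first sed replaces spaces with newlines, the second does the dotting, and the third removes the newlines again, as shown in this answer.

If you have unpredictable word separators and/or paragraphs, this approach quickly becomes unmanageable.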
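A small sketch of where the single-line pipeline holds up and where it stops, assuming GNU sed (for `\n` in the replacement) and a hypothetical wrapper function `mask_sh_words` around the same three commands:

```shell
# mask_sh_words: the three-stage pipeline from above, reading stdin.
# mask_sh_words is an illustrative name, not part of the original answer.
mask_sh_words() {
    sed 's/ /\n/g' |                        # one word per line
    sed '/^[Ss]h/ s/[^[:punct:]]/./g' |     # dot out sh-words, keep punctuation
    sed ':a;N;$!ba;s/\n/ /g'                # join the lines with spaces again
}

# Trailing punctuation survives because [^[:punct:]] skips it.
printf '%s\n' 'She sells sea shells, by the sea shore!' | mask_sh_words

# A word with leading punctuation no longer starts with sh, so it is left alone.
printf '%s\n' '"She" sells' | mask_sh_words
```

The first call should print `... sells sea ......, by the sea .....!`; the second leaves `"She" sells` unchanged, which is one example of the separator problem described above.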

Edit - multi-line alternative

Here is a way to handle multi-line input, inspired by Kent's comment (GNU sed):

echo "
She sells sea shells by the sea shore She sells sea shells by the sea shore,
She sells sea shells by the sea shore She sells sea shells by the sea shore
 She sells sea shells by the sea shore She sells sea shells by the sea shore
" |

# Add a \0 to the end of the line and surround punctuations and whitespace by \n 
sed 's/$/\x00/; s/[[:punct:][:space:]]/\n&\n/g' |

# Replace the matched word by dots
sed '/^[Ss]h.*/ s/[^\x00]/./g' | 

# Join lines that were separated by the first sed
sed ':a;/\x00/!{N;ba}; s/\n//g'

Output:

... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....,
... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....
 ... sells sea ...... by the sea ..... ... sells sea ...... by the sea .....

score 3 · Accepted Answer

This might work for you (GNU sed):

sed -r ':a;/\b[Ss]h\S+/!b;s//\n&\n/;h;s/.*\n(.*)\n.*/\1/;s/././g;G;s/(.*)\n(.*)\n.*\n/\2\1/;ta' file

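In essence: it copies a word beginning with sh or Sh, replaces each character of the copy with a ., and then reinserts the new string into the original line. When all occurrences of the search pattern have been exhausted, it prints the line.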
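The first one-liner is dense, so here is the same script spread over several lines with a comment per command. This is only a reading aid, assuming GNU sed; the wrapper function `mask_sh` is a hypothetical name, and the commands themselves are unchanged apart from using -E in place of -r.

```shell
# mask_sh: commented, multi-line form of the hold-space one-liner (GNU sed).
# mask_sh is an illustrative name, not part of the original answer.
mask_sh() {
    sed -E '
        :a
        # No word starting with sh or Sh is left: branch to the end and print.
        /\b[Ss]h\S+/!b
        # Fence the first such word with newlines (empty regex reuses the last one).
        s//\n&\n/
        # Save the fenced line in the hold space.
        h
        # Keep only the fenced word, then turn each of its characters into a dot.
        s/.*\n(.*)\n.*/\1/
        s/././g
        # Append the saved line, giving: dots, prefix, word, suffix.
        G
        # Rebuild the line as prefix + dots + suffix, dropping the original word.
        s/(.*)\n(.*)\n.*\n/\2\1/
        # A substitution was made, so look for the next word.
        ta
    '
}

printf '%s\n' 'She sells sea shells by the sea shore' | mask_sh
```

Because the word is matched with `\S+`, any punctuation attached to it (as in "shells,") is dotted out along with the letters.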

Alternative:

sed -E 's/\S+/\n&/g;s#.*#echo "&"|sed "/^sh/Is/\\S/./g"#e;s/\n//g' file
