sed - Using sed to remove duplicated words in a line

Question

This is purely academic, but it is frustrating me.

I want to correct this text:

there there are are multiple lexical errors in this line line

using sed. This is as far as I have gotten:

sed 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g' < file.text

It corrects everything except the last doubled word!

there are multiple lexical errors in this line line

Could a sed guru explain why the command above does not handle the last word?

score 10 · Accepted Answer

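That is because in the last case (line), capture group 1 (\1) will contain "line " (the word followed by a space), and you are searching for a repetition of exactly that. Since there is no space after the final "line", the match fails.

To fix this, add a space after the final word "line".

Alternatively, you can change the regex to: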
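As a rough sketch of this workaround, the trailing space can be added and then stripped inside sed itself, so the file does not need to be edited. This assumes GNU sed and reuses the expression from the question unchanged; the variable names are only illustrative.

```shell
# Assumption: GNU sed. The sample text is the line from the question.
input='there there are are multiple lexical errors in this line line'

# The expression from the question, quoted once so it can be reused.
dedupe='s/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g'

# Without a trailing space, the final "line line" is left alone.
unpadded=$(printf '%s\n' "$input" | sed "$dedupe")
printf '%s\n' "$unpadded"

# Append a space to the line, dedupe, then strip the space again.
padded=$(printf '%s\n' "$input" | sed -e 's/$/ /' -e "$dedupe" -e 's/ $//')
printf '%s\n' "$padded"
```

The first result reproduces the output shown in the question; the second also collapses the final duplicate.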

sed -e 's/\b\([a-z]\+\)[ ,\n]\1/\1/g'
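For illustration, here is that command applied to the sample line, assuming GNU sed (\b and \+ are GNU extensions). The second half is a hypothetical refinement that is not part of the answer: as written, the expression has no word boundary after \1, so a word followed by a longer word that starts with it (such as "the there") would also be collapsed. Adding a trailing \b is one possible way to avoid that.

```shell
# Assumption: GNU sed, since \b and \+ are GNU extensions.
input='there there are are multiple lexical errors in this line line'
result=$(printf '%s\n' "$input" | sed -e 's/\b\([a-z]\+\)[ ,\n]\1/\1/g')
printf '%s\n' "$result"   # there are multiple lexical errors in this line

# Hypothetical edge case: "the" is a prefix of "there".
tricky='the there is is a test'

# The answer's expression also merges "the there" into "there".
loose=$(printf '%s\n' "$tricky" | sed -e 's/\b\([a-z]\+\)[ ,\n]\1/\1/g')
printf '%s\n' "$loose"    # there is a test

# Suggested variant: a trailing \b requires the repeat to be a whole word.
strict=$(printf '%s\n' "$tricky" | sed -e 's/\b\([a-z]\+\)[ ,\n]\1\b/\1/g')
printf '%s\n' "$strict"   # the there is a test
```

Note that this expression allows only a single separator character between the two words, whereas the one in the question allows several.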
