linux - Strange output from sed

Question

I have some HTML files and want to extract only the lines that contain these tags:

head
p

I use sed to extract those parts of the file, like this:

grep "<head>" myfile.html | sed -e 's%\(head\)\(.*\)\(/head\)%title\2\/title%'

grep "<p>" myfile.html | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2\\%'

Everything works, but I get a "\" character at the end of every line. How can I get rid of it?

score 2 · Accepted Answer

In this command, the double backslash at the end of the replacement tells sed to append a literal backslash:

sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2\\%'

Try removing the backslashes:

sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2%'
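
To see the difference, here is a small sketch that runs both versions against a made-up one-line input (the sample text and variable names are assumptions, not taken from the question):

```shell
# Hypothetical sample line, for illustration only.
line='<p>First paragraph</p>'

# Replacement ends in \\ : sed emits group 2 followed by a literal backslash.
with_backslash=$(printf '%s\n' "$line" | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2\\%')

# Replacement is only \2 : just the text between the tags is kept.
without_backslash=$(printf '%s\n' "$line" | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2%')

printf '%s\n' "$with_backslash" "$without_backslash"
```

The first printed line should end in a backslash and the second should not.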

Also, you don't need grep:

sed -ne '/<p>/{s%\(<p>\)\(.*\)\(</p\)\(>\)%\2%;p}'
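
As a possible variation, the p flag of the s command prints only the lines on which the substitution succeeded, which can stand in for the separate address and p command. This is a sketch under assumptions: the sample input is made up, and the behavior is slightly different from the command above, since a line that contains <p> but no closing </p> would not be printed at all.

```shell
# Hypothetical sample input, for illustration only.
html='<head>My page</head>
<p>First paragraph</p>
<div>not extracted</div>
<p>Second paragraph</p>'

# -n suppresses automatic printing; the trailing p flag prints a line
# only when the substitution matched, so no grep is needed.
paragraphs=$(printf '%s\n' "$html" | sed -n 's%<p>\(.*\)</p>%\1%p')

printf '%s\n' "$paragraphs"
```

Note that .* is greedy, so a line with several <p>...</p> pairs would keep everything between the first <p> and the last </p>.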

score 1 · Accepted Answer

Don't put \\ at the end of the replacement string:

grep "<p>" myfile.html | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2%'
