I have an AWK script like this, which I run on a file:

cat input.txt | awk 'gsub(/[^ ]*(fish|shark|whale)[^ ]*/,"(&)")' >> output.txt

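This wraps every space-delimited word containing "fish", "shark" or "whale" in parentheses. Because the return value of gsub() is used as the pattern, only lines where at least one substitution was made are printed. For example, this input: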

The whale asked the shark to swim elsewhere.
The fish were unhappy.

After running it through the script, the output is:

The (whale) asked the (shark) to swim elsewhere.
The (fish) were unhappy.

The file is marked up with HTML tags, and I need the substitution to happen only between <b> and </b> tags. For example, this input:

The whale asked <b>the shark to swim</b> elsewhere.
<b>The fish were</b> unhappy.

should become:

The whale asked <b>the (shark) to swim</b> elsewhere.
<b>The (fish) were</b> unhappy.
  • Bold tags never span lines: an opening <b> tag always appears on the same line as its closing </b> tag.

How can I restrict awk so that it only searches and modifies text found between <b> and </b> tags?

2 Answers

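Here is one technique using awk: treat each space-separated word as a record (RS=' '), set a flag f when a record contains <b>, clear it when a record contains </b>, and apply gsub() only while the flag is set. The final 1 prints every record, and ORS=' ' puts the spaces back. Testing for </b> after the substitution means that a word ending in </b>, such as shark</b>, is still wrapped.

awk '/<b>/{f=1} f{gsub(/fish|shark|whale/,"(&)")} /<\/b>/{f=0} 1' RS=' ' ORS=' ' file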
The whale asked <b>the (shark) to swim</b> elsewhere.
<b>The (fish) were</b> unhappy.
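
A line-oriented variation on the same idea, as a sketch that relies on the question's guarantee that each <b> and its </b> are on the same line, and additionally assumes that a span contains no other tags: read the file normally, use match() to find each <b>...</b> span, and run gsub() on that substring only. This avoids treating newlines as part of words and copes with several spans per line. The function name bold_only is hypothetical, and the regex is the same substring match as above, so a word like dogfish would still become dog(fish).

```shell
# Hypothetical helper: wrap fish/shark/whale in parentheses, but only
# inside <b>...</b> spans. Assumes each <b> and its </b> are on the same
# line and that a span contains no other tags ([^<]* below).
bold_only() {
  awk '
  {
    out = ""      # text already processed on this line
    rest = $0     # text still to be scanned
    while (match(rest, /<b>[^<]*<\/b>/)) {
      span = substr(rest, RSTART, RLENGTH)
      gsub(/fish|shark|whale/, "(&)", span)   # only touches the span
      out = out substr(rest, 1, RSTART - 1) span
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest   # text after the last span is copied unchanged
  }' "$@"
}

printf '%s\n' \
  'The whale asked <b>the shark to swim</b> elsewhere.' \
  '<b>The fish were</b> unhappy.' | bold_only
```

On a file it would be used as bold_only input.txt >> output.txt.
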
answered 2013-04-21T10:09:23.877

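As long as the HTML markup is no worse than this, and a <b> ... </b> span does not contain any other HTML tags, this is fairly easy in Perl. (Each match runs from <b> to </b>, so if one span contains several of the words, only one of them is wrapped.)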

$ cat data
The whale asked <b>the shark to swim</b> elsewhere.
<b>The fish were</b> unhappy.
The <b> dogfish and the sharkfin soup</b> were unscathed.
$ perl -pe 's/(<b>[^<]*)\b(fish|shark|whale)\b([^<]*<\/b>)/$1($2)$3/g' data
The whale asked <b>the (shark) to swim</b> elsewhere.
<b>The (fish) were</b> unhappy.
The <b> dogfish and the sharkfin soup</b> were unscathed.
$ 

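I tried to adapt this to awk (and gawk) without success: the matching part works, but the replacement does not. As far as I can tell from the manual, unlike Perl, the replacement string in awk's sub() and gsub() cannot refer to individual parenthesized subexpressions (only & for the whole match).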
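
Without backreferences, the Perl behaviour can still be reproduced in POSIX awk by walking the string with match() and rebuilding it with substr(). This is a sketch under the same assumptions as the Perl version (both tags on one line, no other tags inside a span); it treats a "word" as a run of ASCII letters, digits and underscores, roughly like Perl's \w, and the function names bold_words and wrap_words are hypothetical. Unlike the one-liner, it wraps every matching word in a span. (GNU awk's gensub() does accept \\1-style backreferences, but it is a gawk extension.)

```shell
# Hypothetical helper: wrap the whole words fish, shark and whale, but only
# inside <b>...</b> spans. POSIX awk has no \1 backreferences, so the line
# is rebuilt piece by piece with match() and substr().
# Assumes each <b> and its </b> are on the same line and no other tags
# appear inside a span.
bold_words() {
  awk '
  # Return s with every whole word fish/shark/whale wrapped in parentheses.
  # A "word" is a run of letters, digits or underscores, so dogfish and
  # sharkfin are single words and are left alone.
  function wrap_words(s,    out, w) {
    out = ""
    while (match(s, /[A-Za-z0-9_]+/)) {
      w = substr(s, RSTART, RLENGTH)
      if (w == "fish" || w == "shark" || w == "whale")
        w = "(" w ")"
      out = out substr(s, 1, RSTART - 1) w
      s = substr(s, RSTART + RLENGTH)
    }
    return out s
  }
  {
    out = ""
    rest = $0
    while (match(rest, /<b>[^<]*<\/b>/)) {
      start = RSTART; len = RLENGTH   # save: wrap_words() calls match() too
      inner = substr(rest, start + 3, len - 7)   # text between <b> and </b>
      out = out substr(rest, 1, start - 1) "<b>" wrap_words(inner) "</b>"
      rest = substr(rest, start + len)
    }
    print out rest
  }' "$@"
}

printf '%s\n' \
  'The whale asked <b>the shark to swim</b> elsewhere.' \
  '<b>The fish were</b> unhappy.' \
  'The <b> dogfish and the sharkfin soup</b> were unscathed.' \
  '<b>a shark and a whale</b> fish' | bold_words
```

Saving RSTART and RLENGTH before calling wrap_words() matters because match() resets those globals. On a file: bold_words data >> output.txt.
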

answered 2013-04-21T02:31:39.633