awk - awk and extracting specific fields multiple times

Question

I have a lot of files that contain variables, such as

{$var1} some text {$var2} some other text

I'd like to feed them to awk so that it extracts the variables and gives a result like this:

file_name.htm - 8 : {$title}
file_name.htm - 10 : {$css_style}
file_name.htm - 33 : {$img_carte_image_02_over}

This is a piece of cake with this awk script:

#!/usr/bin/gawk -f
BEGIN { }
match($0, /({.*\$.+})/, tab) {
  for (x=1; tab[x]; x++) {
    print FILENAME" - "FNR" : "substr($0, tab[x, "start"], tab[x, "length"])
  }
}
END { }

I invoke it like this:

find website/ | grep -E '(html|htm)$' | xargs ./myh.sh | more

Everything works fine unless several variables are on the same line. In that case, I get:

file_name.htm - 59 : {$var1}<br/>{$var2}

whereas I want:

file_name.htm - 59 : {$var1}
file_name.htm - 59 : {$var2}

Any idea how I could/should do this? Of course, if you have another solution (using sed or something else), that's fine with me!

Thanks a lot!

score 2 · Accepted Answer

Try this:

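awk '{
    line = $0
    # match() sets RSTART and RLENGTH, and returns 0 when nothing matches
    while (match(line, /({[^$]*\$[^}]+})/)) {
        print FILENAME, "-", FNR, ":", substr(line, RSTART, RLENGTH)
        # continue right after the match (no +1, or one character is skipped)
        line = substr(line, RSTART + RLENGTH)
    }
}'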

The loop ends when match() returns 0, that is, when line no longer contains a "{foo$bar}" string. After each match, substr() drops the part of the line that has already been scanned, so the next match() call only sees the remainder.
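To see the loop at work, here is a minimal sketch that runs it on a throwaway file. The file name sample.htm, its contents, and the extract_vars helper are made up for this demonstration; the braces in the pattern are escaped, which I'm assuming is the safer choice across awk implementations.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demonstration.
tmpdir=$(mktemp -d)
printf '%s\n' 'plain text' '{$var1}<br/>{$var2}' > "$tmpdir/sample.htm"

# extract_vars is an illustrative helper name, not part of the original answer.
extract_vars() {
    awk '{
        line = $0
        # match() returns 0 once no "{...$...}" string is left in line.
        while (match(line, /\{[^$]*\$[^}]+\}/)) {
            print FILENAME, "-", FNR, ":", substr(line, RSTART, RLENGTH)
            # Keep only the text after the match for the next iteration.
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

# Run from inside the temp directory so FILENAME is just "sample.htm".
out=$(cd "$tmpdir" && extract_vars sample.htm)
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

Both variables on line 2 of the sample file should come out on separate output lines, each with the same file name and line number.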

score 0 · Accepted Answer

Try using a non-greedy regular expression in the match (http://www.exampledepot.com/egs/java.util.regex/Greedy.html). It might not work, but it's just an idea.
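As far as I know, awk's regular expressions (POSIX ERE) have no non-greedy quantifiers such as .*?, so the usual way to get the same effect is a negated character class that cannot run past the closing brace. A minimal sketch, assuming every variable has the form {$name} with no "}" inside the name; the sample line is made up for the comparison:

```shell
#!/bin/sh
# Hypothetical input line with two variables on it.
line='{$var1}<br/>{$var2}'

# Greedy pattern (as in the question): .* and .+ run to the last "}".
greedy=$(printf '%s\n' "$line" |
    awk 'match($0, /\{.*\$.+\}/) { print substr($0, RSTART, RLENGTH) }')

# Bounded pattern: [^}]+ stops at the first "}", which mimics non-greedy matching.
bounded=$(printf '%s\n' "$line" |
    awk 'match($0, /\{\$[^}]+\}/) { print substr($0, RSTART, RLENGTH) }')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
```

The bounded pattern only finds the first variable on its own; it still needs a loop over the rest of the line to report every variable.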
