I am trying to parse an HTML document with awk.

The document contains several <div class="p_header_bottom">...</div> blocks:

    <div class="p_header_bottom">
        <span class="fl_r"></span>
        287,489 people
    </div>
    <div class="p_header_bottom">
        <span class="fl_r"></span>
        5 links
    </div>

I am using

    awk '/<div class="p_header_bottom">/,/<\/div>/'

to extract all such divs.

How can I get the number 287,489 from the first one?

I tried awk '/<\/span>/,/people/', but it doesn't work correctly.
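One likely reason the range misbehaves: an awk range pattern starts at the first line matching `</span>` (the `<span class="fl_r"></span>` line itself, so that line is printed too) and ends at the next line matching `people`. In the second block no `people` line follows, so that range stays open to the end of the input. A minimal sketch that avoids ranges and tracks a flag instead, assuming each tag sits on its own line as in the sample above; `people_count` is a hypothetical name:

```sh
# Hypothetical helper: print the number on the first "people" line found
# inside a <div class="p_header_bottom"> block.
# Assumes every div tag is on its own line, as in the sample HTML.
people_count() {
    awk '
    /<div class="p_header_bottom">/ { inblock = 1; next }  # a header block starts
    /<\/div>/                       { inblock = 0; next }  # the block ends
    inblock && /people/ {
        gsub(/[^0-9,]/, "")   # strip everything except digits and commas
        print
        exit                  # only the first match is wanted
    }
    ' "$@"
}

# Usage (page.html is a placeholder file name): people_count page.html
```

The `exit` makes awk stop after the first block that mentions `people`, which matches "from the first one" in the question.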


1 Answer


Assuming that the only digits and commas in each <div> </div> block are part of the number of interest:

    awk -v RS='<[/]?div[^>]*>' '/span/ && /people/{gsub(/[^[:digit:],]/, ""); print}' file.txt
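The RS value above is a multi-character regular expression. GNU awk treats it as one, but POSIX leaves a multi-character RS unspecified, and some awks use only its first character. A portable sketch of the same idea, assuming every div tag sits on its own line as in the question's sample: lines between div tags are collected into one buffer, and each finished buffer gets the same span/people test. `header_counts` is a hypothetical name:

```sh
# Hypothetical helper: for every block between div tags that mentions both
# "span" and "people", print only its digits and commas.
# Uses only POSIX awk features (no regex RS); assumes div tags are on their own lines.
header_counts() {
    awk '
    # Test the buffered block, then reset the buffer.
    function flush() {
        if (buf ~ /span/ && buf ~ /people/) {
            gsub(/[^0-9,]/, "", buf)   # keep only digits and commas
            print buf
        }
        buf = ""
    }
    /<\/?div[^>]*>/ { flush(); next }   # any opening or closing div tag ends a block
    { buf = buf $0 }                    # otherwise keep collecting lines
    END { flush() }                     # handle a block left open at end of input
    ' "$@"
}

# Usage (file.txt as in the answer): header_counts file.txt
```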
Answered 2013-11-07T16:00:27.263