I'm trying to extract some information from a very large document. The file follows this pattern:

'>Title 1'

0 200, >name [numbers&letters]

1 200, >name [numbers&letters] 

2 200, >name [numbers&letters]

'>Title 2'

0 200, >name [numbers&letters]

1 200, >name [numbers&letters] 

...

'>Title 600.000'

For each group of lines between two titles, I need to print three tab-separated columns:

  • the number of lines in the group (that is, between one title and the next)
  • the name
  • the number in the 2nd column of the first line in the group (the first line always starts with 0); in the example this number is 200

I've been trying with Bash and awk/sed, but I can't work out how to write a loop for this task. Any ideas?

2 Answers

Untested, but should be close:

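awk -F'[ ,>]+' '
# Title line: print the finished group (if there is one), then reset.
/^.>/ {
    if (count != "") { printf "%d\t%s\t%d\n", count, name, number }
    count = 0
    name = number = ""
    next
}
# Non-blank data line: count it; take number and name from the first one.
NF {
    if (++count == 1) { number = $2; name = $3 }
}
# Flush the last group.
END {
    if (count != "") { printf "%d\t%s\t%d\n", count, name, number }
}
' file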
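
To try this kind of streaming approach on a small input, here is a minimal sketch. The function name `summarize_groups`, the `seen` flag, and the sample names (`alpha`, `delta`, ...) are made up for illustration; the sketch assumes title lines look like `'>Title 1'` and data lines like `0 200, >name [id]`, as in the question.

```shell
#!/bin/sh
# summarize_groups: hypothetical helper; reads the document on stdin and
# prints "count<TAB>name<TAB>number" for each group of lines under a title.
summarize_groups() {
    awk -F'[ ,>]+' '
    /^.>/ {                      # title line: flush the previous group
        if (seen) printf "%d\t%s\t%d\n", count, name, number
        seen = 1; count = 0; name = number = ""
        next
    }
    NF {                         # non-blank data line
        if (++count == 1) { number = $2; name = $3 }
    }
    END { if (seen) printf "%d\t%s\t%d\n", count, name, number }
    '
}

# Made-up sample input in the layout described in the question.
summarize_groups <<'EOF'
'>Title 1'

0 200, >alpha [a1b2]

1 200, >beta [c3d4]

2 200, >gamma [e5f6]

'>Title 2'

0 350, >delta [g7h8]

1 350, >epsilon [i9j0]
EOF
```

For this sample the function prints two tab-separated lines: `3 alpha 200` and `2 delta 350`. Because each group is printed as soon as the next title is seen, memory use does not grow with the number of titles.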
Answered 2013-05-22T12:46:35.750
awk -F'[ ,>]+' -v OFS='\t' '/^.>/{t=$0;next} NF{a[t]++} $1=="0"{b[t]=$2;n[t]=$3} END{for (i in a) print a[i],n[i],b[i]}' file
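
One caveat with collecting everything in arrays: awk does not define the order in which `for (i in a)` visits keys, so the groups may come out in a different order than in the file. A possible variant, sketched here under the same input assumptions as the question, indexes groups by a running counter instead of by title text. The function name `summarize_in_order` and the array names are hypothetical.

```shell
#!/bin/sh
# summarize_in_order: hypothetical helper; reads the document on stdin and
# prints "count<TAB>name<TAB>number" per group, in input order.
summarize_in_order() {
    awk -F'[ ,>]+' -v OFS='\t' '
    /^.>/ { n++; next }                  # each title line starts group n
    !NF || !n { next }                   # skip blank lines and lines before any title
    { count[n]++ }                       # count data lines in the current group
    $1 == "0" { number[n] = $2; name[n] = $3 }   # first line of the group
    END {
        for (i = 1; i <= n; i++)         # numeric loop keeps input order
            print count[i] + 0, name[i], number[i]
    }
    '
}

# Made-up sample input.
summarize_in_order <<'EOF'
'>Title 1'
0 200, >alpha [a1b2]
1 200, >beta [c3d4]
'>Title 2'
0 350, >delta [g7h8]
EOF
```

This keeps three array entries per title in memory, so for a file with several hundred thousand titles a streaming approach that prints each group as it ends is likely the lighter option.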
Answered 2013-05-22T11:47:50.837