regex - awk: Counting errors in tomcat logs

Question

My Tomcat logs are built in this format:

[<DATE>] [<COMPONENT>] ERROR_TYPE <ERROR_NAME> - <Rest of line>

Where ERROR_TYPE is a log4j value like DEBUG or ERROR.

e.g.,

[18/Jul/2012:08:53:39 +0000] [component1] ERROR ConnectionTimeOut - ...
[18/Jul/2012:09:54:32 +0000] [component2] DEBUG IPNotFound - ...
[18/Jul/2012:09:54:32 +0000] [component1] TRACE Connected - ...
[18/Jul/2012:08:53:39 +0000] [component1] ERROR ConnectionTimeOut - ...

I would like to create a map from the tuple (ERROR_TYPE, ERROR_NAME) to the number of occurrences, e.g.

ERROR ConnectionTimeOut       2
DEBUG IPNotFound              1
TRACE Connected               1

How do I match something like:

_anything_ (ERROR|DEBUG|TRACE|WARN|FATAL_spaces_ _another_word_)_anything_

in AWK, and return only the part in parentheses?

score 3 · Accepted Answer

awk '/ERROR|DEBUG|TRACE|WARN|FATAL/ {count[$4,$5]++} END {for (i in count) {split(i, a, SUBSEP); print a[1], a[2], count[i]}}' inputfile

The pattern selects lines that contain one of the log levels. For each selected line, the count array element indexed by the type ($4) and the name ($5) taken together is incremented. The comma in the subscript joins the two values with the contents of the SUBSEP variable, which defaults to \034. The END block iterates over the count array, splits each index on SUBSEP, and prints the type, name, and count.
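To make the role of SUBSEP concrete, here is a minimal sketch. The function name subsep_demo and the array name part are only illustrative and do not come from the answer. It shows that a comma inside a subscript and an explicitly built "type SUBSEP name" key address the same array element, and that split() on SUBSEP recovers both parts.

```shell
# Illustrative helper (hypothetical name): demonstrates how awk builds
# composite array keys with SUBSEP.
subsep_demo() {
    awk 'BEGIN {
        # A comma inside the subscript joins the parts with SUBSEP.
        count["ERROR", "ConnectionTimeOut"]++

        # The same element, addressed with an explicitly built key.
        key = "ERROR" SUBSEP "ConnectionTimeOut"
        count[key]++

        # split() on SUBSEP recovers the two parts of the key.
        n = split(key, part, SUBSEP)
        print n, part[1], part[2], count[key]
    }'
}

subsep_demo
```

This prints "2 ERROR ConnectionTimeOut 2": the key splits into two parts, and both increments landed on the same element.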

Edit:

This version uses match() with a regex to find the type and name wherever they occur in the line, so it also handles unstructured log entries:

awk 'match($0, /(ERROR|DEBUG|TRACE|WARN|FATAL) +[^ ]+/) {s = substr($0, RSTART, RLENGTH); split(s, a); count[a[1],a[2]]++} END {for (i in count) {split(i, a, SUBSEP); print a[1], a[2], count[i]}}' inputfile
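As a rough check of the regex version, the sketch below wraps it in a function and feeds it sample lines. The names count_levels and sample_log are hypothetical, the first three lines are taken from the question, and the fourth line is an invented example of an entry without the date and bracketed component. The output is piped through sort because the order of "for (key in count)" is unspecified in awk.

```shell
# Hypothetical wrapper around the regex-based one-liner; reads log lines on stdin.
count_levels() {
    awk '
        # match() sets RSTART and RLENGTH to the position and length of the match.
        match($0, /(ERROR|DEBUG|TRACE|WARN|FATAL) +[^ ]+/) {
            split(substr($0, RSTART, RLENGTH), pair)   # pair[1] = type, pair[2] = name
            count[pair[1], pair[2]]++
        }
        END {
            for (key in count) {
                split(key, part, SUBSEP)
                print part[1], part[2], count[key]
            }
        }
    ' | sort   # array traversal order is unspecified, so sort for stable output
}

# Sample input: three lines from the question plus one invented unstructured line.
sample_log() {
    printf '%s\n' \
        '[18/Jul/2012:08:53:39 +0000] [component1] ERROR ConnectionTimeOut - ...' \
        '[18/Jul/2012:09:54:32 +0000] [component2] DEBUG IPNotFound - ...' \
        '[18/Jul/2012:09:54:32 +0000] [component1] TRACE Connected - ...' \
        'component1 ERROR ConnectionTimeOut - ...'
}

sample_log | count_levels
```

With this input the output is:

DEBUG IPNotFound 1
ERROR ConnectionTimeOut 2
TRACE Connected 1

The fourth line is counted even though its type and name are not in fields 4 and 5, which is the case the field-based version would miss.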
