regex - Eliminate lines not matching multiline pattern

Question

I'm searching through a logfile, trying to determine the total time a user was logged on. I've already eliminated all lines not related to logins and logoffs. However, for some reason we have login lines that don't have corresponding logout lines, so I'd like to eliminate them. For instance:

2013-04-07 08:44:01 [INFO] User logged in
2013-04-07 08:54:55 [INFO] User logged in
2013-04-07 08:57:12 [INFO] User logged in
2013-04-07 08:59:45 [INFO] User logged in
2013-04-07 09:01:28 [INFO] User logged in
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection

And I want just:

2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
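
Since the stated goal is the total logged-on time, here is a minimal Python sketch of the step that follows the filtering. It assumes every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp and contains either "User logged in" or "User lost connection"; the function name `total_logged_in` is hypothetical. Each login overwrites the one remembered before it, so only the last login before a logout is counted, which matches the desired output above.

```python
from datetime import datetime, timedelta

# Assumed format of the 19-character timestamp prefix on every line.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def total_logged_in(lines):
    """Sum the time from the last "logged in" line to each "lost connection" line."""
    total = timedelta()
    login_time = None
    for line in lines:
        stamp = datetime.strptime(line[:19], TS_FORMAT)
        if "User logged in" in line:
            login_time = stamp  # a later login replaces an unmatched earlier one
        elif "User lost connection" in line and login_time is not None:
            total += stamp - login_time
            login_time = None   # this session is closed
    return total


filtered = [
    "2013-04-07 09:12:56 [INFO] User logged in",
    "2013-04-07 09:15:43 [INFO] User lost connection",
]
print(total_logged_in(filtered))
```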

score 1 · Accepted Answer

This awk one-liner could solve the problem (at least for your example; I cannot see your real file):

awk -F\[ '{a[$2]=$0;}END{for(x in a)print a[x]}' file

Test with your data:

kent$  echo "2013-04-07 08:44:01 [INFO] User logged in
2013-04-07 08:54:55 [INFO] User logged in
2013-04-07 08:57:12 [INFO] User logged in
2013-04-07 08:59:45 [INFO] User logged in
2013-04-07 09:01:28 [INFO] User logged in
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection"|awk -F\[ '{a[$2]=$0;}END{for(x in a)print a[x]}'
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection

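The array is keyed on the text after the "[", so for identical messages only the last line is kept: of all the "User logged in" lines, only the last one is printed. Note that awk's for (x in a) visits keys in an unspecified order, so on other input the login line is not guaranteed to come out before the logout line.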
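
The same keep-the-last-line-per-message idea can be sketched in Python (the function name `last_per_message` is hypothetical). The key mimics awk's `$2` with `-F\[`, i.e. the text between the first and second `[`. Because Python dicts keep keys in first-insertion order, the output order here is deterministic, unlike awk's `for (x in a)`.

```python
def last_per_message(lines):
    """Keep only the last line seen for each distinct message.

    The key mimics awk's $2 when the field separator is '[': the text between
    the first and second '[' (an empty string if the line has no '[').
    """
    last = {}
    for line in lines:
        parts = line.split("[")
        key = parts[1] if len(parts) > 1 else ""
        # Re-assigning an existing key updates its value but keeps its position,
        # so messages come out in the order of their first occurrence.
        last[key] = line
    return list(last.values())


sample = [
    "2013-04-07 08:44:01 [INFO] User logged in",
    "2013-04-07 08:54:55 [INFO] User logged in",
    "2013-04-07 08:57:12 [INFO] User logged in",
    "2013-04-07 08:59:45 [INFO] User logged in",
    "2013-04-07 09:01:28 [INFO] User logged in",
    "2013-04-07 09:11:00 [INFO] User logged in",
    "2013-04-07 09:12:56 [INFO] User logged in",
    "2013-04-07 09:15:43 [INFO] User lost connection",
]
for line in last_per_message(sample):
    print(line)
```

On a file with several login/logout blocks this approach still returns just two lines (the very last login and the very last logout), which is the limitation the EDIT in this answer deals with.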

EDIT

Your real file might instead contain multiple login/lost-connection blocks, like:

kent$  cat file
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
2013-04-08 09:11:00 [INFO] User logged in
2013-04-08 09:12:56 [INFO] User logged in
2013-04-08 09:15:43 [INFO] User lost connection

Then this one-liner works for you:

awk '/lost/{print a;print;next;}{a=$0}' file

The output is:

2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
2013-04-08 09:12:56 [INFO] User logged in
2013-04-08 09:15:43 [INFO] User lost connection
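
The logic of that one-liner is: remember the most recent line that does not match "lost", and print it together with each "lost" line. As a Python sketch (`pair_with_previous` is a hypothetical name):

```python
def pair_with_previous(lines):
    """Python rendering of awk '/lost/{print a;print;next;}{a=$0}'."""
    out = []
    prev = ""  # an unset awk variable prints as an empty string
    for line in lines:
        if "lost" in line:       # /lost/ matches the substring anywhere in the line
            out.append(prev)     # print a
            out.append(line)     # print
            continue             # next: skip {a=$0}, so a "lost" line is never remembered
        prev = line              # {a=$0}
    return out


two_blocks = [
    "2013-04-07 09:11:00 [INFO] User logged in",
    "2013-04-07 09:12:56 [INFO] User logged in",
    "2013-04-07 09:15:43 [INFO] User lost connection",
    "2013-04-08 09:11:00 [INFO] User logged in",
    "2013-04-08 09:12:56 [INFO] User logged in",
    "2013-04-08 09:15:43 [INFO] User lost connection",
]
print("\n".join(pair_with_previous(two_blocks)))
```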

score 1 · Accepted Answer

Assuming that there will never be multiple "User lost connection" lines in a row, the following should work:

sed '/User logged in/{h;d};H;x' file

Or, if you are on a system whose sed doesn't support ; as a command separator:

sed -e '/User logged in/{h
d
}' -e 'H' -e 'x' file
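
To make the hold-space steps explicit, here is a Python sketch that walks through the same commands (h, d, H, x) with two string variables standing in for sed's pattern and hold spaces. The name `sed_hold_pairs` is hypothetical, and each returned string is one printed pattern space.

```python
def sed_hold_pairs(lines):
    """Step through sed '/User logged in/{h;d};H;x' one input line at a time."""
    hold = ""   # sed's hold space starts out empty
    out = []
    for pattern in lines:                 # each cycle loads one line into the pattern space
        if "User logged in" in pattern:
            hold = pattern                # h: copy the pattern space into the hold space
            continue                      # d: delete the pattern space, print nothing
        hold = hold + "\n" + pattern      # H: append a newline and the pattern space
        pattern, hold = hold, pattern     # x: exchange pattern and hold spaces
        out.append(pattern)               # end of cycle: the pattern space is auto-printed
    return out


two_blocks = [
    "2013-04-07 09:11:00 [INFO] User logged in",
    "2013-04-07 09:12:56 [INFO] User logged in",
    "2013-04-07 09:15:43 [INFO] User lost connection",
    "2013-04-08 09:11:00 [INFO] User logged in",
    "2013-04-08 09:12:56 [INFO] User logged in",
    "2013-04-08 09:15:43 [INFO] User lost connection",
]
for chunk in sed_hold_pairs(two_blocks):
    print(chunk)
```

After x, the hold space contains the "lost" line, so with two "lost connection" lines in a row the first one becomes the partner of the second and is printed twice; that is why the answer assumes this never happens.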

score 1 · Accepted Answer

Here is an awk solution. Every line is stored in a variable; when a line does not contain the string "logged in", the previously stored line and the current line are printed. This can be a problem if two "lost connection" lines can follow each other. Awk is also a good choice for filtering out the unrelated lines in the first place.

#!/bin/bash

awk '!/logged in/ {print x"\n"$0} {x = $0}' <<EOT
2013-04-07 08:44:01 [INFO] User logged in
2013-04-07 08:54:55 [INFO] User logged in
2013-04-07 08:57:12 [INFO] User logged in
2013-04-07 08:59:45 [INFO] User logged in
2013-04-07 09:01:28 [INFO] User logged in
2013-04-07 09:11:00 [INFO] User logged in
2013-04-07 09:12:56 [INFO] User logged in
2013-04-07 09:15:43 [INFO] User lost connection
EOT
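
One way to address the consecutive "lost connection" caveat is to print a pair only when the remembered line is itself a login. That guard is an addition, not part of the awk above; a Python sketch with the hypothetical name `login_logout_pairs`:

```python
def login_logout_pairs(lines):
    """Variant of the awk one-liner above with one extra guard:
    a non-login line is paired with the previous line only if that line was a login."""
    out = []
    prev = None  # the previous input line (awk's x)
    for line in lines:
        if "logged in" not in line and prev is not None and "logged in" in prev:
            out.extend([prev, line])
        prev = line  # {x = $0}: every line is remembered, as in the awk version
    return out


log_lines = [
    "2013-04-07 09:12:56 [INFO] User logged in",
    "2013-04-07 09:15:43 [INFO] User lost connection",
    "2013-04-07 09:16:10 [INFO] User lost connection",  # hypothetical second "lost" line
]
print("\n".join(login_logout_pairs(log_lines)))
```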

score 0 · Accepted Answer

This might work for you (GNU sed):

sed -r '$!N;/(User logged in)\n.*\1/D' file
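
How the one-liner works: $!N appends the next line to the pattern space (except on the last line). If the first line ends with "User logged in" and the second also contains it, D deletes up to the first newline and restarts the cycle with the remainder, without reading new input; otherwise the two-line window is printed. A Python sketch emulating those steps (the name `sed_window` is hypothetical):

```python
import re

# sed's (User logged in)\n.*\1 : "User logged in" right before the newline,
# then "User logged in" again later. DOTALL mirrors sed, where . can match a newline.
LOGIN_PAIR = re.compile(r"(User logged in)\n.*\1", re.DOTALL)


def sed_window(lines):
    """Emulate the $!N / D loop of the GNU sed one-liner above."""
    out = []
    i = 0            # index of the next unread input line
    pattern = None   # None means: start a fresh cycle by reading a line
    while pattern is not None or i < len(lines):
        if pattern is None:
            pattern = lines[i]
            i += 1
        if i < len(lines):              # $!N: append the next line unless on the last one
            pattern += "\n" + lines[i]
            i += 1
        if LOGIN_PAIR.search(pattern):
            # D: drop up to the first newline, restart the cycle without reading input
            pattern = pattern.split("\n", 1)[1]
            continue
        out.append(pattern)             # end of cycle: auto-print, then start fresh
        pattern = None
    return out


two_blocks = [
    "2013-04-07 09:11:00 [INFO] User logged in",
    "2013-04-07 09:12:56 [INFO] User logged in",
    "2013-04-07 09:15:43 [INFO] User lost connection",
    "2013-04-08 09:11:00 [INFO] User logged in",
    "2013-04-08 09:12:56 [INFO] User logged in",
    "2013-04-08 09:15:43 [INFO] User lost connection",
]
for chunk in sed_window(two_blocks):
    print(chunk)
```

Because a printed window is consumed as a whole, a login that lands as the second line of a printed pair (possible after two consecutive "lost connection" lines) is not compared with the line that follows it.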

answered 2013-09-17T15:24:49.287
