
I have a large number of log lines, for example:

Apr 15 06:24:52  11.250.30.X:53516 [15/Apr/2012:06:24:51.504] userA 200 "GET HTTP/1.1"
Apr 15 06:24:52  11.250.30.X:53516 [15/Apr/2012:06:24:51.504] userA 200 "GET HTTP/1.1"
Apr 15 06:24:52  11.250.30.X:53516 [15/Apr/2012:06:24:51.504] userB 200 "GET HTTP/1.1"
Apr 15 06:24:52  11.250.30.X:53516 [15/Apr/2012:06:24:51.504] userC 200 "GET HTTP/1.1"
Apr 15 06:24:52  11.250.30.X:53516 [15/Apr/2012:06:24:51.504] userC 200 "GET HTTP/1.1"
Apr 15 06:24:52  11.250.30.X:53516 [15/Apr/2012:06:24:51.504] userD 200 "GET HTTP/1.1"

What is the fastest way to parse these logs in a Bash shell into the following form (all request source IPs for each user)?

userA:
XXX.XXX.XXX.XXX (client's source IP; strip the port number and list each IP only once)
XXX.XXX.XXX.XXX
...
userB:
XXX.XXX.XXX.XXX
XXX.XXX.XXX.XXX
XXX.XXX.XXX.XXX
...
userC:
...

3 Answers


Using awk:

awk '
{ a[$6] = $4 "\n" a[$6] }    # $6 = user, $4 = ip:port; prepend to the list for that user
END {
    for (u in a) print u ":\n" a[u]
}' FILE

To strip the port and list each host only once, try this (I have not tested it thoroughly):

awk '
{
  sub(/:.*$/, "", $4)        # strip ":port" from the client address
  a[$6, $4] = 1              # record each (user, IP) pair once
}
END {
  for (u in a) {
    split(u, b, SUBSEP)      # b[1] = user, b[2] = IP
    nu[b[1]] = b[2] "\n" nu[b[1]]
  }
  for (u in nu) print u ":\n" nu[u]
}' FILE
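
The iteration order of `for (u in a)` is unspecified in awk, so the users and their IPs above can come out in any order. As a minimal sketch of one alternative, assuming the same field layout as the sample (client address in field 4, user in field 6), the following keeps users and IPs in first-seen order in a single pass. The function name `ips_first_seen` and the array names are hypothetical, chosen only for illustration:

```shell
# Hypothetical helper: print each user's unique source IPs in first-seen order.
# Reads the named files, or standard input when no file is given.
ips_first_seen() {
  awk '
  {
    ip = $4
    sub(/:.*$/, "", ip)                      # drop the ":port" suffix
    if (!(($6, ip) in seen)) {               # first time this (user, IP) pair appears
      seen[$6, ip] = 1
      if (!($6 in list)) order[++n] = $6     # remember when each user first appeared
      list[$6] = list[$6] ip "\n"            # append, so IPs stay in input order
    }
  }
  END {
    for (i = 1; i <= n; i++) printf "%s:\n%s", order[i], list[order[i]]
  }' "$@"
}
```

It would be called as `ips_first_seen FILE`. Memory use still grows with the number of distinct user/IP pairs, as in the version above.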
answered 2012-04-15T06:51:49.223

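This might be useful; it produces the required data in a form suited to further automated processing (a list of unique user/IP pairs, sorted by user):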

awk '{split($4,a,":"); print $6, a[1]; }' FILE | sort -u
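
As one example of such further processing, the sorted pair list can be folded into the grouped layout from the question. This is only a sketch: it assumes one `user ip` pair per line, already sorted by user, and the function name `group_pairs` is hypothetical:

```shell
# Hypothetical helper: turn sorted "user ip" lines into "user:" headers followed by IPs.
group_pairs() {
  awk '
  $1 != prev { print $1 ":"; prev = $1 }   # new user: print a header line
             { print $2 }                  # every line: print the IP
  '
}
```

It would be appended to the pipeline above, for example `... | sort -u | group_pairs`. Because the input is already sorted and deduplicated, awk only has to remember the previous user rather than hold every pair in memory.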
answered 2012-04-15T11:34:46.393

A Bash solution (requires Bash 4 or later for associative arrays):

declare -A ips=()
while read -r _ _ _ ip _ user _; do
  ips["$user ${ip%:*}"]=1                   # hash "user ip"; duplicate pairs collapse
done < "$infile"

userold=''
while read -r user ip; do                   # split user, ip
  [ "$userold" != "$user" ] && echo "$user" && userold="$user"
  echo "$ip"
done < <(printf '%s\n' "${!ips[@]}" | sort) # feed sorted keys, one per line
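
The `${ip%:*}` expansion used in the loop above removes the shortest trailing match of the pattern `:*`, which for an `ip:port` value is the port. A small sketch showing that expansion in isolation; the function name `strip_port` is hypothetical:

```shell
# Hypothetical helper: print an "ip:port" argument without its port.
strip_port() {
  addr=$1
  printf '%s\n' "${addr%:*}"   # remove the shortest suffix matching ":*"
}

strip_port 11.250.30.1:53516   # prints 11.250.30.1
strip_port 11.250.30.1         # no colon, so the value is printed unchanged
```

Since the expansion happens inside the shell, no external process is started per line. Note that an address containing several colons would lose only its last colon-separated part.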

Input:

Apr 15 06:24:52  11.250.30.1:53516 [15/Apr/2012:06:24:51.504] userA 200 "GET HTTP/1.1"
Apr 15 06:24:54  11.250.30.2:53516 [15/Apr/2012:06:24:51.504] userA 200 "GET HTTP/1.1"
Apr 15 06:24:55  11.250.30.3:53516 [15/Apr/2012:06:24:51.504] userB 200 "GET HTTP/1.1"
Apr 15 06:24:51  11.250.30.4:53516 [15/Apr/2012:06:24:51.504] userC 200 "GET HTTP/1.1"
Apr 15 06:24:52  11.250.30.4:53516 [15/Apr/2012:06:24:51.504] userC 200 "GET HTTP/1.1"
Apr 15 06:24:58  11.250.30.5:53516 [15/Apr/2012:06:24:51.504] userD 200 "GET HTTP/1.1"

Sorted output:

userA
11.250.30.1
11.250.30.2
userB
11.250.30.3
userC
11.250.30.4
userD
11.250.30.5
answered 2012-04-15T17:29:43.700