0

我有一个直播连接日志文件,想知道使用了哪些客户端以及使用频率。日志文件非常大(大约 100mb),包含过去 3 年的条目。日志条目如下所示(IP 已随机化!):

<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@15:46:34> [dest: 1.187.2.99] connection closed (9 seconds) (UID: 25477)[L: 1]{Bytes: 403705}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:40:56> [dest: 1.194.2.16] connection closed (981 seconds) (UID: 25478)[L: 1]{Bytes: 15938209}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@16:41:40> [dest: 1.158.2.39] connection closed (11 seconds) (UID: 25479)[L: 1]{Bytes: 432719}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:07:48> [dest: 1.142.2.225] connection closed (979 seconds) (UID: 25480)[L: 1]{Bytes: 15919475}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:19:07> [dest: 1.232.2.215] connection closed (19 seconds) (UID: 25481)[L: 1]{Bytes: 417192}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@18:34:46> [dest: 1.187.2.99] connection closed (2 seconds) (UID: 25482)[L: 1]{Bytes: 282751}(P: 1)

我想提取每个唯一的客户端,并计算这种客户端的使用频率。对于上面的日志,结果应该如下所示:

Internet%20Explorer%207   2
WMPlayer/10.0.0.364       1
WinampMPEG/5.50           2
TapinRadio                1

起初,我只是过滤了所有无用的条目。(抱歉使用 cat。)

cat shoutcast.log | grep "starting stream" > filtered.txt

结果如下所示:

<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)

但是现在呢?我有点迷茫,如何访问{A: }括号中的信息?

4

1 回答 1

2

试试这个 awk 行:

 awk -F'{A: |}' '/starting/{a[$2]++}END{for(x in a)print x" : "a[x]}' input

用你的数据测试:

kent$  cat ff
<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@15:46:34> [dest: 1.187.2.99] connection closed (9 seconds) (UID: 25477)[L: 1]{Bytes: 403705}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:40:56> [dest: 1.194.2.16] connection closed (981 seconds) (UID: 25478)[L: 1]{Bytes: 15938209}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@16:41:40> [dest: 1.158.2.39] connection closed (11 seconds) (UID: 25479)[L: 1]{Bytes: 432719}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:07:48> [dest: 1.142.2.225] connection closed (979 seconds) (UID: 25480)[L: 1]{Bytes: 15919475}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:19:07> [dest: 1.232.2.215] connection closed (19 seconds) (UID: 25481)[L: 1]{Bytes: 417192}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@18:34:46> [dest: 1.187.2.99] connection closed (2 seconds) (UID: 25482)[L: 1]{Bytes: 282751}(P: 1)

kent$  awk -F'{A: |}' '/starting/{a[$2]++}END{for(x in a)print x" : "a[x]}' ff
WMPlayer/10.0.0.364 : 1
TapinRadio : 1
WinampMPEG/5.50 : 2
Internet%20Explorer%207 : 2
于 2013-03-23T20:13:55.240 回答