So, I'm trying to write an awk script that finds the three IPs with the most hits, in descending order. I'm basing this on an Apache web log that looks like this:
192.168.198.92 - - [22/Dec/2002:23:08:37 -0400] "GET / HTTP/1.1" 200 6394 www.yahoo.com "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-"
192.168.198.92 - - [22/Dec/2002:23:08:38 -0400] "GET /images/logo.gif HTTP/1.1" 200 807 www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE 6...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /news/sports.html HTTP/1.1" 200 3500 www.yahoo.com "http://www.some.com/" "Mozilla/4.0 (compatible; MSIE ...)" "-"
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997 www.yahoo.com "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3)..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:15 -0400] "GET /style.css HTTP/1.1" 200 4138 www.yahoo.com "http://www.yahoo.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:16 -0400] "GET /js/ads.js HTTP/1.1" 200 10229 www.yahoo.com "http://www.search.com/index.html" "Mozilla/5.0 (Windows..." "-"
192.168.72.177 - - [22/Dec/2002:23:32:19 -0400] "GET /search.php HTTP/1.1" 400 1997 www.yahoo.com "-" "Mozilla/4.0 JJohnJoJJJJJoJJoJJJJJoJJohJJJJJJJJJJJJohnJohJoJoJJJoJJ
To do this, I have the following:
$1 ~ /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ {
    hitCounter[$1]++
    notIndexed=1
    for(i in ips) {
        if (i==$1) { notIndexed=0 }
    }
    if(notIndexed==1) {
        ips[indexx]=$1
        indexx++
    }
}
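This rule matches the IP and then increments its hit count in the "hitCounter" array, which is indexed by IP. After that, I go through the list of IPs, "ips", to see whether the IP that was just hit is already in it. If it is not, the IP is added to the "ips" array and the index counter is incremented by one. In theory, by doing this, every index in "ips" should correspond to an index in "hitCounter". Finally I have...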
END {
    indexxx=0
    for (i in hitCounter) {
        if (i>hitCounter[firstIP])
            firstIP=ips[indexxx]
        else if (i>hitCounter[secondIP])
            secondIP=ips[indexxx]
        else
            thirdIP=ips[indexxx]
        indexxx++
    }
}
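Here I go through the hit counts in "hitCounter" and compare each one against the hits of the three top-hit variables; if an IP's hit count is greater than that of one of the three top-hit variables, I set that variable to the current IP.
This looks right to me, and I should get "192.168.72.177 192.168.198.92" as output, but what I get is "192.168.198.92 192.168.198.92".
Why?
EDIT: Sorry, this is how I print the final result, placed after the "hitCounter" for loop...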
print "The most hits were from "firstIP" "secondIP" "thirdIP
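
For comparison, here is a minimal sketch of one way the counting and top-three selection described above could be written. It is not the code from the post. It assumes that in `for (ip in hitCounter)` the loop variable is the array key (the IP string), so the count has to be read as `hitCounter[ip]`, and that the separate "ips" array is not needed because "hitCounter" is already keyed by IP. The function name `top_three_hits`, the variables `firstHits`, `secondHits`, `thirdHits`, and the simplified IP regex are hypothetical choices made for illustration; the regex avoids `{1,3}` because some older awk implementations do not support interval expressions.

```shell
#!/bin/sh
# Hypothetical helper (not from the post): reads an access log on stdin
# and prints the three IPs with the most hits, highest first.
top_three_hits() {
    awk '
    # Count one hit per line whose first field looks like a dotted-quad IP.
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { hitCounter[$1]++ }

    END {
        firstHits = 0; secondHits = 0; thirdHits = 0
        # ip is the key (the IP string); the count is hitCounter[ip].
        for (ip in hitCounter) {
            n = hitCounter[ip] + 0
            if (n > firstHits) {
                # New leader: shift the old first and second down one place.
                thirdIP = secondIP; thirdHits = secondHits
                secondIP = firstIP; secondHits = firstHits
                firstIP = ip; firstHits = n
            } else if (n > secondHits) {
                # New runner-up: shift the old second down to third.
                thirdIP = secondIP; thirdHits = secondHits
                secondIP = ip; secondHits = n
            } else if (n > thirdHits) {
                thirdIP = ip; thirdHits = n
            }
        }
        print "The most hits were from " firstIP " " secondIP " " thirdIP
    }'
}
```

It could be run as `top_three_hits < access.log`, where `access.log` is a placeholder file name. IPs with equal counts may come out in either order, since awk does not define the traversal order of `for (ip in ...)`.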