grep - Grepping logs for IP addresses

Question

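I'm pretty bad at using "basic?" unix commands, and this question puts my knowledge to the test even more. What I want to do is grep all IP addresses from a log (e.g. access.log from apache) and count how often they occur. Can I do that with one command, or do I need to write a script for it?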

score 17 · Accepted Answer

You'll need at least a short pipeline.

sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log | sort | uniq -c

It prints each IP address (IPv4 only, though), sorted and prefixed with its count.

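I tested it with apache2's access.log (the log format is configurable, though, so you'll need to check yours), and it worked for me. It assumes the IP address is the first thing on each line.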

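sed collects the IP addresses (actually it looks for 4 groups of digits with periods in between) and replaces the line with the match (from the address to the end of the line, which is the whole line when the address comes first). -e t moves on to the next line if the substitution succeeded; otherwise -e d deletes the line (it had no IP address on it). sort sorts.. :) and uniq -c counts runs of consecutive identical lines (which, since we've sorted them, correspond to the total count).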
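To see the pipeline above on known input, here is a minimal sketch that runs it against a few made-up log lines. The sample addresses and the temporary file are assumptions for illustration only, and since \+ is a GNU extension this assumes GNU sed. The trailing awk is an addition that just trims uniq's padding.

```shell
#!/bin/sh
# Hypothetical sample data standing in for access.log.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.10 - - [04/Nov/2013:05:53:52 +0000] "GET / HTTP/1.1" 200 512
10.0.0.5 - - [04/Nov/2013:05:53:53 +0000] "GET /a HTTP/1.1" 200 128
192.168.1.10 - - [04/Nov/2013:05:53:54 +0000] "GET /b HTTP/1.1" 404 0
this line has no address and is dropped by -e d
EOF

# Same pipeline as above; awk then normalizes uniq's padding to "count ip".
counts=$(sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d "$log" \
  | sort | uniq -c | awk '{print $1, $2}')
printf '%s\n' "$counts"
rm -f "$log"
```

With this sample, 192.168.1.10 is counted twice and 10.0.0.5 once, and the line without an address does not appear.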

score 7 · Accepted Answer

None of the answers provided here worked for me, so here is one that works:

cat yourlogs.txt | grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" | sort | uniq -c | sort

It uses grep to isolate all the IPs, then sorts them, counts them, and sorts the result again.
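The final plain sort orders the counted lines as text. A numeric reverse sort (sort -rn) is one way to put the most frequent addresses first. The sketch below uses made-up sample lines in place of yourlogs.txt and assumes GNU grep, since both -o and \b are extensions rather than POSIX features.

```shell
#!/bin/sh
# Hypothetical sample lines; addresses may appear anywhere on a line.
sample='10.0.0.5 GET /index.html
client=172.16.0.9 status=200
10.0.0.5 GET /about.html
10.0.0.5 GET /contact.html
client=172.16.0.9 status=404
203.0.113.7 GET /index.html'

# Extract, count, then sort numerically so the highest count comes first.
ranked=$(printf '%s\n' "$sample" \
  | grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" \
  | sort | uniq -c | sort -rn | awk '{print $1, $2}')
printf '%s\n' "$ranked"
```

Appending head -n 10 to the pipeline would limit the listing to the ten busiest addresses.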

score 2 · Accepted Answer

You can do the following (where datafile is the name of the log file):

egrep -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' datafile | sort | uniq -c

Edit: I missed the part about counting the addresses; added now.

score 0 · Accepted Answer

egrep '[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}' |awk '{print $1}'|sort|uniq -c

answered 2013-11-04T05:53:52.487

score 0 · Accepted Answer

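Below is a script I wrote a few years ago. It extracts the addresses from an apache access log. I just tried running it on Ubuntu 11.10 (oneiric) 3.0.0-32-generic #51-Ubuntu SMP Thu Mar 21 15:51:26 UTC 2013 i686 i686 i386 GNU/Linux and it works fine. Use Gvim or Vim to read the resulting file, which will be called unique_visits and lists the unique IPs in one column. The key is in the lines that use grep; those expressions extract the IP address numbers. IPv4 only. You may need to go through and update the browser version numbers. Another similar script I wrote for a Slackware system is here: http://www.perpetualpc.net/srtd_bkmrk.html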

#!/bin/sh
#eliminate search engine referrals and zombie hunters. combined_log is the original file
egrep '(google)|(yahoo)|(mamma)|(query)|(msn)|(ask.com)|(search)|(altavista)|(images.google)|(xb1)|(cmd.exe)|(trexmod)|(robots.txt)|(copernic.com)|(POST)' combined_log > search
#now sort them to eliminate duplicates and put them in order
sort -un search > search_sort
#do the same with original file
sort -un combined_log > combined_log_sort
#now get all the ip addresses. only the numbers
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' search_sort > search_sort_ip
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' combined_log_sort > combined_log_sort_ip
sdiff -s combined_log_sort_ip search_sort_ip > final_result_ip
#get rid of the extra column
grep -o '^\|[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' final_result_ip > bookmarked_ip
#remove stuff like browser versions and system versions
egrep -v '(4.4.2.0)|(1.6.3.1)|(0.9.2.1)|(4.0.0.42)|(4.1.8.0)|(1.305.2.109)|(1.305.2.12)|(0.0.43.45)|(5.0.0.0)|(1.6.2.0)|(4.4.5.0)|(1.305.2.137)|(4.3.5.0)|(1.2.0.7)|(4.1.5.0)|(5.0.2.6)|(4.4.9.0)|(6.1.0.1)|(4.4.9.0)|(5.0.8.6)|(5.0.2.4)|(4.4.8.0)|(4.4.6.0)' bookmarked_ip > unique_visits

exit 0

score 0 · Accepted Answer

Since, in an IP address, the unit of up to 3 digits followed by a dot repeats 3 times, we can write:

cat filename | egrep -o "([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}"
                                      ^^^     ^       ^~~~~~~~   
                         Up_to_3_digits.     Repeat_thrice.   Last_section.

It's even shorter with a bash variable:

PAT=[[:digit:]]{1,3}
cat filename | egrep -o "($PAT\.){3}$PAT"

To print only the unique IP addresses in the file, pipe the output to sort --uniq.
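Put together, that might look like the following minimal sketch. The sample input is made up, sort -u is used as the POSIX spelling of the same option, and grep -E stands in for egrep, which newer GNU grep releases flag as obsolescent. It assumes a grep that supports -o.

```shell
#!/bin/sh
# Hypothetical sample input in place of a real log file.
sample='10.0.0.5 GET /
10.0.0.5 GET /a
192.0.2.44 GET /'

# Quoting the pattern keeps the shell from ever touching the brackets and braces.
PAT='[[:digit:]]{1,3}'

# sort -u keeps one copy of each extracted address.
unique_ips=$(printf '%s\n' "$sample" | grep -oE "($PAT\.){3}$PAT" | sort -u)
printf '%s\n' "$unique_ips"
```

Piping the result through wc -l instead of printing it would give the number of distinct addresses.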

score -1 · Accepted Answer

Using sed:

$ sed 's/.*\(<regex_for_ip_address>\).*/\1/' <filename> | sort | uniq -c

You can search the Internet for a regex that matches IP addresses and put it in place of <regex_for_ip_address>, for example one from the answers to a related question on Stack Overflow.
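As a concrete sketch, the placeholder can be filled with a simple POSIX basic regex for four dot-separated groups of one to three digits. This pattern is an assumption chosen for illustration, not one taken from the related question. One caveat: a leading .* is greedy and can swallow the first digits of the address, so this version anchors the match at the start of the line instead (which assumes the address comes first, as in a default Apache log). It also uses -n with the p flag so lines without a match are not printed.

```shell
#!/bin/sh
# Hypothetical sample lines; the second has no address at the start.
sample='192.168.1.10 - - "GET / HTTP/1.1" 200
no address here
192.168.1.10 - - "GET /a HTTP/1.1" 200
10.0.0.5 - - "GET / HTTP/1.1" 200'

# Assumed pattern: 1-3 digits, then three groups of (dot, 1-3 digits).
ip_re='[0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}'

# Keep only the leading address of matching lines, then count occurrences.
counts=$(printf '%s\n' "$sample" \
  | sed -n "s/^\($ip_re\).*/\1/p" \
  | sort | uniq -c | awk '{print $1, $2}')
printf '%s\n' "$counts"
```

Like the other patterns in this thread, it does not check that each group is at most 255.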

score -1 · Accepted Answer

cat access.log |egrep -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' |sort|uniq -c|sort
