apache - Grep specific domain and all subdomains from access.log

Question

I'm trying to grep the lines for a specific domain from my Apache2 access.log. The access.log contains entries for all my virtual hosts and their different domains.

cat /var/log/access.log:

www.something-else-domain.si:80 193.77.xxx. xxx - - [06/Nov/2013:12:21:45 +0100] "GET /path/to/dir/image.jpg HTTP/1.1" 304 - "www.something-else-domain.si/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"

www.domain.si:80 193.77.xxx. xxx - - [06/Nov/2013:12:21:45 +0100] "GET /path/to/dir/image. jpg HTTP/1.1" 304 - "www.domain.si/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"

domain.si:80 193.77.xxx. xxx - - [06/Nov/2013:12:21:45 +0100] "GET /path/to/dir/image. jpg HTTP/1.1" 304 - "www.domain.si/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"

I want to grep only domain.si, www.domain.si and whatever.domain.si, but not something-else-domain.si. How can I do that? Thanks for any help.

score 2 · Accepted Answer

egrep '^([^ ]*\.)?domain\.si' /var/log/access.log

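Breaking this apart:

^ is the start of the line.
(xxx)? means "match xxx or nothing"; in this case it matches either:
- nothing, which is the bare-domain case (domain.si)
- [^ ]*\., any string of non-space characters followed by a dot. This matches the optional www. or whatever. part.
domain\.si simply matches the domain.si part.

The anchoring with ^ and the "no spaces" bit ensure that you only match things at the start of the line (and not in the request, like GET /domain.si).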
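
As a quick check of the behaviour described above, here is a minimal sketch that runs the same pattern over a few made-up lines. The sample lines, the 193.77.0.1 address and the other.example host are assumptions for illustration only, and grep -E is used as the current spelling of egrep.

```shell
# Hypothetical sample lines modelled on the log excerpt in the question;
# the IP address and the other.example host are invented for this test.
sample='www.something-else-domain.si:80 193.77.0.1 - - "GET /a.jpg HTTP/1.1" 304
www.domain.si:80 193.77.0.1 - - "GET /a.jpg HTTP/1.1" 304
domain.si:80 193.77.0.1 - - "GET /a.jpg HTTP/1.1" 304
whatever.domain.si:80 193.77.0.1 - - "GET /domain.si HTTP/1.1" 304
other.example:80 193.77.0.1 - - "GET /domain.si HTTP/1.1" 304'

# Same pattern as in the answer: optional "anything-without-spaces."
# prefix, then domain.si, anchored at the start of the line.
pattern='^([^ ]*\.)?domain\.si'

# Lines whose leading host field is domain.si or one of its subdomains.
matched=$(printf '%s\n' "$sample" | grep -E "$pattern")

# Number of matching lines.
count=$(printf '%s\n' "$sample" | grep -Ec "$pattern")

printf '%s\n' "$matched"
```

Only the www.domain.si, domain.si and whatever.domain.si lines should come back; the something-else-domain.si line and the line that mentions domain.si only in the request path are skipped. One caveat: nothing anchors the end of the name, so a host such as domain.site would also match. If that matters, appending [ :] after domain\.si should tighten it.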

score 0 · Answer

A GNU awk solution

awk  '/www.domain$|domanin$/ {print $NF RS}' RS=".si"
www.domain.si
"www.domain.si
"www.domain.si

Your example has a problem: spaces are not allowed in a URL.
