I have an Apache log file in the following format:
112.135.128.20 - [13/May/2013:23:55:04 +0530] "GET /SVRClientWeb/ActionController HTTP/1.1" 302 2 "https://www.example.com/sample" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Mobile/10B329" GET /SVRClientWeb/ActionController - HTTP/1.1 www.example.com
The log file is imported into a pandas DataFrame, and the columns are renamed:
df = df.rename(columns={'%>s': 'Status', '%b':'Bytes Returned',
'%h':'IP', '%l':'Username', '%r': 'Request', '%t': 'Time', '%u': 'Userid', '%{Referer}i': 'Referer', '%{User-Agent}i': 'Agent'})
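I want to select a specific IP and find the time difference between each consecutive hit from that IP. (For example, 124.43.104.198 first appears at 06.05.02 and then again at 06.10.03.)
I had an idea along the lines of the following code, but I can't work out how to finish it. How can I solve this?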
selected_ip = df['IP'][df['IP'] == '220.250.237.36']
df.index = pd.to_datetime(df.pop('Time'))
df['tvalue'] = df.index
df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)
Status Bytes IP Username Request Time Userid Referer Agent
0 200 974 124.43.203.106 - GET /favicon.ico HTTP/1.1 06/Jun/2013 06:03:08 -0600 - - Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
1 200 739 124.43.203.106 - GET /js/themes/dark/next.gif HTTP/1.1 06/Jun/2013 06:03:09 -0600 - http://www.gadgets.lk/full-detail-of-used-herc... Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
2 200 699 124.43.203.106 - GET /js/themes/dark/prev.gif HTTP/1.1 06/Jun/2013 06:03:09 -0600 - http://www.gadgets.lk/full-detail-of-used-herc... Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
3 200 770 112.135.56.48 - GET /images/nav-hover.jpg HTTP/1.1 06/Jun/2013 06:03:19 -0600 - http://www.gadgets.lk/used-brand-new-security-... Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...
4 200 366 74.86.158.106 - HEAD / HTTP/1.1 06/Jun/2013 06:03:29 -0600 - - Mozilla/5.0+(compatible; UptimeRobot/2.0; http...
5 200 36709 150.70.172.103 - GET /js/jquery.validate.js HTTP/1.0 06/Jun/2013 06:03:40 -0600 - - Mozilla/4.0 (compatible; MSIE 6.0; Windows NT ...
Expected output for a given IP, e.g. 220.250.237.36:
Times of Hit Difference between occurrences
06.05.02
06.10.00 00.04.58
07.30.00 01.20.00
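
To make the goal concrete, here is a minimal sketch of one way such a table could be produced. It assumes the 'Time' column still holds strings like "06/Jun/2013 06:03:08 -0600" and that every row carries the same UTC offset (if the offsets vary, passing utc=True to pd.to_datetime is probably the safer choice). The sample rows and the helper name hit_deltas are hypothetical, made up for illustration only.

```python
import pandas as pd

# Hypothetical sample rows that mirror the 'IP' and 'Time' columns shown above.
df = pd.DataFrame({
    "IP": ["220.250.237.36", "124.43.203.106",
           "220.250.237.36", "220.250.237.36"],
    "Time": [
        "06/Jun/2013 06:05:02 -0600",
        "06/Jun/2013 06:07:15 -0600",
        "06/Jun/2013 06:10:00 -0600",
        "06/Jun/2013 07:30:00 -0600",
    ],
})


def hit_deltas(frame, ip):
    """Return the hit times for one IP plus the gap since its previous hit."""
    # Keep only the rows for the requested IP (copy to avoid touching frame).
    hits = frame.loc[frame["IP"] == ip, ["Time"]].copy()
    # Parse the Apache-style timestamp, including the numeric UTC offset.
    hits["Time"] = pd.to_datetime(hits["Time"], format="%d/%b/%Y %H:%M:%S %z")
    # Sort chronologically so each diff is measured against the previous hit.
    hits = hits.sort_values("Time")
    # diff() subtracts the previous row; the first hit has no predecessor (NaT).
    hits["delta"] = hits["Time"].diff()
    return hits.reset_index(drop=True)


result = hit_deltas(df, "220.250.237.36")
print(result)
```

The filter is applied to the whole frame before differencing, so the gaps are measured only between hits from the same IP rather than between neighbouring rows of the full log. To get the gaps for every IP at once, parsing the 'Time' column first and then calling df.groupby('IP')['Time'].diff() should give the same per-IP differences.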