
I have an Apache log file in the following format:

112.135.128.20 - [13/May/2013:23:55:04 +0530] "GET /SVRClientWeb/ActionController HTTP/1.1" 302 2 "https://www.example.com/sample" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Mobile/10B329" GET /SVRClientWeb/ActionController - HTTP/1.1 www.example.com

The log file is imported into a pandas DataFrame, and the columns are renamed as follows:

df = df.rename(columns={'%>s': 'Status', '%b':'Bytes Returned', 
                        '%h':'IP', '%l':'Username', '%r': 'Request', '%t': 'Time', '%u': 'Userid', '%{Referer}i': 'Referer', '%{User-Agent}i': 'Agent'})

I want to select a specific IP and find the time difference between each of its hits. (For example, 124.43.104.198 first appears at 06.05.02 and then appears again at 06.10.03.)

I have an idea based on the following code, but I cannot quite work out how to finish it. Please help me solve this.

selected_ip = df['IP'][df['IP'] == '220.250.237.36']
df.index = pd.to_datetime(df.pop('Time'))
df['tvalue'] = df.index
df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)

   Status   Bytes   IP           Username   Request                               Time          Userid       Referer        Agent
0   200     974     124.43.203.106  -   GET /favicon.ico HTTP/1.1   06/Jun/2013 06:03:08 -0600  -   -   Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
1   200     739     124.43.203.106  -   GET /js/themes/dark/next.gif HTTP/1.1   06/Jun/2013 06:03:09 -0600  -   http://www.gadgets.lk/full-detail-of-used-herc...   Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
2   200     699     124.43.203.106  -   GET /js/themes/dark/prev.gif HTTP/1.1   06/Jun/2013 06:03:09 -0600  -   http://www.gadgets.lk/full-detail-of-used-herc...   Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
3   200     770     112.135.56.48   -   GET /images/nav-hover.jpg HTTP/1.1  06/Jun/2013 06:03:19 -0600  -   http://www.gadgets.lk/used-brand-new-security-...   Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.3...
4   200     366     74.86.158.106   -   HEAD / HTTP/1.1     06/Jun/2013 06:03:29 -0600  -   -   Mozilla/5.0+(compatible; UptimeRobot/2.0; http...
5   200     36709   150.70.172.103  -   GET /js/jquery.validate.js HTTP/1.0     06/Jun/2013 06:03:40 -0600  -   -   Mozilla/4.0 (compatible; MSIE 6.0; Windows NT ...

Expected output for the given IP 220.250.237.36:

Time of Hit      Difference between occurrences
06.05.02     
06.10.00         00.04.58
07.30.00         00.30.00

2 Answers

0

Does your Time column contain datetime values (rather than plain strings)? If so,

df[df.IP == '220.250.237.36']['Time'].diff()

Edit

To keep the original time information alongside the difference:

df = df[df.IP == '220.250.237.36'][['Time']].copy()
df['diff'] = df['Time'].diff()
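
If the Time column still holds strings such as `06/Jun/2013 06:03:08 -0600`, `diff()` will not work until they are parsed. The following is a minimal sketch of that step, assuming the timestamp layout shown in the question's table; the sample rows and the `hits` name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows mirroring the 'IP' and 'Time' columns from the question.
df = pd.DataFrame({
    "IP": ["124.43.203.106", "124.43.203.106", "112.135.56.48", "124.43.203.106"],
    "Time": [
        "06/Jun/2013 06:03:08 -0600",
        "06/Jun/2013 06:03:09 -0600",
        "06/Jun/2013 06:03:19 -0600",
        "06/Jun/2013 06:08:09 -0600",
    ],
})

# Parse the strings into timezone-aware timestamps (assumed format).
df["Time"] = pd.to_datetime(df["Time"], format="%d/%b/%Y %H:%M:%S %z")

# Keep only the hits of one IP, then take the difference between consecutive hits.
hits = df.loc[df["IP"] == "124.43.203.106", ["Time"]].copy()
hits["diff"] = hits["Time"].diff()

print(hits)
```

The first row's `diff` is `NaT` because there is no earlier hit to compare against.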
answered 2013-06-20T03:44:47.983
0

The following code works!

    df['tvalue'] = df.index
    df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)
    df[(df.IP == '61.245.172.48')][['delta']]
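
One thing worth checking with this approach: `delta` is computed over all rows before filtering, so each value is the gap to the previous request from any IP, not necessarily the same IP. A possible variant, sketched here on made-up sample data with the datetime index assumed from the question, computes the difference per IP with `groupby`:

```python
import pandas as pd

# Hypothetical sample data; the index plays the role of the parsed 'Time' column.
df = pd.DataFrame(
    {"IP": ["61.245.172.48", "74.86.158.106", "61.245.172.48", "74.86.158.106"]},
    index=pd.to_datetime([
        "2013-06-06 06:05:02",
        "2013-06-06 06:07:00",
        "2013-06-06 06:10:00",
        "2013-06-06 06:07:30",
    ]),
)

# Sort chronologically so consecutive rows of one IP are consecutive hits.
df = df.sort_index()
df["tvalue"] = df.index

# Difference to the previous hit of the same IP (NaT for an IP's first hit).
df["delta"] = df.groupby("IP")["tvalue"].diff()

result = df.loc[df["IP"] == "61.245.172.48", ["tvalue", "delta"]]
print(result)
```

With this variant the filter can be applied afterwards for any IP without recomputing the differences.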
answered 2013-06-20T04:32:47.960