python - Is there any way to make this function look nicer?

Question

I need some logic to extract the URL from an Apache log file. This is what I have so far:

apache_log = {'@source': 'file://xxxxxxxxxxxxxxx//var/log/apache2/access.log', '@source_host': 'xxxxxxxxxxxxxxxxxxx', '@message': 'xxxxxxxxxxxxxxx xxxxxxxxxx - - [02/Aug/2013:12:38:37 +0000] "POST /user/12345/product/2 HTTP/1.1" 404 513 "-" "PycURL/7.26.0"', '@tags': [], '@fields': {}, '@timestamp': '2013-08-02T12:38:38.181000Z', '@source_path': '//var/log/apache2/access.log', '@type': 'Apache-access'}
data = apache_log['@message'].split()
if data.index('"POST') and data[data.index('"POST')+2].startswith('HTTP'):
    print(data[data.index('"POST')+1])

It returns:

/user/12345/product/2

The result is correct, but I don't like the way I'm getting it.

Can anyone suggest a better (more Pythonic) way to extract this path from an Apache log file?

score 5 · Accepted Answer

A regular expression would be a better fit:

import re

post_path = re.compile(r'"POST (/\S+) HTTP')

match = post_path.search(apache_log['@message'])
if match:
    print(match.group(1))

Demo:

>>> import re
>>> apache_log = {'@source': 'file://xxxxxxxxxxxxxxx//var/log/apache2/access.log', '@source_host': 'xxxxxxxxxxxxxxxxxxx', '@message': 'xxxxxxxxxxxxxxx xxxxxxxxxx - - [02/Aug/2013:12:38:37 +0000] "POST /user/12345/product/2 HTTP/1.1" 404 513 "-" "PycURL/7.26.0"', '@tags': [], '@fields': {}, '@timestamp': '2013-08-02T12:38:38.181000Z', '@source_path': '//var/log/apache2/access.log', '@type': 'Apache-access'}
>>> post_path = re.compile(r'"POST (/\S+) HTTP')
>>> match = post_path.search(apache_log['@message'])
>>> if match:
...     print(match.group(1))
... 
/user/12345/product/2
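
If you later need more than POST requests, the same idea can be extended. The following is only a sketch, assuming the request line always has the form "METHOD path HTTP/version"; the names REQUEST_LINE and extract_request are made up for illustration and are not part of the original answer:

```python
import re

# Hypothetical pattern: captures the HTTP method and the request path
# from the quoted request line of an Apache access-log entry.
REQUEST_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def extract_request(message):
    """Return (method, path) from a log message, or None if no request line is found."""
    match = REQUEST_LINE.search(message)
    if match is None:
        return None
    return match.group('method'), match.group('path')


# Same log message as in the question, split across two literals for readability.
message = ('xxxxxxxxxxxxxxx xxxxxxxxxx - - [02/Aug/2013:12:38:37 +0000] '
           '"POST /user/12345/product/2 HTTP/1.1" 404 513 "-" "PycURL/7.26.0"')
print(extract_request(message))  # ('POST', '/user/12345/product/2')
```

Unlike list.index, which raises ValueError when the token is missing, search simply returns None, so a line without a request does not need separate error handling.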
