I'm using python Requests. I need to debug some OAuth activity, and for that I would like it to log all requests being performed. I could get this information with ngrep, but unfortunately it is not possible to grep https connections (which are needed for OAuth).
How do I activate logging of all URLs (+ parameters) that Requests is accessing?
You need to enable debugging at httplib level (requests → urllib3 → httplib).
Here are some functions to both toggle it (..._on() and ..._off()) or have it on temporarily:
import logging
import contextlib
try:
    from http.client import HTTPConnection # py3
except ImportError:
    from httplib import HTTPConnection # py2

def debug_requests_on():
    '''Switches on logging of the requests module.'''
    HTTPConnection.debuglevel = 1

    logging.basicConfig()
    logging.getLogger().setLevel(logging.DEBUG)
    requests_log = logging.getLogger("requests.packages.urllib3")
    requests_log.setLevel(logging.DEBUG)
    requests_log.propagate = True

def debug_requests_off():
    '''Switches off logging of the requests module, might be some side-effects'''
    HTTPConnection.debuglevel = 0

    root_logger = logging.getLogger()
    root_logger.setLevel(logging.WARNING)
    root_logger.handlers = []
    requests_log = logging.getLogger("requests.packages.urllib3")
    requests_log.setLevel(logging.WARNING)
    requests_log.propagate = False

@contextlib.contextmanager
def debug_requests():
    '''Use with 'with'!'''
    debug_requests_on()
    yield
    debug_requests_off()
Demo use:
>>> requests.get('http://httpbin.org/')
<Response [200]>
>>> debug_requests_on()
>>> requests.get('http://httpbin.org/')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 12150
send: 'GET / HTTP/1.1\r\nHost: httpbin.org\r\nConnection: keep-alive\r\nAccept-
Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.11.1\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
...
<Response [200]>
>>> debug_requests_off()
>>> requests.get('http://httpbin.org/')
<Response [200]>
>>> with debug_requests():
...     requests.get('http://httpbin.org/')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
...
<Response [200]>
You will see the REQUEST, including HEADERS and DATA, and the RESPONSE with HEADERS but without DATA. The only thing missing is the response.body, which is not logged.
The underlying urllib3 library logs all new connections and URLs with the logging module, but not POST bodies. For GET requests this should be enough:
import logging
logging.basicConfig(level=logging.DEBUG)
which gives you the most verbose logging option; see the logging HOWTO for more details on how to configure logging levels and destinations.
Short demo:
>>> import requests
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366
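Instead of turning on DEBUG globally, you can also leave the root logger at its default WARNING level and raise verbosity only for urllib3, which keeps other libraries quiet. A minimal sketch of that (the format string is only illustrative):

import logging
import requests

# Root logger stays at WARNING, so other libraries remain quiet,
# but records from the 'urllib3' logger are allowed through at DEBUG.
logging.basicConfig(format='%(levelname)s:%(name)s:%(message)s')
logging.getLogger('urllib3').setLevel(logging.DEBUG)

requests.get('http://httpbin.org/get?foo=bar')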
Depending on the exact version of urllib3, the following messages are logged:
INFO: Redirects
WARN: Connection pool is full (if this happens often, increase the connection pool size)
WARN: Failed to parse headers (response headers with invalid format)
WARN: Retrying the connection
WARN: Certificate did not match expected hostname
WARN: Received a response with both Content-Length and Transfer-Encoding while processing a chunked response
DEBUG: New connections (HTTP or HTTPS)
DEBUG: Dropped connections
DEBUG: Connection details: method, path, HTTP version, status code and response length
DEBUG: Retry count increments
This doesn't include headers or bodies. urllib3 uses the http.client.HTTPConnection class to do the grunt-work, but that class doesn't support logging; it can normally only be configured to print to stdout.
However, you can coax it into sending all debug information to logging instead by introducing an alternative print name into that module:
import logging
import http.client

httpclient_logger = logging.getLogger("http.client")

def httpclient_logging_patch(level=logging.DEBUG):
    """Enable HTTPConnection debug logging to the logging framework"""

    def httpclient_log(*args):
        httpclient_logger.log(level, " ".join(args))

    # mask the print() built-in in the http.client module to use
    # logging instead
    http.client.print = httpclient_log

    # enable debugging
    http.client.HTTPConnection.debuglevel = 1
Calling httpclient_logging_patch() causes http.client connections to output all debug information to a standard logger, where it is picked up by logging.basicConfig():
>>> httpclient_logging_patch()
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:http.client:send: b'GET /get?foo=bar&baz=python HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
DEBUG:http.client:reply: 'HTTP/1.1 200 OK\r\n'
DEBUG:http.client:header: Date: Tue, 04 Feb 2020 13:36:53 GMT
DEBUG:http.client:header: Content-Type: application/json
DEBUG:http.client:header: Content-Length: 366
DEBUG:http.client:header: Connection: keep-alive
DEBUG:http.client:header: Server: gunicorn/19.9.0
DEBUG:http.client:header: Access-Control-Allow-Origin: *
DEBUG:http.client:header: Access-Control-Allow-Credentials: true
DEBUG:urllib3.connectionpool:http://httpbin.org:80 "GET /get?foo=bar&baz=python HTTP/1.1" 200 366
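If you only want the patched output temporarily, the patch is easy to reverse; here is a sketch of a matching helper (httpclient_logging_unpatch is a made-up name, not part of the snippet above):

import http.client

def httpclient_logging_unpatch():
    """Reverse httpclient_logging_patch(): stop debug output, restore built-in print."""
    # turn the low-level debug output off again
    http.client.HTTPConnection.debuglevel = 0
    # deleting the injected module-level name lets http.client fall back
    # to the built-in print() on the next lookup
    if hasattr(http.client, 'print'):
        del http.client.print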
For those using python 3+:
import requests
import logging
import http.client
http.client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
When trying to get the Python logging system (import logging) to emit low-level debug log messages, I was surprised to discover that, given:
requests --> urllib3 --> http.client.HTTPConnection
only urllib3 actually uses the Python logging system:
requests: no
http.client.HTTPConnection: no
urllib3: yes
Sure, you can extract debug messages from HTTPConnection by setting:
HTTPConnection.debuglevel = 1
but these outputs are merely emitted via print statements. To prove this, just grep the Python 3.7 client.py source code and view the print statements yourself (thanks @Yohann):
curl https://raw.githubusercontent.com/python/cpython/3.7/Lib/http/client.py | grep -A1 debuglevel
Presumably redirecting stdout in some way might work to shoe-horn stdout into the logging system and potentially capture it to e.g. a log file.
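One way to attempt that shoe-horning is contextlib.redirect_stdout with a small file-like wrapper whose write() forwards to a logger. This is only a sketch; LoggerWriter and the http_debug.log file name are made up for illustration:

import contextlib
import logging
import requests
from http.client import HTTPConnection

logging.basicConfig(filename='http_debug.log', level=logging.DEBUG)  # example destination
log = logging.getLogger('http.client')

class LoggerWriter:
    """Minimal file-like object that forwards print() output to a logger."""
    def write(self, message):
        if message.strip():           # print() writes the trailing newline separately
            log.debug(message.rstrip())
    def flush(self):
        pass                          # nothing buffered; print(..., flush=True) may call this

HTTPConnection.debuglevel = 1
with contextlib.redirect_stdout(LoggerWriter()):
    requests.get('http://httpbin.org/')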
Use the 'urllib3' logger, not 'requests.packages.urllib3'
To capture urllib3 debug information through the Python 3 logging system, contrary to much advice on the internet, and as @MikeSmith points out, you won't have much luck intercepting:
log = logging.getLogger('requests.packages.urllib3')
instead you need to:
log = logging.getLogger('urllib3')
Logging urllib3 to a log file
Here is some code which logs urllib3 workings to a log file using the Python logging system:
import requests
import logging
from http.client import HTTPConnection # py3
# log = logging.getLogger('requests.packages.urllib3') # useless
log = logging.getLogger('urllib3') # works
log.setLevel(logging.DEBUG) # needed
fh = logging.FileHandler("requests.log")
log.addHandler(fh)
requests.get('http://httpbin.org/')
the result:
Starting new HTTP connection (1): httpbin.org:80
http://httpbin.org:80 "GET / HTTP/1.1" 200 3168
The HTTPConnection.debuglevel print() statements
If you set HTTPConnection.debuglevel = 1:
from http.client import HTTPConnection # py3
HTTPConnection.debuglevel = 1
requests.get('http://httpbin.org/')
you'll get print statement output of extra juicy low-level information:
send: b'GET / HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-
requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin
header: Content-Encoding header: Content-Type header: Date header: ...
Remember, this output uses print and not the Python logging system, so it cannot be captured with a traditional logging stream or file handler (though it may be possible to capture the output to a file by redirecting stdout).
To maximise all possible logging, you must settle for console/stdout output:
import requests
import logging
from http.client import HTTPConnection # py3
log = logging.getLogger('urllib3')
log.setLevel(logging.DEBUG)
# logging from urllib3 to console
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
log.addHandler(ch)
# print statements from `http.client.HTTPConnection` to console/stdout
HTTPConnection.debuglevel = 1
requests.get('http://httpbin.org/')
giving the full range of output:
Starting new HTTP connection (1): httpbin.org:80
send: b'GET / HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
http://httpbin.org:80 "GET / HTTP/1.1" 200 3168
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin
header: Content-Encoding header: ...
When you have a script, or even an application subsystem, for debugging a network protocol, you want to see what the request-response pairs exactly are, including the effective URLs, headers, payloads and statuses. And it's typically impractical to instrument individual requests all over the place. At the same time there are performance considerations that suggest using a single (or a few specialised) requests.Session, so the following assumes that suggestion is followed.
requests supports so-called event hooks (as of 2.23 there's actually only the response hook). It's basically an event listener, and the event is emitted before control is returned from requests.request. At this moment both request and response are fully defined, hence they can be logged.
import logging
import requests

logger = logging.getLogger('httplogger')

def logRoundtrip(response, *args, **kwargs):
    extra = {'req': response.request, 'res': response}
    logger.debug('HTTP roundtrip', extra=extra)

session = requests.Session()
session.hooks['response'].append(logRoundtrip)
That's basically how to log all HTTP round-trips of a session.
To make the log records of the above useful, there can be a specialised log formatter that understands the req and res extras on a log record. It can look like this:
import textwrap

class HttpFormatter(logging.Formatter):

    def _formatHeaders(self, d):
        return '\n'.join(f'{k}: {v}' for k, v in d.items())

    def formatMessage(self, record):
        result = super().formatMessage(record)
        if record.name == 'httplogger':
            result += textwrap.dedent('''
                ---------------- request ----------------
                {req.method} {req.url}
                {reqhdrs}
                {req.body}
                ---------------- response ----------------
                {res.status_code} {res.reason} {res.url}
                {reshdrs}
                {res.text}
            ''').format(
                req=record.req,
                res=record.res,
                reqhdrs=self._formatHeaders(record.req.headers),
                reshdrs=self._formatHeaders(record.res.headers),
            )
        return result
formatter = HttpFormatter('{asctime} {levelname} {name} {message}', style='{')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logging.basicConfig(level=logging.DEBUG, handlers=[handler])
Now, if you do some requests using the session, like:
session.get('https://httpbin.org/user-agent')
session.get('https://httpbin.org/status/200')
the output to stderr will look like the following.
2020-05-14 22:10:13,224 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): httpbin.org:443
2020-05-14 22:10:13,695 DEBUG urllib3.connectionpool https://httpbin.org:443 "GET /user-agent HTTP/1.1" 200 45
2020-05-14 22:10:13,698 DEBUG httplogger HTTP roundtrip
---------------- request ----------------
GET https://httpbin.org/user-agent
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
None
---------------- response ----------------
200 OK https://httpbin.org/user-agent
Date: Thu, 14 May 2020 20:10:13 GMT
Content-Type: application/json
Content-Length: 45
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
{
"user-agent": "python-requests/2.23.0"
}
2020-05-14 22:10:13,814 DEBUG urllib3.connectionpool https://httpbin.org:443 "GET /status/200 HTTP/1.1" 200 0
2020-05-14 22:10:13,818 DEBUG httplogger HTTP roundtrip
---------------- request ----------------
GET https://httpbin.org/status/200
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
None
---------------- response ----------------
200 OK https://httpbin.org/status/200
Date: Thu, 14 May 2020 20:10:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
When you have a lot of queries, having a simple UI and a way to filter records comes in handy. I'll show Chronologer used for that (I'm its author).
First, the hook has been rewritten to produce records that logging can serialise when sending them over the wire. It can look like this:
def logRoundtrip(response, *args, **kwargs):
    extra = {
        'req': {
            'method': response.request.method,
            'url': response.request.url,
            'headers': response.request.headers,
            'body': response.request.body,
        },
        'res': {
            'code': response.status_code,
            'reason': response.reason,
            'url': response.url,
            'headers': response.headers,
            'body': response.text
        },
    }
    logger.debug('HTTP roundtrip', extra=extra)

session = requests.Session()
session.hooks['response'].append(logRoundtrip)
Second, the logging configuration has to be adapted to use logging.handlers.HTTPHandler (which Chronologer understands).
import logging.handlers

chrono = logging.handlers.HTTPHandler(
    'localhost:8080', '/api/v1/record', 'POST', credentials=('logger', ''))
handlers = [logging.StreamHandler(), chrono]
logging.basicConfig(level=logging.DEBUG, handlers=handlers)
Finally, run a Chronologer instance, e.g. using Docker:
docker run --rm -it -p 8080:8080 -v /tmp/db \
-e CHRONOLOGER_STORAGE_DSN=sqlite:////tmp/db/chrono.sqlite \
-e CHRONOLOGER_SECRET=example \
-e CHRONOLOGER_ROLES="basic-reader query-reader writer" \
saaj/chronologer \
python -m chronologer -e production serve -u www-data -g www-data -m
And run the requests again:
session.get('https://httpbin.org/user-agent')
session.get('https://httpbin.org/status/200')
The stream handler will produce:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): httpbin.org:443
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /user-agent HTTP/1.1" 200 45
DEBUG:httplogger:HTTP roundtrip
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /status/200 HTTP/1.1" 200 0
DEBUG:httplogger:HTTP roundtrip
Now if you open http://localhost:8080/ (use "logger" for the username and an empty password in the basic-auth popup) and click the "Open" button, you should see something like the following:
Just improving this answer
This is how it worked for me:
import logging
import sys
import requests
import textwrap

root = logging.getLogger('httplogger')

def logRoundtrip(response, *args, **kwargs):
    extra = {'req': response.request, 'res': response}
    root.debug('HTTP roundtrip', extra=extra)

class HttpFormatter(logging.Formatter):

    def _formatHeaders(self, d):
        return '\n'.join(f'{k}: {v}' for k, v in d.items())

    def formatMessage(self, record):
        result = super().formatMessage(record)
        if record.name == 'httplogger':
            result += textwrap.dedent('''
                ---------------- request ----------------
                {req.method} {req.url}
                {reqhdrs}
                {req.body}
                ---------------- response ----------------
                {res.status_code} {res.reason} {res.url}
                {reshdrs}
                {res.text}
            ''').format(
                req=record.req,
                res=record.res,
                reqhdrs=self._formatHeaders(record.req.headers),
                reshdrs=self._formatHeaders(record.res.headers),
            )
        return result

formatter = HttpFormatter('{asctime} {levelname} {name} {message}', style='{')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)
root.addHandler(handler)
root.setLevel(logging.DEBUG)

session = requests.Session()
session.hooks['response'].append(logRoundtrip)
session.get('http://httpbin.org')
I'm using python 3.4 and requests 2.19.1:
'urllib3' is the logger to get now (no longer 'requests.packages.urllib3'). Basic logging still happens without setting http.client.HTTPConnection.debuglevel.
I'm using a logger_config.yaml file to configure my logging, and to get these logs to show up, all I had to do was add disable_existing_loggers: False at the end of it.
My logging setup is rather extensive and confusing, so I don't even know a good way to explain it here, but if someone is also using a YAML file to configure their logging, this might help.
https://docs.python.org/3/howto/logging.html#configuring-logging
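For comparison, here is a minimal sketch of the same idea without YAML, using logging.config.dictConfig directly; everything apart from disable_existing_loggers: False is made up for illustration:

import logging.config
import requests  # importing requests first creates the 'urllib3' logger that must not be disabled

logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': False,  # the essential part: keep urllib3's logger enabled
    'handlers': {
        'console': {'class': 'logging.StreamHandler', 'level': 'DEBUG'},
    },
    'root': {'level': 'DEBUG', 'handlers': ['console']},
})

requests.get('http://httpbin.org/')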