python - UnicodeDecodeError when editing HTML code with Python

Question

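I'm using mitmproxy to manipulate the HTML code returned by web pages. When I run commands on that HTML code, I get a UnicodeDecodeError.

I've tried everything and read every post on the subject here, but nothing has worked so far.

Two examples of the many things I've already tried: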

msg.response.content = unicode(msg.response.content, errors='ignore')
msg.response.content = msg.response.content.decode('utf8').encode('ascii', errors='ignore')
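In the second attempt, errors='ignore' is passed only to .encode(); .decode('utf8') still runs with the default errors='strict', so it raises before the encode step is ever reached. And even where errors='ignore' does apply, it hides the problem by silently dropping bytes. A minimal sketch in Python 3 (where bytes plays the role of Python 2's str), using a made-up Latin-1 body for illustration:

```python
# Hypothetical body: valid Latin-1, but not valid UTF-8 (0xE9 is "é" in Latin-1).
body = "café".encode("latin-1")  # b'caf\xe9'


def second_attempt(data: bytes) -> bytes:
    # errors='ignore' only affects .encode(); .decode('utf8') is still strict,
    # so the UnicodeDecodeError is raised here, before .encode() runs.
    return data.decode("utf8").encode("ascii", errors="ignore")


def lossy_decode(data: bytes) -> str:
    # Passing errors='ignore' to the decode step avoids the exception,
    # but every byte that is not valid UTF-8 is silently discarded.
    return data.decode("utf8", errors="ignore")


try:
    second_attempt(body)
except UnicodeDecodeError as exc:
    print("decode step failed:", exc.reason)

print(repr(lossy_decode(body)))  # 'caf' (the é is lost)
```

Neither variant recovers the real text; that requires decoding with the encoding the bytes were actually written in.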

How should I deal with this?

score 0 · Accepted Answer

To decode correctly, you need to look in the HTML page's source for something like <meta charset="utf-8"> or <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">. The charset value is the encoding the page says it is using.
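A rough sketch of that lookup in Python 3, assuming you have the raw body as bytes. The function name declared_charset is hypothetical, and a regex is only a heuristic, not a real HTML parser. It also checks the charset parameter of the HTTP Content-Type response header, which is another place a server can declare the encoding, and it scans only the first 1024 bytes, since the HTML specification requires the meta declaration to appear there.

```python
import re
from typing import Optional

# Matches <meta charset="..."> and the charset=... inside
# <meta http-equiv="Content-Type" content="text/html;charset=...">.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-:.]+)', re.IGNORECASE)
_HEADER_CHARSET = re.compile(r'charset=["\']?([A-Za-z0-9_\-:.]+)', re.IGNORECASE)


def declared_charset(html: bytes, content_type: Optional[str] = None) -> Optional[str]:
    """Return the encoding a page declares (lowercased), or None if none is found."""
    # A charset in the HTTP Content-Type header, if given, takes precedence.
    if content_type:
        m = _HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    # Otherwise look for a <meta> declaration near the start of the document.
    m = _META_CHARSET.search(html[:1024])
    if m:
        return m.group(1).decode("ascii").lower()
    return None


print(declared_charset(b'<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'))
# iso-8859-1
```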

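If type(msg.response.content) returns str (a byte string in Python 2), then you need to run msg.response.content = msg.response.content.decode(u'utf-8'), where 'utf-8' is the encoding the page says it is using. It could also be something like ISO-8859-1, windows-1251 or ASCII.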
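The same step sketched in Python 3, assuming the charset has already been found (for example from the <meta> tag). The helper name decode_html and the UTF-8 fallback are illustrative assumptions, not part of the answer or of mitmproxy:

```python
from typing import Optional


def decode_html(body: bytes, charset: Optional[str]) -> str:
    """Decode raw HTML bytes with the page's declared charset (hypothetical helper)."""
    # Assumption: treat an undeclared page as UTF-8.
    encoding = charset or "utf-8"
    try:
        return body.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # LookupError: Python does not know the codec name.
        # UnicodeDecodeError: the page declared the wrong encoding.
        # Fall back to replacement characters instead of crashing.
        return body.decode("utf-8", errors="replace")


text = decode_html("Привет".encode("cp1251"), "windows-1251")
edited = text.replace("Привет", "Пока")
# If the API you assign to expects bytes, encode back with the same charset.
new_body = edited.encode("windows-1251")
```

Python resolves names such as windows-1251 and iso-8859-1 to its own codecs (cp1251, latin-1), so the value from the page can usually be passed straight to .decode().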

score 0 · Accepted Answer

Try using the mitmproxy.flow.decoded context manager, like this:

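from mitmproxy.flow import decoded

# Inline-script hook in the (older) mitmproxy API used here.
def response(context, flow):
    # decoded() removes the response's Content-Encoding (e.g. gzip) inside the
    # block and re-applies it afterwards, so content is uncompressed here.
    with decoded(flow.response):
        flow.response.content = flow.response.content.replace("Google", "Noogle")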
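One likely reason raw content fails to decode even with the right charset is HTTP compression: with Content-Encoding: gzip the body is binary gzip data, not text. A stdlib-only sketch (no mitmproxy; the page body is made up) showing the failure and why removing the compression first helps:

```python
import gzip

# Hypothetical body, compressed as a server sending Content-Encoding: gzip would.
raw = gzip.compress("<title>Google</title>".encode("utf-8"))


def try_decode(body: bytes) -> str:
    # gzip output starts with the magic bytes 0x1f 0x8b; 0x8b is not a valid
    # UTF-8 start byte, so decoding the compressed bytes raises.
    return body.decode("utf-8")


try:
    try_decode(raw)
except UnicodeDecodeError as exc:
    print("compressed body:", exc.reason)

# Decompressing first yields ordinary text that can be edited safely.
print(gzip.decompress(raw).decode("utf-8"))  # <title>Google</title>
```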

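From the source:

A context manager that decodes a request, response or error, and then re-encodes it with the same encoding after the block has executed.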

Example:
with decoded(request):
    request.content = request.content.replace("foo", "bar")
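To illustrate the pattern that docstring describes (decode, run the block, re-encode with the same encoding), here is a minimal sketch built on contextlib. Message and decoded_body are hypothetical names, not mitmproxy's classes, and only gzip is handled:

```python
import gzip
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for an HTTP message (not mitmproxy's class)."""
    content: bytes
    content_encoding: str = "identity"


@contextmanager
def decoded_body(msg: Message):
    # Remember the original encoding and expose uncompressed bytes in the block.
    original = msg.content_encoding
    if original == "gzip":
        msg.content = gzip.decompress(msg.content)
        msg.content_encoding = "identity"
    try:
        yield msg
    finally:
        # Re-encode with the same encoding once the block has finished.
        if original == "gzip":
            msg.content = gzip.compress(msg.content)
            msg.content_encoding = original


resp = Message(gzip.compress(b"foo bar"), "gzip")
with decoded_body(resp):
    resp.content = resp.content.replace(b"foo", b"bar")
print(gzip.decompress(resp.content))  # b'bar bar'
```

Restoring the encoding in a finally clause means the message is re-compressed even if the edit inside the block raises.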

Note: I used mitmproxy on Ubuntu 14.04.
