I have the following complete working example that uses selenium-wire to record all requests.

import os
import sys
import json
from seleniumwire import webdriver

driver = webdriver.Chrome()

driver.get("http://www.google.com")

list_requests = []
for request in driver.requests:
    req = {
        "method": request.method,
        "url": request.url,
        "body": request.body.decode(), # to avoid json error
        "headers": {k:str(v) for k,v in request.headers.__dict__.items()} # to avoid json error
    }
  
    if request.response:
        resp = {
            "status_code": request.response.status_code,
            "reason": request.response.reason,
            "body": request.response.body.decode(), # ???
            "headers": {k:str(v) for k,v in request.response.headers.__dict__.items()} # to avoid json error
        }
        req["response"] = resp
    list_requests.append(req)

with open(f"test.json", "w") as outfile:
    json.dump(list_requests, outfile)

However, decoding the response body raises an error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 1: invalid start byte

And if I don't try to decode the response body, I get this error instead:

TypeError: Object of type bytes is not JSON serializable

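I don't care about the encoding; I just want to be able to write the "body" to the JSON file somehow. If necessary, the offending bytes/characters can be dropped; I don't mind.

Any ideas on how to solve this?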

1 Answer

I used the following approach to extract a field (some_key) from a JSON response:

from gzip import decompress
import json

some_key = None
data = None  # defined up front so the final print(data) cannot raise NameError

for request in driver.requests:
    if request.response:
        if request.method == 'POST':
            print(request.method + ' ' + request.url)
            try:
                # try to parse the json response to extract the data
                data = json.loads(request.response.body)
                print('parsed as json')
                if 'some_key' in data:
                    some_key = data['some_key']
            except UnicodeDecodeError:
                try:
                    # decompress on UnicodeDecodeError and parse the json response to extract the data
                    data = json.loads(decompress(request.response.body))
                    print('decompressed and parsed as json')
                    if 'some_key' in data:
                        some_key = data['some_key']
                except json.decoder.JSONDecodeError:
                    data = request.response.body
                    print('decompressed and not parsed')
print(data)
print(some_key)

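gzip.decompress got me past the UnicodeDecodeError, which suggests the response body was gzip-compressed and had to be decompressed before it could be decoded.

Hope this helps.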
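For the original goal of writing every body to a JSON file, the same idea can be wrapped in a small helper. This is only a sketch using the standard library: the names body_to_json_safe and body_to_base64 are made up for illustration, it handles gzip only (other encodings such as Brotli or deflate would need their own handling), and it assumes that lossy text is acceptable, as the question says. Undecodable bytes are replaced with U+FFFD rather than raising.

```python
import base64
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def body_to_json_safe(body: bytes) -> str:
    """Return a str that json.dump can always serialize (hypothetical helper)."""
    if body[:2] == GZIP_MAGIC:
        try:
            body = gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            # Looked like gzip but was not valid; fall back to the raw bytes.
            pass
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return body.decode("utf-8", errors="replace")


def body_to_base64(body: bytes) -> str:
    """Lossless alternative for binary bodies such as images (hypothetical helper)."""
    return base64.b64encode(body).decode("ascii")
```

In the question's loop this would be used as "body": body_to_json_safe(request.response.body). If the exact bytes matter, body_to_base64 keeps them recoverable with base64.b64decode at the cost of readability.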

Answered 2022-02-18T08:03:34.933