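I am trying to query some data from an API, but my query can get very long and cause the server to fail to return data (414 Request-URI Too Large). So I am splitting the request into batches and sending multiple calls, with the goal of saving each call's response as JSON and then reading the files into pandas for further analysis/manipulation. My queries are being built as expected, and the API returns the requested data when the calls are sent in batches; however, when I go to write the files, not all of the data gets written, and what does get written is the same data that was already written.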

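My code so far is below. I can't easily tell where I'm going wrong. Is there anything I should be doing differently (or better)? Below my code is an example response from the API.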

import pandas as pd
import requests
import json
import glob
import yaml

conf = "config.yml"

with open(conf) as f:
    config = yaml.safe_load(f)

# Certs to access API
cert = config['cert'] 
key = config['key']

# Data to append to API url
alist = ['123', '456', '789', 'abc', 'def', 'xyz', '123abc', 'input1, input2']

# URL too long, send data in batches

# Create batchs to send API requests
num_batches = 4
batch_size = int(len(alist)/num_batches)
batches = []

for i in range(0, len(alist), batch_size): 
    batches.append(alist[i:i + batch_size])

urlprefix = "https://test_url.com/"
urlsuffix = "=json?url_suffix"

# API call
for batch in batches:
    APIquery = ",".join(batch)
    url = urlprefix+APIquery+urlsuffix
    print(url)

    response = requests.get(url, data=json.dumps(url), cert=(cert,key))
    jsonResponse = response.json()
    print(jsonResponse)
    
    # Write data from each batch to json file
    for i in range(0,num_batches):
        with open(os.makedir(os.path.dirname("data/output"), exist_ok=True)+"/output_"+i+".json") as f:
            json.dumps(jsonResponse, f, indent=4)

df = pd.concat(map(pd.read_json, glob.glob('data/output/*.json')))
df.head()

Example response:

[
    {
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
    },
    {
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
    },
]
1 Answer

For one thing, you appear to be writing the same response 4 times:

# Write data from each batch to json file
for i in range(0,num_batches):
   with open(os.makedir(os.path.dirname("data/output"), exist_ok=True)+"/output_"+i+".json") as f:
       json.dumps(jsonResponse, f, indent=4)

It should be:

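import os

os.makedirs("data/output", exist_ok=True)  # create the output folder once, before the loop
response_cnt = 0

for batch in batches:

    ...    

    # Write this batch's response to its own JSON file ("w" opens it for writing)
    with open(f"data/output/output_{response_cnt}.json", "w") as f:
        json.dump(jsonResponse, f, indent=4)  # json.dump writes to a file; json.dumps only returns a string

    response_cnt += 1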

where response_cnt is a variable declared outside the for batch in batches: loop and incremented after each iteration.
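To check the one-file-per-batch idea without touching the real API, here is a minimal sketch. The helper names chunk, save_batches, and fake_fetch are hypothetical and not part of the original code; fake_fetch merely stands in for the requests.get(...).json() call, and the output goes to a temporary folder rather than data/output. It uses enumerate in place of a manual counter, which is just one way to get a distinct index per batch.

```python
import json
import tempfile
from pathlib import Path


def chunk(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def save_batches(batches, fetch, out_dir):
    """Call fetch(batch) once per batch and write each result to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder once, up front
    paths = []
    for index, batch in enumerate(batches):
        payload = fetch(batch)
        path = out_dir / f"output_{index}.json"  # distinct file name per batch
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=4)
        paths.append(path)
    return paths


def fake_fetch(batch):
    # Hypothetical stand-in for the real API call; returns one record per input.
    return [{"id": item} for item in batch]


alist = ['123', '456', '789', 'abc', 'def', 'xyz', '123abc', 'input1, input2']
batches = chunk(alist, 2)
out_dir = Path(tempfile.mkdtemp()) / "data" / "output"
paths = save_batches(batches, fake_fetch, out_dir)
print([p.name for p in paths])
```

Passing the batch size directly (rather than deriving it from a batch count) also avoids a step of 0 in range() when the list is shorter than the number of batches.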

Answered 2021-08-08T01:06:38.130