
I have an NDJSON dataset that looks like this:

   {'ADDRESS_CITY': 'Whittier', 'ADDRESS_LINE_1': '905 Greenleaf Avenue', 'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '90402',},
   {'ADDRESS_CITY': 'Cedar Falls', 'ADDRESS_LINE_1': '93323 Maplewood Dr', 'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '95014'}

I need to pass the values above to an API request, specifically in the request body, in the format below.

data=[
        {
            "addressee":"Greenleaf Avenue",
            "street":"905 Greenleaf Avenue",
            "city":"Whittier",
            "state":"CA",
            "zipcode":"90402",
            
        },
        {
            "addressee":"93323",
            "street":"Maplewood Dr",
            "city":"Cedar Falls",
            "state":"CA",
            "zipcode":"95014",
        
        }
]

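As you can see, the keys are different, so I need to rename them to line up with the correct data and pass them under the new key names (i.e. address_line_1 goes to addressee). There will be 10k addresses in this request.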

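I didn't mention it in my first example, but each address has an associated ID, which I have to remove to make the request and then add back afterwards. So I ended up solving it with the code below. Is there anything more Pythonic? This doesn't feel very elegant to me.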

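import json

import ndjson
import requests

# `addresses` (the raw NDJSON text) and `url` are assumed to be defined earlier.
addresses = ndjson.loads(addresses)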
data = json.loads(json.dumps(addresses).replace('"ADDRESS_CITY"','"city"').replace('"ADDRESS_LINE_1"','"street"').replace('"ADDRESS_STATE"','"state"').replace('"ADDRESS_ZIP"','"zipcode"'))
ids = []
for i in data:
    i['candidates'] = 1
    ids.append(i["ID"])
    del i["ID"]

response = requests.request("POST", url, json=data)

resp_data = response.json()

a = 0
for i in resp_data:
    i['ID'] = ids[a]
    a = a + 1

2 Answers


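If you want to make things a bit easier on yourself, I'd suggest using dataclasses to model your input data. The main benefit is that you can use dot (.) access for attributes, and you don't need to work with dictionaries that have dynamic keys. You also benefit from type hints, so your IDE should be able to assist you better.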

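In this case, I'd suggest pairing it with a JSON serialization library such as dataclass-wizard, which supports this use case well. As of the latest version, v0.15.0, it should also support excluding fields from the serialization/dump process.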

Here is a simple example I put together, which uses the desired key mapping from above:

import json
from dataclasses import dataclass, field
# note: for python 3.9+, you can import this from `typing` instead
from typing_extensions import Annotated

from dataclass_wizard import JSONWizard, json_key


@dataclass
class AddressInfo(JSONWizard):
    """
    AddressInfo dataclass

    """
    city: Annotated[str, json_key('ADDRESS_CITY')]
    street: Annotated[str, json_key('ADDRESS_LINE_1')]
    state: Annotated[str, json_key('ADDRESS_STATE')]

    # pass `dump=False`, so we exclude the field in serialization.
    id: Annotated[int, json_key('ID', dump=False)]

    # you could also annotate the below like `Union[str, int]`
    # if you want to retain it as a string.
    zipcode: Annotated[int, json_key('ADDRESS_ZIP')]

    # exclude this field from the constructor (and from the
    # de-serialization process)
    candidates: int = field(default=1, init=False)

And example usage of the above:

input_obj = [{'ADDRESS_CITY': 'Whittier', 'ADDRESS_LINE_1': '905 Greenleaf Avenue',
              'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '90402',
              'ID': 111},
             {'ADDRESS_CITY': 'Cedar Falls', 'ADDRESS_LINE_1': '93323 Maplewood Dr',
              'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '95014',
              'ID': 222}]

addresses = AddressInfo.from_list(input_obj)

print('-- Addresses')
for a in addresses:
    print(repr(a))

out_list = [a.to_dict() for a in addresses]

print('-- To JSON')
print(json.dumps(out_list, indent=2))

# alternatively, with the latest version (0.15.1)
# print(AddressInfo.list_to_json(addresses, indent=2))

Note: you can still access the id of each address as normal, even though this field is omitted from the JSON result.
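Because the id stays on each object, it can be paired back up with the API response afterwards. The following is only a minimal sketch using the standard library: `Address` is a hypothetical stand-in for the `AddressInfo` class above, `attach_ids` is a made-up helper name, and the response items are simulated. It assumes the API returns exactly one item per submitted address, in the same order, which is the same assumption the code in the question makes.

```python
from dataclasses import dataclass


@dataclass
class Address:
    # Hypothetical, trimmed-down stand-in for AddressInfo (stdlib only).
    id: int
    street: str


def attach_ids(addresses, resp_data):
    """Pair each response item with the id of the address at the same position."""
    if len(addresses) != len(resp_data):
        # Positional matching is only safe when the lengths agree.
        raise ValueError("response length does not match request length")
    return [{**item, "ID": addr.id} for addr, item in zip(addresses, resp_data)]


addresses = [Address(111, "905 Greenleaf Avenue"), Address(222, "93323 Maplewood Dr")]

# Simulated response items; a real API will return different fields.
resp_data = [{"result": "a"}, {"result": "b"}]

merged = attach_ids(addresses, resp_data)
```

Using `zip` removes the manual counter from the question's loop, and the length check fails loudly if the response does not line up with the request.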

answered 2021-09-30T05:37:48.660

Use a dictionary to translate them:

translations = {
    "ADDRESS_CITY": "city",  # etc
}
input_data = ... # your data here
data = [{translations[k]: v for k, v in row.items()} for row in input_data]
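
One thing to watch with the comprehension above: `translations[k]` raises a `KeyError` for any key that is not in the mapping, such as the `ID` field mentioned in the question. A possible way around that, sketched below, is to skip unmapped keys and collect the IDs separately. The names `KEY_MAP` and `to_request_row` are made up for illustration, and the sample rows are taken from the question with the IDs from the other answer.

```python
# Illustrative mapping from the source keys to the request-body keys.
KEY_MAP = {
    "ADDRESS_CITY": "city",
    "ADDRESS_LINE_1": "street",
    "ADDRESS_STATE": "state",
    "ADDRESS_ZIP": "zipcode",
}


def to_request_row(row):
    """Rename mapped keys and drop anything unmapped (such as ID)."""
    return {KEY_MAP[k]: v for k, v in row.items() if k in KEY_MAP}


input_data = [
    {"ID": 111, "ADDRESS_CITY": "Whittier", "ADDRESS_LINE_1": "905 Greenleaf Avenue",
     "ADDRESS_STATE": "CA", "ADDRESS_ZIP": "90402"},
    {"ID": 222, "ADDRESS_CITY": "Cedar Falls", "ADDRESS_LINE_1": "93323 Maplewood Dr",
     "ADDRESS_STATE": "CA", "ADDRESS_ZIP": "95014"},
]

# Keep the IDs in request order so they can be re-attached to the response.
ids = [row["ID"] for row in input_data]
data = [to_request_row(row) for row in input_data]
```

This avoids the round trip through `json.dumps(...).replace(...)`, which would also rewrite any value that happens to contain one of the key names.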
answered 2021-09-29T22:19:58.330