I am building a nested JSON structure and storing it in a list object. Here is my code, which produces the correct hierarchical structure as intended.

Sample data:

datasource,datasource_cnt,category,category_cnt,subcategory,subcategory_cnt
Bureau of Labor Statistics,44,Employment and wages,44,Employment and wages,44

import pandas as pd
df=pd.read_csv('queryhive16273.csv')
def split_df(df):
   for (vendor, count), df_vendor in df.groupby(["datasource", "datasource_cnt"]):
       yield {
           "vendor_name": vendor,
           "count": count,
           "categories": list(split_category(df_vendor))
       }

def split_category(df_vendor):
   for (category, count), df_category in df_vendor.groupby(
       ["category", "category_cnt"]
   ):
       yield {
           "name": category,
           "count": count,
           "subCategories": list(split_subcategory(df_category)),
       }

def split_subcategory(df_category):
   for (subcategory, count), df_subcategory in df_category.groupby(
       ["subcategory", "subcategory_cnt"]
   ):
       yield {
           "count": count,
           "name": subcategory,
       }


abc=list(split_df(df))

abc contains the data shown below. This is the expected result.

[{
    'count': 44,
    'vendor_name': 'Bureau of Labor Statistics',
    'categories': [{
        'count': 44,
        'name': 'Employment and wages',
        'subCategories': [{
            'count': 44,
            'name': 'Employment and wages'
        }]
    }]
}]

Now I am trying to store it in a JSON file.

with open('your_file2.json', 'w') as f:
    for item in abc:
        f.write("%s\n" % item)
        # f.write(abc)

Here is the problem. This writes the data as shown below, which is not valid JSON. If I try json.dump instead, I get a "JSON serialization" error.

Can you help me?

{
    'count': 44,
    'vendor_name': 'Bureau of Labor Statistics',
    'categories': [{
        'count': 44,
        'name': 'Employment and wages',
        'subCategories': [{
            'count': 44,
            'name': 'Employment and wages'
        }]
    }]
}

Expected result:

[{
    "count": 44,
    "vendor_name": "Bureau of Labor Statistics",
    "categories": [{
        "count": 44,
        "name": "Employment and wages",
        "subCategories": [{
            "count": 44,
            "name": "Employment and wages"
        }]
    }]
}]
2 Answers

Using your data with the Python standard library json module gives me:

TypeError: Object of type 'int64' is not JSON serializable

This means a NumPy object (here a numpy.int64 produced by pandas) is living in your nested structure, and the json encoder has no rule for converting it during serialization.

Passing default=str, which makes the encoder fall back to string conversion for any object it cannot serialize itself, is enough to make your code work:

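import io
import json

import pandas as pd

# Same sample data as in the question, read from an in-memory buffer
d = io.StringIO("datasource,datasource_cnt,category,category_cnt,subcategory,subcategory_cnt\nBureau of Labor Statistics,44,Employment and wages,44,Employment and wages,44")
df = pd.read_csv(d)

# split_df is the generator defined in the question
abc = list(split_df(df))

json.dumps(abc, default=str)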

It returns valid JSON (but with the integers converted to strings):

'[{"vendor_name": "Bureau of Labor Statistics", "count": "44", "categories": [{"name": "Employment and wages", "count": "44", "subCategories": [{"count": "44", "name": "Employment and wages"}]}]}]'
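To see why the counts come out quoted, it may help to look at when the default hook is actually called. The sketch below is only an illustration: it uses decimal.Decimal as a stand-in for numpy.int64 (an assumption made so the example runs with the standard library alone), and the fallback function and record dictionary are hypothetical names. The encoder calls the hook only for values it cannot handle, and whatever the hook returns is what gets written.

```python
import json
from decimal import Decimal

# Decimal is only a stand-in for numpy.int64 here (an assumption for
# illustration): both are types the default JSON encoder rejects.
record = {"name": "Employment and wages", "count": Decimal(44)}

# Without a fallback, the encoder raises TypeError.
try:
    json.dumps(record)
    failed = False
except TypeError:
    failed = True

calls = []


def fallback(obj):
    """Hypothetical hook: record which types reach it, then stringify."""
    calls.append(type(obj).__name__)
    return str(obj)


# The hook is called only for the Decimal, not for the plain str value.
encoded = json.dumps(record, default=fallback)

print(failed)
print(calls)
print(encoded)
```

Because the hook returns a string, the count is written as "44" rather than 44, which matches the behaviour of default=str shown above.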

If string counts do not suit your needs, use a dedicated encoder:

import numpy as np
class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        return json.JSONEncoder.default(self, obj)

json.dumps(abc, cls=MyEncoder)

This returns the requested JSON:

'[{"vendor_name": "Bureau of Labor Statistics", "count": 44, "categories": [{"name": "Employment and wages", "count": 44, "subCategories": [{"count": 44, "name": "Employment and wages"}]}]}]'
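Since the question is about writing a file, the same idea can be applied with json.dump. The following is a minimal sketch, not the only way to do it: NumpyEncoder is a hypothetical variant of the encoder above that checks the broader np.integer and np.floating base classes, the abc list is a small hand-built stand-in for the real output of split_df, and a temporary directory is used so no real file is overwritten.

```python
import json
import os
import tempfile

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """Hypothetical encoder that converts NumPy scalars to built-in types."""

    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        # Anything else still raises TypeError, as in the base class.
        return super().default(obj)


# Small stand-in for the `abc` list built by split_df() in the question.
abc = [{"vendor_name": "Bureau of Labor Statistics", "count": np.int64(44)}]

# Write to a temporary path so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "your_file2.json")
with open(path, "w") as f:
    json.dump(abc, f, cls=NumpyEncoder, indent=2)

# Reading the file back confirms it parses and the count is an integer.
with open(path) as f:
    loaded = json.load(f)

print(loaded)
```

Checking the base classes rather than np.int64 alone is a design choice: it also covers other integer widths and floats that pandas may produce for different columns.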

Another option is to directly convert your data before encoding:

def split_category(df_vendor):
   for (category, count), df_category in df_vendor.groupby(
       ["category", "category_cnt"]
   ):
       yield {
           "name": category,
           "count": int(count), # Cast here before encoding
           "subCategories": list(split_subcategory(df_category)),
       }
answered 2018-12-05T07:27:07.037
import json

data = [{
    'count': 44,
    'vendor_name': 'Bureau of Labor Statistics',
    'categories': [{
        'count': 44,
        'name': 'Employment and wages',
        'subCategories': [{
            'count': 44,
            'name': 'Employment and wages'
        }]
    }]
}]

with open('your_file2.json', 'w') as f:
    json.dump(data, f, indent=2)

This produces a valid JSON file:

[
  {
    "count": 44,
    "vendor_name": "Bureau of Labor Statistics",
    "categories": [
      {
        "count": 44,
        "name": "Employment and wages",
        "subCategories": [
          {
            "count": 44,
            "name": "Employment and wages"
          }
        ]
      }
    ]
  }
]
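For contrast, the loop in the question wrote "%s\n" % item, which formats each dictionary with Python's own repr (single quotes) rather than as JSON. The short sketch below shows the difference on one small dictionary; the is_valid_json helper is a hypothetical name introduced only for this illustration.

```python
import json

item = {"count": 44, "name": "Employment and wages"}

# What the question's loop wrote: the Python repr, with single quotes.
as_repr = "%s" % item

# What json.dumps produces: JSON text, with double quotes.
as_json = json.dumps(item)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(as_repr)
print(as_json)
print(is_valid_json(as_repr), is_valid_json(as_json))
```

Writing the items one per line also drops the enclosing list brackets, so even with correct quoting the file would not match the expected result unless the whole list is dumped in one call.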
answered 2018-12-05T07:35:42.357