255

我有一个要转换为 CSV 文件的 JSON 文件。我怎样才能用 Python 做到这一点?

我试过了:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    csv_file.writerow(item)

f.close()

但是,它没有用。我正在使用 Django,我收到的错误是:

`file' object has no attribute 'writerow'`

然后我尝试了以下方法:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    f.writerow(item)  # ← changed

f.close()

然后我得到错误:

`sequence expected`

示例 json 文件:

[{
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    }, {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    }, {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }, {
        "pk": 4,
        "model": "auth.permission",
        "fields": {
            "codename": "add_group",
            "name": "Can add group",
            "content_type": 2
        }
    }, {
        "pk": 10,
        "model": "auth.permission",
        "fields": {
            "codename": "add_message",
            "name": "Can add message",
            "content_type": 4
        }
    }
]
4

27 回答 27

208

使用pandas 这就像使用两个命令一样简单!

df = pd.read_json()

read_json将 JSON 字符串转换为 pandas 对象(序列或数据框)。然后:

df.to_csv()

它可以返回字符串或直接写入 csv 文件。请参阅to_csv的文档。

基于之前答案的冗长,我们都应该感谢 pandas 的捷径。

对于非结构化 JSON,请参阅此答案

编辑:有人要求一个工作最小的例子:

import pandas as pd

with open('jsonfile.json', encoding='utf-8') as inputfile:
    df = pd.read_json(inputfile)

df.to_csv('csvfile.csv', encoding='utf-8', index=False)
于 2016-05-18T18:19:22.627 回答
146

首先,您的 JSON 具有嵌套对象,因此通常不能直接转换为 CSV。您需要将其更改为以下内容:

{
    "pk": 22,
    "model": "auth.permission",
    "codename": "add_logentry",
    "content_type": 8,
    "name": "Can add log entry"
},
......]

这是我从中生成 CSV 的代码:

import csv
import json

x = """[
    {
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    },
    {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    },
    {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }
]"""

x = json.loads(x)

f = csv.writer(open("test.csv", "wb+"))

# Write CSV Header, If you dont need that, remove this line
f.writerow(["pk", "model", "codename", "name", "content_type"])

for x in x:
    f.writerow([x["pk"],
                x["model"],
                x["fields"]["codename"],
                x["fields"]["name"],
                x["fields"]["content_type"]])

你会得到如下输出:

pk,model,codename,name,content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8
于 2009-12-09T06:56:38.343 回答
113

我假设您的 JSON 文件将解码为字典列表。首先,我们需要一个将 JSON 对象展平的函数:

def flattenjson(b, delim):
    val = {}
    for i in b.keys():
        if isinstance(b[i], dict):
            get = flattenjson(b[i], delim)
            for j in get.keys():
                val[i + delim + j] = get[j]
        else:
            val[i] = b[i]
            
    return val

在您的 JSON 对象上运行此代码段的结果:

flattenjson({
    "pk": 22, 
    "model": "auth.permission", 
    "fields": {
      "codename": "add_message", 
      "name": "Can add message", 
      "content_type": 8
    }
  }, "__")

{
    "pk": 22, 
    "model": "auth.permission", 
    "fields__codename": "add_message", 
    "fields__name": "Can add message", 
    "fields__content_type": 8
}

将此函数应用于 JSON 对象的输入数组中的每个 dict 后:

input = map(lambda x: flattenjson( x, "__" ), input)

并找到相关的列名:

columns = [x for row in input for x in row.keys()]
columns = list(set(columns))

通过 csv 模块运行它并不难:

with open(fname, 'wb') as out_file:
    csv_w = csv.writer(out_file)
    csv_w.writerow(columns)

    for i_r in input:
        csv_w.writerow(map(lambda x: i_r.get(x, ""), columns))

我希望这有帮助!

于 2015-01-30T23:11:27.533 回答
39

JSON可以表示各种各样的数据结构——一个JS“对象”大致像一个Python dict(带有字符串键),一个JS“数组”大致像一个Python列表,你可以嵌套它们,只要最后一个“叶”元素是数字或字符串。

CSV 本质上只能表示一个二维表——可选地带有第一行“标题”,即“列名”,这可以使表可解释为字典列表,而不是正常解释,列表列表(同样,“叶子”元素可以是数字或字符串)。

因此,在一般情况下,您不能将任意 JSON 结构转换为 CSV。在一些特殊情况下,您可以(没有进一步嵌套的数组;所有具有完全相同键的对象数组)。哪种特殊情况(如果有)适用于您的问题?解决方案的细节取决于您所拥有的特殊情况。鉴于您甚至没有提及适用哪一个的惊人事实,我怀疑您可能没有考虑约束,实际上没有可用的案例适用,并且您的问题无法解决。但请务必澄清!

于 2009-12-09T04:27:25.293 回答
36

将平面对象的任何 json 列表转换为 csv 的通用解决方案。

将 input.json 文件作为命令行的第一个参数传递。

import csv, json, sys

input = open(sys.argv[1])
data = json.load(input)
input.close()

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for row in data:
    output.writerow(row.values())
于 2012-10-05T02:55:35.287 回答
27

假设您的 JSON 数据位于名为data.json.

import json
import csv

with open("data.json") as file:
    data = json.load(file)

with open("data.csv", "w") as file:
    csv_file = csv.writer(file)
    for item in data:
        fields = list(item['fields'].values())
        csv_file.writerow([item['pk'], item['model']] + fields)
于 2009-12-09T04:23:05.240 回答
24

使用json_normalize来自pandas

  • 在名为test.json.
  • encoding='utf-8'此处已使用,但对于其他情况可能不需要。
  • 以下代码利用了该pathlib库。
    • .open是一种方法pathlib
    • 也适用于非 Windows 路径。
  • 用于pandas.to_csv(...)将数据保存到 csv 文件。
import pandas as pd
# As of Pandas 1.01, json_normalize as pandas.io.json.json_normalize is deprecated and is now exposed in the top-level namespace.
# from pandas.io.json import json_normalize
from pathlib import Path
import json

# set path to file
p = Path(r'c:\some_path_to_file\test.json')

# read json
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())

# create dataframe
df = pd.json_normalize(data)

# dataframe view
 pk            model  fields.codename           fields.name  fields.content_type
 22  auth.permission     add_logentry     Can add log entry                    8
 23  auth.permission  change_logentry  Can change log entry                    8
 24  auth.permission  delete_logentry  Can delete log entry                    8
  4  auth.permission        add_group         Can add group                    2
 10  auth.permission      add_message       Can add message                    4

# save to csv
df.to_csv('test.csv', index=False, encoding='utf-8')

CSV 输出:

pk,model,fields.codename,fields.name,fields.content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8
4,auth.permission,add_group,Can add group,2
10,auth.permission,add_message,Can add message,4

更多嵌套 JSON 对象的资源:

于 2019-10-31T17:12:24.437 回答
22

使用起来会很方便csv.DictWriter(),具体实现如下:

def read_json(filename):
    return json.loads(open(filename).read())
def write_csv(data,filename):
    with open(filename, 'w+') as outf:
        writer = csv.DictWriter(outf, data[0].keys())
        writer.writeheader()
        for row in data:
            writer.writerow(row)
# implement
write_csv(read_json('test.json'), 'output.csv')

请注意,这假定所有 JSON 对象都具有相同的字段。

这是可以帮助您的参考。

于 2016-12-01T09:17:55.563 回答
6

我在使用Dan 提出的解决方案时遇到了麻烦,但这对我有用:

import json
import csv 

f = open('test.json')
data = json.load(f)
f.close()

f=csv.writer(open('test.csv','wb+'))

for item in data:
  f.writerow([item['pk'], item['model']] + item['fields'].values())

其中“test.json”包含以下内容:

[ 
{"pk": 22, "model": "auth.permission", "fields": 
  {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8 } }, 
{"pk": 23, "model": "auth.permission", "fields": 
  {"codename": "change_logentry", "name": "Can change log entry", "content_type": 8 } }, {"pk": 24, "model": "auth.permission", "fields": 
  {"codename": "delete_logentry", "name": "Can delete log entry", "content_type": 8 } }
]
于 2011-05-21T17:30:00.110 回答
5

亚历克的回答很好,但在有多层嵌套的情况下不起作用。这是一个支持多级嵌套的修改版本。如果嵌套对象已经指定了自己的键(例如 Firebase Analytics / BigTable / BigQuery 数据),它还会使标题名称更好一些:

"""Converts JSON with nested fields into a flattened CSV file.
"""

import sys
import json
import csv
import os

import jsonlines

from orderedset import OrderedSet

# from https://stackoverflow.com/a/28246154/473201
def flattenjson( b, prefix='', delim='/', val=None ):
  if val is None:
    val = {}

  if isinstance( b, dict ):
    for j in b.keys():
      flattenjson(b[j], prefix + delim + j, delim, val)
  elif isinstance( b, list ):
    get = b
    for j in range(len(get)):
      key = str(j)

      # If the nested data contains its own key, use that as the header instead.
      if isinstance( get[j], dict ):
        if 'key' in get[j]:
          key = get[j]['key']

      flattenjson(get[j], prefix + delim + key, delim, val)
  else:
    val[prefix] = b

  return val

def main(argv):
  if len(argv) < 2:
    raise Error('Please specify a JSON file to parse')

  print "Loading and Flattening..."
  filename = argv[1]
  allRows = []
  fieldnames = OrderedSet()
  with jsonlines.open(filename) as reader:
    for obj in reader:
      # print 'orig:\n'
      # print obj
      flattened = flattenjson(obj)
      #print 'keys: %s' % flattened.keys()
      # print 'flattened:\n'
      # print flattened
      fieldnames.update(flattened.keys())
      allRows.append(flattened)

  print "Exporting to CSV..."
  outfilename = filename + '.csv'
  count = 0
  with open(outfilename, 'w') as file:
    csvwriter = csv.DictWriter(file, fieldnames=fieldnames)
    csvwriter.writeheader()
    for obj in allRows:
      # print 'allRows:\n'
      # print obj
      csvwriter.writerow(obj)
      count += 1

  print "Wrote %d rows" % count



if __name__ == '__main__':
  main(sys.argv)
于 2019-07-27T02:12:45.733 回答
4

如前面的答案所述,将 json 转换为 csv 的困难是因为 json 文件可以包含嵌套字典,因此是多维数据结构,而 csv 是 2D 数据结构。但是,将多维结构转换为 csv 的一个好方法是让多个 csv 与主键绑定在一起。

在您的示例中,第一个 csv 输出将列 "pk","model","fields" 作为您的列。“pk”和“model”的值很容易获得,但是因为“fields”列包含一个字典,它应该是它自己的 csv,并且因为“codename”似乎是主键,你可以用作输入用于“字段”以完成第一个 csv。第二个 csv 包含来自“字段”列的字典,其中代号作为主键,可用于将 2 个 csv 绑定在一起。

这是您的 json 文件的解决方案,可将嵌套字典转换为 2 个 csv。

import csv
import json

def readAndWrite(inputFileName, primaryKey=""):
    input = open(inputFileName+".json")
    data = json.load(input)
    input.close()

    header = set()

    if primaryKey != "":
        outputFileName = inputFileName+"-"+primaryKey
        if inputFileName == "data":
            for i in data:
                for j in i["fields"].keys():
                    if j not in header:
                        header.add(j)
    else:
        outputFileName = inputFileName
        for i in data:
            for j in i.keys():
                if j not in header:
                    header.add(j)

    with open(outputFileName+".csv", 'wb') as output_file:
        fieldnames = list(header)
        writer = csv.DictWriter(output_file, fieldnames, delimiter=',', quotechar='"')
        writer.writeheader()
        for x in data:
            row_value = {}
            if primaryKey == "":
                for y in x.keys():
                    yValue = x.get(y)
                    if type(yValue) == int or type(yValue) == bool or type(yValue) == float or type(yValue) == list:
                        row_value[y] = str(yValue).encode('utf8')
                    elif type(yValue) != dict:
                        row_value[y] = yValue.encode('utf8')
                    else:
                        if inputFileName == "data":
                            row_value[y] = yValue["codename"].encode('utf8')
                            readAndWrite(inputFileName, primaryKey="codename")
                writer.writerow(row_value)
            elif primaryKey == "codename":
                for y in x["fields"].keys():
                    yValue = x["fields"].get(y)
                    if type(yValue) == int or type(yValue) == bool or type(yValue) == float or type(yValue) == list:
                        row_value[y] = str(yValue).encode('utf8')
                    elif type(yValue) != dict:
                        row_value[y] = yValue.encode('utf8')
                writer.writerow(row_value)

readAndWrite("data")
于 2014-02-18T01:19:24.930 回答
3

令人惊讶的是,我发现到目前为止,这里发布的答案都没有正确处理所有可能的情况(例如,嵌套字典、嵌套列表、无值等)。

此解决方案应适用于所有场景:

def flatten_json(json):
    def process_value(keys, value, flattened):
        if isinstance(value, dict):
            for key in value.keys():
                process_value(keys + [key], value[key], flattened)
        elif isinstance(value, list):
            for idx, v in enumerate(value):
                process_value(keys + [str(idx)], v, flattened)
        else:
            flattened['__'.join(keys)] = value

    flattened = {}
    for key in json.keys():
        process_value([key], json[key], flattened)
    return flattened
于 2018-09-18T20:22:33.890 回答
3

这不是一个非常聪明的方法,但我遇到了同样的问题,这对我有用:

import csv

f = open('data.json')
data = json.load(f)
f.close()

new_data = []

for i in data:
   flat = {}
   names = i.keys()
   for n in names:
      try:
         if len(i[n].keys()) > 0:
            for ii in i[n].keys():
               flat[n+"_"+ii] = i[n][ii]
      except:
         flat[n] = i[n]
   new_data.append(flat)  

f = open(filename, "r")
writer = csv.DictWriter(f, new_data[0].keys())
writer.writeheader()
for row in new_data:
   writer.writerow(row)
f.close()
于 2017-11-16T19:56:55.660 回答
3

我知道这个问题已经很久没有问过了,但我想我可能会添加到其他人的答案中,并分享一篇我认为以非常简洁的方式解释解决方案的博客文章。

这是链接

打开要写入的文件

employ_data = open('/tmp/EmployData.csv', 'w')

创建 csv 写入器对象

csvwriter = csv.writer(employ_data)
count = 0
for emp in emp_data:
      if count == 0:
             header = emp.keys()
             csvwriter.writerow(header)
             count += 1
      csvwriter.writerow(emp.values())

确保关闭文件以保存内容

employ_data.close()
于 2017-01-26T17:21:39.930 回答
2

这工作相对较好。它将 json 展平以将其写入 csv 文件。嵌套元素被管理:)

那是为python 3

import json

o = json.loads('your json string') # Be careful, o must be a list, each of its objects will make a line of the csv.

def flatten(o, k='/'):
    global l, c_line
    if isinstance(o, dict):
        for key, value in o.items():
            flatten(value, k + '/' + key)
    elif isinstance(o, list):
        for ov in o:
            flatten(ov, '')
    elif isinstance(o, str):
        o = o.replace('\r',' ').replace('\n',' ').replace(';', ',')
        if not k in l:
            l[k]={}
        l[k][c_line]=o

def render_csv(l):
    ftime = True

    for i in range(100): #len(l[list(l.keys())[0]])
        for k in l:
            if ftime :
                print('%s;' % k, end='')
                continue
            v = l[k]
            try:
                print('%s;' % v[i], end='')
            except:
                print(';', end='')
        print()
        ftime = False
        i = 0

def json_to_csv(object_list):
    global l, c_line
    l = {}
    c_line = 0
    for ov in object_list : # Assumes json is a list of objects
        flatten(ov)
        c_line += 1
    render_csv(l)

json_to_csv(o)

请享用。

于 2016-04-24T02:17:23.477 回答
2

这是对@MikeRepass 答案的修改。此版本将 CSV 写入文件,适用于 Python 2 和 Python 3。

import csv,json
input_file="data.json"
output_file="data.csv"
with open(input_file) as f:
    content=json.load(f)
try:
    context=open(output_file,'w',newline='') # Python 3
except TypeError:
    context=open(output_file,'wb') # Python 2
with context as file:
    writer=csv.writer(file)
    writer.writerow(content[0].keys()) # header row
    for row in content:
        writer.writerow(row.values())
于 2020-04-08T22:58:44.693 回答
2

我解决这个问题的简单方法:

创建一个新的 Python 文件,如:json_to_csv.py

添加此代码:

import csv, json, sys
#if you are not using utf-8 files, remove the next line
sys.setdefaultencoding("UTF-8")
#check if you pass the input file and output file
if sys.argv[1] is not None and sys.argv[2] is not None:

    fileInput = sys.argv[1]
    fileOutput = sys.argv[2]

    inputFile = open(fileInput)
    outputFile = open(fileOutput, 'w')
    data = json.load(inputFile)
    inputFile.close()

    output = csv.writer(outputFile)

    output.writerow(data[0].keys())  # header row

    for row in data:
        output.writerow(row.values())

添加此代码后,保存文件并在终端运行:

python json_to_csv.py input.txt output.csv

我希望这对你有帮助。

再见!

于 2016-12-13T02:05:02.520 回答
2

尝试这个

import csv, json, sys

input = open(sys.argv[1])
data = json.load(input)
input.close()

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for item in data:
    output.writerow(item.values())
于 2019-03-07T07:47:53.077 回答
2

此代码适用于任何给定的 json 文件

# -*- coding: utf-8 -*-
"""
Created on Mon Jun 17 20:35:35 2019
author: Ram
"""

import json
import csv

with open("file1.json") as file:
    data = json.load(file)



# create the csv writer object
pt_data1 = open('pt_data1.csv', 'w')
csvwriter = csv.writer(pt_data1)

count = 0

for pt in data:

      if count == 0:

             header = pt.keys()

             csvwriter.writerow(header)

             count += 1

      csvwriter.writerow(pt.values())

pt_data1.close()
于 2019-06-17T15:27:43.457 回答
1
import json,csv
t=''
t=(type('a'))
json_data = []
data = None
write_header = True
item_keys = []
try:
with open('kk.json') as json_file:
    json_data = json_file.read()

    data = json.loads(json_data)
except Exception as e:
    print( e)

with open('bar.csv', 'at') as csv_file:
    writer = csv.writer(csv_file)#, quoting=csv.QUOTE_MINIMAL)
    for item in data:
        item_values = []
        for key in item:
            if write_header:
                item_keys.append(key)
            value = item.get(key, '')
            if (type(value)==t):
                item_values.append(value.encode('utf-8'))
            else:
                item_values.append(value)
        if write_header:
            writer.writerow(item_keys)
            write_header = False
        writer.writerow(item_values)
于 2019-03-07T07:42:38.953 回答
1

修改了 Alec McGail 的答案以支持带有列表的 JSON

    def flattenjson(self, mp, delim="|"):
            ret = []
            if isinstance(mp, dict):
                    for k in mp.keys():
                            csvs = self.flattenjson(mp[k], delim)
                            for csv in csvs:
                                    ret.append(k + delim + csv)
            elif isinstance(mp, list):
                    for k in mp:
                            csvs = self.flattenjson(k, delim)
                            for csv in csvs:
                                    ret.append(csv)
            else:
                    ret.append(mp)

            return ret

谢谢!

于 2016-07-12T11:56:02.380 回答
1

如果我们考虑以下将 json 格式文件转换为 csv 格式文件的示例。

{
 "item_data" : [
      {
        "item": "10023456",
        "class": "100",
        "subclass": "123"
      }
      ]
}

下面的代码会将 json 文件 ( data3.json ) 转换为 csv 文件 ( data3.csv )。

import json
import csv
with open("/Users/Desktop/json/data3.json") as file:
    data = json.load(file)
    file.close()
    print(data)

fname = "/Users/Desktop/json/data3.csv"

with open(fname, "w", newline='') as file:
    csv_file = csv.writer(file)
    csv_file.writerow(['dept',
                       'class',
                       'subclass'])
    for item in data["item_data"]:
         csv_file.writerow([item.get('item_data').get('dept'),
                            item.get('item_data').get('class'),
                            item.get('item_data').get('subclass')])

上述代码已在本地安装的pycharm中执行,并成功将json文件转换为csv文件。希望这有助于转换文件。

于 2020-02-02T20:06:20.707 回答
0

由于数据似乎采用字典格式,因此您似乎应该实际使用 csv.DictWriter() 来实际输出具有适当标题信息的行。这应该允许转换处理得更容易一些。然后,fieldnames 参数将正确设置顺序,而作为标题的第一行的输出将允许稍后由 csv.DictReader() 读取和处理它。

例如,Mike Repass 使用

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for row in data:
  output.writerow(row.values())

但是只需将初始设置更改为 output = csv.DictWriter(filesetting, fieldnames=data[0].keys())

请注意,由于未定义字典中元素的顺序,您可能必须显式创建字段名条目。一旦你这样做了,作家就会工作。然后写入工作如最初所示。

于 2014-02-10T14:34:19.607 回答
0

不幸的是,我没有足够的声誉来为惊人的@Alec McGail 答案做出一点贡献。我使用的是 Python3,我需要将地图转换为 @Alexis R 评论之后的列表。

另外,我发现 csv 编写器正在向文件中添加一个额外的 CR(我在 csv 文件中包含数据的每一行都有一个空行)。在@Jason R. Coombs 对此线程的回答之后,解决方案非常简单: Python 中的 CSV 添加额外的回车符

您只需将 lineterminator='\n' 参数添加到 csv.writer 即可。这将是:csv_w = csv.writer( out_file, lineterminator='\n' )

于 2018-06-29T09:42:51.663 回答
0

您可以使用此代码将 json 文件转换为 csv 文件阅读文件后,我将对象转换为 pandas 数据框,然后将其保存到 CSV 文件

import os
import pandas as pd
import json
import numpy as np

data = []
os.chdir('D:\\Your_directory\\folder')
with open('file_name.json', encoding="utf8") as data_file:    
     for line in data_file:
        data.append(json.loads(line))

dataframe = pd.DataFrame(data)        
## Saving the dataframe to a csv file
dataframe.to_csv("filename.csv", encoding='utf-8',index= False)
于 2018-11-26T23:40:40.577 回答
0

我可能会迟到,但我想,我已经处理了类似的问题。我有一个看起来像这样的 json 文件

JSON 文件结构

我只想从这些 json 文件中提取几个键/值。因此,我编写了以下代码来提取相同的内容。

    """json_to_csv.py
    This script reads n numbers of json files present in a folder and then extract certain data from each file and write in a csv file.
    The folder contains the python script i.e. json_to_csv.py, output.csv and another folder descriptions containing all the json files.
"""

import os
import json
import csv


def get_list_of_json_files():
    """Returns the list of filenames of all the Json files present in the folder
    Parameter
    ---------
    directory : str
        'descriptions' in this case
    Returns
    -------
    list_of_files: list
        List of the filenames of all the json files
    """

    list_of_files = os.listdir('descriptions')  # creates list of all the files in the folder

    return list_of_files


def create_list_from_json(jsonfile):
    """Returns a list of the extracted items from json file in the same order we need it.
    Parameter
    _________
    jsonfile : json
        The json file containing the data
    Returns
    -------
    one_sample_list : list
        The list of the extracted items needed for the final csv
    """

    with open(jsonfile) as f:
        data = json.load(f)

    data_list = []  # create an empty list

    # append the items to the list in the same order.
    data_list.append(data['_id'])
    data_list.append(data['_modelType'])
    data_list.append(data['creator']['_id'])
    data_list.append(data['creator']['name'])
    data_list.append(data['dataset']['_accessLevel'])
    data_list.append(data['dataset']['_id'])
    data_list.append(data['dataset']['description'])
    data_list.append(data['dataset']['name'])
    data_list.append(data['meta']['acquisition']['image_type'])
    data_list.append(data['meta']['acquisition']['pixelsX'])
    data_list.append(data['meta']['acquisition']['pixelsY'])
    data_list.append(data['meta']['clinical']['age_approx'])
    data_list.append(data['meta']['clinical']['benign_malignant'])
    data_list.append(data['meta']['clinical']['diagnosis'])
    data_list.append(data['meta']['clinical']['diagnosis_confirm_type'])
    data_list.append(data['meta']['clinical']['melanocytic'])
    data_list.append(data['meta']['clinical']['sex'])
    data_list.append(data['meta']['unstructured']['diagnosis'])
    # In few json files, the race was not there so using KeyError exception to add '' at the place
    try:
        data_list.append(data['meta']['unstructured']['race'])
    except KeyError:
        data_list.append("")  # will add an empty string in case race is not there.
    data_list.append(data['name'])

    return data_list


def write_csv():
    """Creates the desired csv file
    Parameters
    __________
    list_of_files : file
        The list created by get_list_of_json_files() method
    result.csv : csv
        The csv file containing the header only
    Returns
    _______
    result.csv : csv
        The desired csv file
    """

    list_of_files = get_list_of_json_files()
    for file in list_of_files:
        row = create_list_from_json(f'descriptions/{file}')  # create the row to be added to csv for each file (json-file)
        with open('output.csv', 'a') as c:
            writer = csv.writer(c)
            writer.writerow(row)
        c.close()


if __name__ == '__main__':
    write_csv()

我希望这将有所帮助。有关此代码如何工作的详细信息,您可以在此处查看

于 2019-09-05T12:04:05.960 回答
0

我已经尝试了很多建议的解决方案(Panda 也没有正确规范化我的 JSON),但是正确解析 JSON 数据的真正好的解决方案来自Max Berman

我写了一个改进来避免每一行都出现新列,并在解析过程中将其放到现有列中。如果仅存在一个数据,它还具有将值存储为字符串的效果,如果该列有更多值,则创建一个列表。

它需要一个 input.json 文件作为输入并输出一个 output.csv。

import json
import pandas as pd

def flatten_json(json):
    def process_value(keys, value, flattened):
        if isinstance(value, dict):
            for key in value.keys():
                process_value(keys + [key], value[key], flattened)
        elif isinstance(value, list):
            for idx, v in enumerate(value):
                process_value(keys, v, flattened)
                # process_value(keys + [str(idx)], v, flattened)
        else:
            key1 = '__'.join(keys)
            if not flattened.get(key1) is None:
                if isinstance(flattened[key1], list):
                    flattened[key1] = flattened[key1] + [value]
                else:
                    flattened[key1] = [flattened[key1]] + [value]
            else:
                flattened[key1] = value

    flattened = {}
    for key in json.keys():
        k = key
        # print("Key: " + k)
        process_value([key], json[key], flattened)
    return flattened

try:
    f = open("input.json", "r")
except:
    pass
y = json.loads(f.read())
flat = flatten_json(y)
text = json.dumps(flat)
df = pd.read_json(text)
df.to_csv('output.csv', index=False, encoding='utf-8')
于 2021-12-25T07:24:06.423 回答