
I am trying to convert a JSON file to NDJSON. I am reading the file from GCS (Google Cloud Storage). Sample data:

{
  "Item1" : "INT",
  "Item2" : "INT",
  "Item3" : "text",
  "Item4" : "text",
  "Item5" : "Date"
}{
  "Item1" : "INT",
  "Item2" : "INT",
  "Item3" : "text",
  "Item4" : "text",
  "Item5" : "Date"
}{
  "Item1" : "INT",
  "Item2" : "INT",
  "Item3" : "text",
  "Item4" : "text",
  "Item5" : "Date"
}

Here is my code.

bucket = client.get_bucket('bucket name')
# Name of the object to be stored in the bucket
object_name_in_gcs_bucket = bucket.get_blob('file.json')
object_to_string = object_name_in_gcs_bucket.download_as_string()
#json_data = ndjson.loads(object_to_string)
json_list = [json.loads(row.decode('utf-8')) for row in object_to_string.split(b'\n') if row]

The error I get is raised at json_list: json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)

Desired output:

{"Item1" : "INT","Item2" : "INT","Item3" : "text","Item4" : "text","Item5" : "Date"}
{"Item1" : "INT","Item2" : "INT","Item3" : "text","Item4" : "text","Item5" : "Date"}
{"Item1" : "INT","Item2" : "INT","Item3" : "text","Item4" : "text","Item5" : "Date"}

1 Answer


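I think your main problem is that you are splitting on line endings rather than on the closing brace. Here is an example that does what I think you are trying to do.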

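from json import loads, dumps

with open("test.json") as f:
  file_string = f.read()
  # Split on the closing brace, drop the trailing remainder after the last one,
  # then restore the brace and strip newlines before parsing each object.
  # Note: this assumes flat objects with no "}" inside string values.
  dicts = [loads(f"{x}}}".replace("\n","")) for x in file_string.split("}")[0:-1]]
  for d in dicts:
    print(d)

# Write one JSON object per line (NDJSON); "a+" appends if new.json already exists.
with open("new.json", "a+") as newf:
  for d in dicts:
    newf.write(f"{dumps(d)}\n")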

Output:

[root@foo home]# ./test.py
{'Item1': 'INT', 'Item2': 'INT', 'Item3': 'text', 'Item4': 'text', 'Item5': 'Date'}
{'Item1': 'INT', 'Item2': 'INT', 'Item3': 'text', 'Item4': 'text', 'Item5': 'Date'}
{'Item1': 'INT', 'Item2': 'INT', 'Item3': 'text', 'Item4': 'text', 'Item5': 'Date'}
[root@foo home]# cat new.json
{"Item1": "INT", "Item2": "INT", "Item3": "text", "Item4": "text", "Item5": "Date"}
{"Item1": "INT", "Item2": "INT", "Item3": "text", "Item4": "text", "Item5": "Date"}
{"Item1": "INT", "Item2": "INT", "Item3": "text", "Item4": "text", "Item5": "Date"}
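
Splitting on "}" works for the flat sample above, but it would break on nested objects or on string values that contain a brace. As a hedged alternative (not part of the original answer), the sketch below walks the text with the standard library's json.JSONDecoder.raw_decode, which parses one value and reports where it ended. The function name concatenated_json_to_ndjson is hypothetical, and the sketch assumes the whole file fits in memory as a single string.

```python
import json


def concatenated_json_to_ndjson(text):
    """Convert back-to-back JSON values into newline-delimited JSON.

    Hypothetical helper for illustration; assumes `text` is a str that
    holds zero or more JSON values separated only by optional whitespace.
    """
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Parse one value starting at pos; the returned index is where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        lines.append(json.dumps(obj))
    # One JSON value per line, with a trailing newline when there is output.
    return "\n".join(lines) + ("\n" if lines else "")
```

With the bytes from the question, this could presumably be called as concatenated_json_to_ndjson(object_to_string.decode("utf-8")), and the returned string written out as the NDJSON file.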
Answered 2020-12-10T20:17:13.263