python - Reading a JSON file correctly

Question

I am trying to read a JSON file in Python (the BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7). The JSON file is a list of objects, one per sentence. This is how I tried to do it:

def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        for line in data_file.readlines():
            if not line:
                continue
            items = json.loads(line)
            text = items["text"]
            label = items.get("label")

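My code fails at items = json.loads(line). It looks like the data is not formatted the way the code expects, but how can I change that?

Thanks in advance for your time!

Best,

Julia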

score 1 · Accepted Answer

With json.load() you don't need to read the file line by line. You can do either of the following:

import json

def open_json(path):
    with open(path, 'r') as file:
        return json.load(file)

data = open_json('./1.0alpha7.dev.json')

Or, even cooler, you can fetch the JSON straight from GitHub with requests:

import json
import requests

url = 'https://github.com/YerevaNN/BioRelEx/releases/download/1.0alpha7/1.0alpha7.dev.json'
response = requests.get(url)
data = response.json()

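Both approaches give the same output. The data variable will be a list of dictionaries, which you can iterate over in a for loop for further processing.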
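As a rough sketch of that iteration step: the snippet below parses a small in-memory string instead of the real file so that it runs without a download. The sample sentences are made up for illustration; only the "text" key comes from the question.

```python
import json

# Hypothetical stand-in for the dataset file: a JSON array of objects.
# The sentences are invented; only the "text" key is taken from the question.
sample = '[{"text": "A binds B."}, {"text": "C inhibits D."}]'

# A top-level JSON array is parsed into a Python list of dicts.
data = json.loads(sample)

# Iterate over the items and collect the field of interest.
texts = []
for item in data:
    texts.append(item["text"])
```

With the real file, data would come from json.load(file) or response.json() as shown above, and the loop would stay the same.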

score 0 · Accepted Answer

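Your code reads one line at a time and parses each line separately as JSON. Unless the file's creator wrote it in that format (which is unlikely, given that it has a .json extension), this won't work, because JSON does not use newlines to mark the end of an object.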
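A minimal sketch of the failure, assuming the file is pretty-printed across several lines (the sample document below is made up for illustration, not taken from the dataset): no single line is a complete JSON value, so every per-line parse raises, while parsing the whole string succeeds.

```python
import json

# Hypothetical pretty-printed JSON document spread over several lines.
pretty = '[\n  {\n    "text": "A binds B."\n  }\n]'

# Parsing line by line: count how many lines are rejected by the parser.
failed_lines = 0
for line in pretty.splitlines():
    try:
        json.loads(line)
    except json.JSONDecodeError:
        failed_lines += 1

# Parsing the document as a whole works.
whole = json.loads(pretty)
```

Here every one of the five lines is rejected on its own, which matches the error seen at json.loads(line).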

Instead, load the entire file content as JSON, then process the resulting items in the array.

def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
    for item in data:
        text = item["text"]

The labels seem to be tucked away inside item["interaction"].
