您的文件中似乎有格式错误的 JSON 数据。例如,尝试加载以下“JSON”数据 - 请注意 id 77 格式错误。
{"created_at": "2019-01-01 23:45:01", "id":1}
{"created_at": "2019-01-01 23:45:01", "id":2}
{"created_at": "2019-01-01 23:45:01", "id":3}
{"created_at": "2019-01-01 23:45:01", "id":4}
{"created_at": "2019-01-01 23:45:01", "id":5}
{"created_at": "2019-01-01 23:45:01", "id":6}
{"created_at": "2019-01-01 23:45:01", "id":7}
{"created_at": "2019-01-01 23:45:01", "id":8}
{"created_at": "2019-01-01 23:45:01", "id":11}
{"created_at": "2019-01-01 23:45:01", "id":22}
{"created_at": "2019-01-01 23:45:01", "id":33}
{"created_at": "2019-01-01 23:45:01", "id":44}
{"created_at": "2019-01-01 23:45:01", "id":55}
{"created_at": "2019-01-01 23:45:01", "id":66}
{i"created_at": "2019-01-01 23:45:01", "id":77}
{"created_at": "2019-01-01 23:45:01", "id":88}
{"created_at": "2019-01-01 23:45:01", "id":99}
然后运行这段代码。
>>> import pandas as pd
>>> reader = pd.read_json("January.jsonl", lines=True, chunksize=1)
>>> for r in reader:
... print(r)
并查看输出:
12 2019-01-01 23:45:01 55
created_at id
13 2019-01-01 23:45:01 66
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 779, in __next__
obj = self._get_object_parser(lines_json)
File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 753, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 857, in parse
self._parse_no_numpy()
File "/home/user/anaconda3/envs/project/lib/python3.7/site-packages/pandas/io/json/_json.py", line 1089, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value
该错误与您收到的错误相同。您将需要找到格式错误的数据并进行修复。您可以尝试逐行读取 JSON 数据以找出错误存在的位置并提取这些行以检查它们。
f = open("January.jsonl")
lines=f.readlines()
for line_no, line in enumerate(lines):
try:
data = json.loads(line)
except Exception:
print(line_no)
print(line)