I am trying to read a bunch of JSON files stored on S3, but computing the DataFrame raises IndexError: list index out of range.
This is the call I use to open the JSON files:
pets_data = dd.read_json("s3://my-bucket/pets/*.json", meta=meta, blocksize=None, orient="records", lines=False)
It fails when I call to_csv (writing to S3 or to local disk, both fail):
# saving locally fails
pets_data.to_csv(
    "pets-full-data.csv",
    single_file=True,
    index=False
)
# saving to S3 fails as well
pets_data.to_csv(
    "s3://my-bucket/pets-full-data.csv",
    single_file=True,
    index=False
)
Stack trace:
  File "main.py", line 89, in <module>
    pets_data.to_csv(
  File "/usr/local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1423, in to_csv
    return to_csv(self, filename, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/dask/dataframe/io/csv.py", line 808, in to_csv
    value = to_csv_chunk(dfs[0], first_file, **kwargs)
IndexError: list index out of range
Note: this only happens when I read the files from S3. When I read the files from local storage, everything works.
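
The failing line in the trace is dfs[0], which suggests the list of partitions is empty. One possible cause (an assumption, not confirmed here) is that the glob matches no objects on S3. Below is a minimal diagnostic sketch for checking that before calling dd.read_json. The helper name expand_glob and its glob parameter are hypothetical, introduced only for illustration. The default path assumes fsspec is installed, plus s3fs for s3:// URLs.

```python
from typing import Callable, List, Optional


def expand_glob(
    pattern: str,
    glob: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Return the sorted paths that `pattern` expands to.

    Raises FileNotFoundError when nothing matches, so an empty glob is
    reported up front rather than surfacing later as an IndexError.
    `glob` is a hypothetical injection point for testing; by default the
    filesystem is resolved through fsspec.
    """
    if glob is None:
        # Assumption: fsspec is available (and s3fs for s3:// URLs).
        from fsspec.core import url_to_fs

        fs, path = url_to_fs(pattern)
        glob, pattern = fs.glob, path

    matches = sorted(glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No files match {pattern!r}")
    return matches
```

Calling expand_glob("s3://my-bucket/pets/*.json") before reading would show whether the pattern resolves to any files with the credentials in use. Checking pets_data.npartitions after dd.read_json is another way to see whether the DataFrame ended up with zero partitions.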