I am trying to read a bunch of JSON files stored on S3, but "IndexError: list index out of range" is raised when the DataFrame is computed.

My call to open the JSON files looks like this:

pets_data = dd.read_json("s3://my-bucket/pets/*.json", meta=meta, blocksize=None, orient="records", lines=False)

It fails when I call to_csv (writing to S3 or to local disk, both fail):

# Saving to local disk fails
pets_data.to_csv(
    "pets-full-data.csv",
    single_file=True,
    index=False,
)
# Saving to S3 fails as well
pets_data.to_csv(
    "s3://my-bucket/pets-full-data.csv",
    single_file=True,
    index=False,
)

Stack trace:

File "main.py", line 89, in <module>
pets_data.to_csv(
File "/usr/local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1423, in to_csv
return to_csv(self, filename, **kwargs)
File "/usr/local/lib/python3.8/site-packages/dask/dataframe/io/csv.py", line 808, in to_csv
value = to_csv_chunk(dfs[0], first_file, **kwargs)
IndexError: list index out of range

Note: this only happens when I read the files from S3. When I read them from local storage, everything works fine.
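
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the failing line indexes dfs[0], so the error would appear if the list of partitions were empty, for example because the S3 glob matched no objects. The sketch below is plain Python with no Dask or S3 access. It reproduces the bare error on an empty list and shows a hypothetical guard, first_partition, that reports the same situation with more context. The helper name and its messages are illustrative only and are not part of Dask.

```python
def first_partition(partitions, source):
    """Return the first partition, or fail with a message naming the source.

    `partitions` stands in for the list that the traceback indexes as dfs[0];
    `source` is the path or glob the data was read from.
    Hypothetical helper for illustration, not a Dask function.
    """
    if len(partitions) == 0:
        raise ValueError(
            f"no partitions were created from {source!r}; "
            "the glob may have matched no files"
        )
    return partitions[0]


# Reproduce the bare failure: indexing an empty list raises IndexError.
empty = []
try:
    empty[0]
except IndexError as exc:
    bare_message = str(exc)

# The guard reports the same situation with more context.
try:
    first_partition(empty, "s3://my-bucket/pets/*.json")
except ValueError as exc:
    guarded_message = str(exc)

print(bare_message)
print(guarded_message)
```

If this reading applies, printing pets_data.npartitions before calling to_csv would show whether any partitions were created from the S3 path at all.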
