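I have to run some tests in different environments. In the tests, I need to check some directories in S3 for Parquet files and load them into a dictionary, as follows: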
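import pyarrow.parquet as pq
import s3fs
from boto3 import Session  # assumed: the snippet uses boto3's Session but omitted the import

env = 'dev'
aws_profile = {'dev': 'dev_aws_profile', 'qa': 'qa_aws_profile'}

def get_dictionary_from_parquet(file_name):
    # Note: this filesystem is created without any profile information
    fs = s3fs.S3FileSystem()
    # The boto3 session, by contrast, uses the profile for the current environment
    pq_session = Session(profile_name=aws_profile[env])
    s3 = pq_session.resource('s3')
    parquet_bucket = s3.Bucket(f'valid-bucket-name-{env}')
    # Collect the S3 URIs of all objects under the prefix whose key ends with file_name
    paths = []
    for pq_file in parquet_bucket.objects.filter(Prefix=f'valid-prefix-{env}'):
        if pq_file.key.endswith(file_name):
            paths.append(f's3://{pq_file.bucket_name}/{pq_file.key}')
    # Read the matching files as one dataset and convert it to a dict of columns
    data_set = pq.ParquetDataset(paths, filesystem=fs)
    tbl = data_set.read()
    pq_dictionary = tbl.to_pydict()
    return pq_dictionary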
This works perfectly when the selected aws_profile is the default profile in the AWS credentials file, but with any other profile it fails with:
line 14, in get_dictionary_from_parquet
    data_set = pq.ParquetDataset(paths, filesystem=fs)
  File "/Library/Python/3.7/site-packages/pyarrow/parquet.py", line 1170, in __init__
    open_file_func=partial(_open_dataset_file, self._metadata)
  File "/Library/Python/3.7/site-packages/pyarrow/parquet.py", line 1365, in _make_manifest
    .format(path))
OSError: Passed non-file path: s3://<valid path to parquet file>
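As far as I can tell, this error is raised when the filesystem handed to pyarrow does not report a given path as a file, which can also happen when the filesystem's credentials cannot see the object. A minimal sketch for checking this, assuming `fs` exposes an `isfile` method as s3fs filesystems do (the helper name `non_file_paths` is hypothetical):

```python
def non_file_paths(paths, fs):
    """Return the paths that the filesystem does not report as files."""
    # fs is assumed to offer isfile(path), as s3fs.S3FileSystem does
    return [path for path in paths if not fs.isfile(path)]
```

If this returns the same paths that boto3 just listed, the two clients are probably not using the same credentials.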
How can I pass the AWS profile credentials to pyarrow to fix this?
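One possible direction, offered as a sketch rather than a confirmed fix: the `s3fs.S3FileSystem()` in the snippet is created without any profile, so it presumably falls back to the default credentials while the boto3 session uses the named profile. The code below resolves the credentials from a boto3 session and hands them to s3fs through its `key`, `secret` and `token` arguments. The helper names `s3fs_credentials` and `make_filesystem` are hypothetical.

```python
def s3fs_credentials(session):
    """Turn a boto3 session's resolved credentials into s3fs keyword arguments."""
    creds = session.get_credentials()
    if creds is None:
        raise RuntimeError("no credentials could be resolved for this session")
    # Freeze the credentials so key, secret and token are read consistently
    frozen = creds.get_frozen_credentials()
    return {
        "key": frozen.access_key,
        "secret": frozen.secret_key,
        "token": frozen.token,  # None unless the profile uses temporary credentials
    }


def make_filesystem(profile_name):
    """Build an s3fs filesystem that uses the given named AWS profile."""
    # Imported lazily so s3fs_credentials can be used without these packages
    from boto3 import Session
    import s3fs

    session = Session(profile_name=profile_name)
    return s3fs.S3FileSystem(**s3fs_credentials(session))
```

With this in place, `fs = make_filesystem(aws_profile[env])` would replace `fs = s3fs.S3FileSystem()` in the function above. Recent s3fs releases also appear to accept the profile directly, as `s3fs.S3FileSystem(profile=...)`, while older ones used `profile_name`, so it is worth checking the installed version.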