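I am trying to use Dask to fetch multiple JSON files from AWS S3 into memory in a Sagemaker Jupyter Notebook. When I submit 10 or 20 workers, everything runs smoothly. However, when I submit 100 workers, between 30% and 50% of them fail with the following error: "Unable to locate credentials"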
Initially I was using Boto3. To try to eliminate the problem, I switched to S3FS, but the same error occurs.
If I repeat the experiment, the workers that fail with the NoCredentialsError are random, as is the exact number of failed downloads.
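Sagemaker handles all the AWS credentials through its IAM role, so I have no access to key pairs or anything like that. The ~/.aws/config file only contains the default location, nothing about credentials.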
This seems like a very common use of Dask, so it is clearly capable of a task like this. Where am I going wrong?
Any help would be much appreciated! Code and traceback below. In this example, 29 workers failed due to credentials. Thanks, Patrick
import boto3
import json
import logging
import multiprocessing
from dask.distributed import Client, LocalCluster
import s3fs
import os

THREADS_PER_DASK_WORKER = 4
CPU_COUNT = multiprocessing.cpu_count()
HTTP_SUCCESSFUL_REQUEST_CODE = 200

S3_BUCKET_NAME = '-redacted-'

keys_100 = ['-redacted-']
keys_10 = ['-redacted-']


def dispatch_workers(workers):
    cluster_workers = min(len(workers), CPU_COUNT)
    cluster = LocalCluster(n_workers=cluster_workers, processes=True,
                           threads_per_worker=THREADS_PER_DASK_WORKER)
    client = Client(cluster)

    data = []
    data_futures = []
    for worker in workers:
        data_futures.append(client.submit(worker))

    for future in data_futures:
        try:
            tmp_flight_data = future.result()
            if future.status == 'finished':
                data.append(tmp_flight_data)
            else:
                logging.error(f"Future status = {future.status}")
        except Exception as err:
            logging.error(err)

    del data_futures
    cluster.close()
    client.close()

    return data


def _get_object_from_bucket(key):
    s3 = s3fs.S3FileSystem(anon=False)  # uses default credentials
    with s3.open(os.path.join(S3_BUCKET_NAME, key)) as f:
        return json.loads(f.read())


def get_data(keys):
    objects = dispatch_workers(
        [lambda key=key: _get_object_from_bucket(key) for key in keys]
    )
    return objects


data = get_data(keys_100)
Output:
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
ERROR:root:Unable to locate credentials
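
Because the failing tasks differ from run to run, the credential lookup may be failing transiently under load rather than being misconfigured. That is an assumption, not something the output above confirms. Under that assumption, one thing worth trying is to retry the failing call with exponential backoff. The sketch below is generic: `retry_with_backoff` and its parameters are hypothetical names introduced here for illustration, not part of Dask, Boto3, or S3FS.

```python
import random
import time


def retry_with_backoff(fn, exceptions, max_attempts=5, base_delay=0.1,
                       sleep=time.sleep):
    """Call fn(); if it raises one of `exceptions`, wait and try again.

    Hypothetical helper for illustration only. `exceptions` is a tuple of
    exception classes treated as transient; anything else propagates at once.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Exponential backoff plus random jitter, so that many tasks
            # failing at the same moment do not all retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

In the code above, this could wrap the download inside each task, for example `retry_with_backoff(lambda: _get_object_from_bucket(key), (NoCredentialsError,))` with `NoCredentialsError` imported from `botocore.exceptions`, assuming that is the exception class behind the "Unable to locate credentials" message. If the retries succeed, that would point to a transient lookup failure rather than a permissions problem.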