
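I'm running a Spark application on an Amazon EMR cluster. Since a few days ago, I get the following error whenever I try to read a file from S3 with pandas. I have added bootstrap actions to install pandas, fsspec, and s3fs.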

Code:

import pandas as pd
df = pd.read_csv(s3_path)

Error log:

Traceback (most recent call last):
  File "spark.py", line 84, in <module>
    df=pd.read_csv('s3://<bucketname>/<filename>.csv')
  File "/usr/local/lib64/python3.7/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib64/python3.7/site-packages/pandas/io/parsers.py", line 435, in _read
    filepath_or_buffer, encoding, compression
  File "/usr/local/lib64/python3.7/site-packages/pandas/io/common.py", line 222, in get_filepath_or_buffer
    filepath_or_buffer, mode=mode or "rb", **(storage_options or {})
  File "/usr/local/lib/python3.7/site-packages/fsspec/core.py", line 133, in open
    out = self.__enter__()
  File "/usr/local/lib/python3.7/site-packages/fsspec/core.py", line 101, in __enter__
    f = self.fs.open(self.path, mode=mode)
  File "/usr/local/lib/python3.7/site-packages/fsspec/spec.py", line 844, in open
    **kwargs
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 394, in _open
    autocommit=autocommit, requester_pays=requester_pays)
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 1276, in __init__
    cache_type=cache_type)
  File "/usr/local/lib/python3.7/site-packages/fsspec/spec.py", line 1134, in __init__
    self.details = fs.info(path)
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 719, in info
    return sync(self.loop, self._info, path, bucket, key, kwargs, version_id)
  File "/usr/local/lib/python3.7/site-packages/fsspec/asyn.py", line 51, in sync
    raise exc.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/fsspec/asyn.py", line 35, in f
    result[0] = await future
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 660, in _info
    Key=key, **version_id_kw(version_id), **self.req_kw)
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 214, in _call_s3
    raise translate_boto_error(err)
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 207, in _call_s3
    return await method(**additional_kwargs)
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/client.py", line 121, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/client.py", line 140, in _make_request
    return await self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/endpoint.py", line 90, in _send_request
    exception):
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/endpoint.py", line 199, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/usr/local/lib/python3.7/site-packages/aiobotocore/hooks.py", line 29, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/utils.py", line 1225, in redirect_from_error
    new_region = self.get_bucket_region(bucket, response)
  File "/usr/local/lib/python3.7/site-packages/botocore/utils.py", line 1283, in get_bucket_region
    headers = response['ResponseMetadata']['HTTPHeaders']
TypeError: 'coroutine' object is not subscriptable
sys:1: RuntimeWarning: coroutine 'AioBaseClient._make_api_call' was never awaited

Is this a problem with s3fs? It and pandas appear to be the only packages that received updates, but I can't find anything related to this in the pandas changelog.


1 Answer


The Dask/s3fs team has acknowledged this as a bug. The related GitHub issue indicates that aiobotocore is unable to obtain the region_name of the S3 bucket.

If you are hitting the same problem, consider downgrading s3fs to 0.4.2, or try setting the AWS_DEFAULT_REGION environment variable as a workaround.
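
For the environment-variable workaround, a minimal sketch might look like the following. It assumes the variable only needs to be in place before the first S3 access in the driver process; the helper name `ensure_default_region` and the region value are hypothetical placeholders, not part of s3fs or pandas.

```python
import os


def ensure_default_region(region):
    """Set AWS_DEFAULT_REGION if it is not already set.

    Returns the region that is in effect afterwards. An existing value
    is left untouched so cluster-level configuration still wins.
    """
    return os.environ.setdefault("AWS_DEFAULT_REGION", region)


# Hypothetical usage: replace "us-east-1" with the region of your bucket,
# and call this before the first pd.read_csv("s3://...") in the script.
# ensure_default_region("us-east-1")
# df = pd.read_csv("s3://<bucketname>/<filename>.csv")
```

The variable can also be exported in the cluster environment rather than set from Python; the sketch only shows the in-process variant.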

Edit: this has been fixed in aiobotocore 1.1.1, the latest release at the time of this edit. If you are hitting the same problem, upgrade aiobotocore and s3fs.
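
To confirm which versions a bootstrap action actually installed, a rough check along these lines can help. It assumes Python 3.8+ for `importlib.metadata` (on Python 3.7 the `importlib_metadata` backport would be needed); the helper names `installed_version` and `is_at_least` are hypothetical, and the version comparison is a simple numeric one that ignores pre-release suffixes.

```python
import re

try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib_metadata backport is installed
    import importlib_metadata as metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_at_least(version, minimum):
    """Compare dotted version strings numerically on their first three parts."""
    def parse(v):
        return tuple(int(p) for p in re.findall(r"\d+", v)[:3])
    return parse(version) >= parse(minimum)


for name in ("aiobotocore", "s3fs"):
    print(name, installed_version(name))

# Hypothetical check against the fixed release named above:
# v = installed_version("aiobotocore")
# print(v is not None and is_at_least(v, "1.1.1"))
```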

answered 2020-08-31T16:57:17.573