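AWS Wrangler provides a convenient interface for consuming S3 objects as pandas DataFrames. I would like to use it to fetch objects instead of working with a boto3 client, resource, or session directly. I also need SSL verification.

The following boto3 client code works with the Aries SSL root certificate (!):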

import boto3
import botocore.config  # needed for botocore.config.Config below
import os

aries_cert = os.environ['ARIES_CERT']

s3_session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name="us-east-1"
)
s3_client = s3_session.client(
    service_name="s3",
    endpoint_url="https://MY-ENDPOINT.com",
    use_ssl=True,
    verify=aries_cert,
    aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
    config=botocore.config.Config(
        read_timeout=600,
        connect_timeout=600,
        retries={"max_attempts": 3}
    )
)

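# `path` is defined elsewhere, as either "bucket/key" or "s3://bucket/key".
# Strip the scheme before splitting; splitting first would cut at the "/" inside "s3://".
stripped = path[len('s3://'):] if path.startswith('s3://') else path
bucket, prefix = stripped.split('/', 1)
obj = s3_client.get_object(Bucket=bucket, Key=prefix)
# Do stuff with `obj['Body'].read()`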
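
The bucket/key split is easy to get wrong when `path` may or may not carry the `s3://` prefix, so here is a minimal sketch of it as a standalone helper. The name `split_s3_path` is hypothetical (it is not part of boto3 or AWS Wrangler), and it assumes `path` looks like `bucket/key` or `s3://bucket/key`.

```python
from typing import Tuple


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' or 'bucket/key' into (bucket, key).

    Hypothetical helper for illustration; not part of boto3 or awswrangler.
    """
    # Drop the optional scheme first so its slashes cannot be mistaken
    # for the bucket/key separator.
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    # Everything before the first "/" is the bucket, the rest is the key.
    bucket, _, key = path.partition("/")
    if not bucket:
        raise ValueError(f"no bucket found in S3 path: {path!r}")
    return bucket, key
```

The result can be passed straight to `s3_client.get_object(Bucket=bucket, Key=key)`.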

This AWS Wrangler code also works (without the TLS (SSL?) client certificate):

import awswrangler as wr
import boto3
import botocore
import os

wr.config.s3_endpoint_url = "https://MY-ENDPOINT.com"

session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name="us-east-1"
)
path = f's3://{path}' if not path.startswith('s3://') else path

df = wr.s3.read_parquet(
    path=path,
    dataset=True,
    boto3_session=session
)

But when I include the TLS (SSL?) client certificate, the read fails:

wr.config.botocore_config = botocore.config.Config(
    retries={"max_attempts": 3},
    connect_timeout=600,
    read_timeout=600,
    client_cert=os.getenv("ARIES_CERT")
)
df = wr.s3.read_parquet(
    path=path,
    dataset=True,
    boto3_session=session
)

Error message:

SSLError: SSL validation failed for https://MY-ENDPOINT.com/MY-BUCKET?list-type=2&prefix=MY-PREFIX-BLAH-BLAH.parquet%2F&max-keys=1000&encoding-type=url [SSL] PEM lib (_ssl.c:3524)
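
A quick check that may help narrow this down, offered as a sketch rather than a confirmed diagnosis. Python's `ssl` module commonly reports `[SSL] PEM lib` when it cannot load a certificate and private key pair. The botocore docs describe `client_cert` as the path to a certificate for TLS client certificate authentication (optionally a `(cert, key)` tuple), whereas `verify` names a CA bundle used to validate the server. The hypothetical helper below (the name `summarize_pem` is an assumption, not a library function) counts which kinds of PEM blocks a file contains, using only the standard library.

```python
import re
from typing import Dict

# Matches the opening line of any PEM block and captures its label,
# e.g. "CERTIFICATE" or "RSA PRIVATE KEY".
_PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def summarize_pem(pem_text: str) -> Dict[str, int]:
    """Count certificate and private-key blocks in PEM-encoded text.

    Hypothetical diagnostic helper; it only looks at block labels and
    does not validate the contents.
    """
    labels = _PEM_BEGIN.findall(pem_text)
    return {
        "certificates": sum(label == "CERTIFICATE" for label in labels),
        # Covers "PRIVATE KEY", "RSA PRIVATE KEY", "EC PRIVATE KEY", etc.
        "private_keys": sum(label.endswith("PRIVATE KEY") for label in labels),
    }


# Assumed usage: ARIES_CERT holds a file path, as in the snippets here.
# with open(os.environ["ARIES_CERT"]) as fh:
#     print(summarize_pem(fh.read()))
```

If the file turns out to contain certificates but no private key, it is probably a CA/root bundle, which would fit `verify` (as in the working boto3 client) rather than `client_cert`. botocore also reads a CA bundle path from the `AWS_CA_BUNDLE` environment variable; whether that resolves this case with AWS Wrangler is untested here.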

Any idea what is going on here? I have not found the AWS Wrangler docs, nor the boto3 and botocore docs, very helpful:

https://aws-data-wrangler.readthedocs.io/en/latest/tutorials/002%20-%20Sessions.html
https://aws-data-wrangler.readthedocs.io/en/latest/tutorials/021%20-%20Global%20Configurations.html#21---Global-Configurations
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
https://botocore.amazonaws.com/v1/documentation/api/latest/tutorial/index.html

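Questions like this have been asked before, so any intuition about how to use boto3 clients, resources, and sessions in different contexts would be much appreciated.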
