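I need to process some data from a stream that lands in a specific folder of an S3 bucket, and I want to do it in Python. After searching for a while, I found PyAthena, which is exactly the library I was looking for!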

I installed PyAthena 1.8.0.

For reference, my S3 bucket is in the Paris region (eu-west-3) and my Athena database is in the Frankfurt region (eu-central-1).

I used the following code, which I found in the PyAthena documentation:

from pyathena import connect

cursor = connect(aws_access_key_id='YOUR_ACCESS_KEY_ID',
             aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
             s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
             region_name='us-west-2').cursor()
cursor.execute("SELECT * FROM one_row")
print(cursor.description)
print(cursor.fetchall())

At first I wasn't sure which region_name to use: the region where the S3 bucket is located (Paris) or the region where the Athena database is located (Frankfurt).

I tried both and, following the error messages I received, ended up using the region of my S3 bucket. However, I kept getting Glue permission errors such as:

pyathena.error.OperationalError: Insufficient permissions to execute the query.  Error retrieving table : master in database : default due to : User: arn:aws:iam::<my-account-client-ID>:user/s3-test is not authorized to perform: glue:GetTable on resource: arn:aws:glue:eu-west-3:<my-account-client-ID>:catalog
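The resource ARN at the end of this message shows which region the Glue catalog lookup was sent to. A minimal sketch for pulling the region out of an ARN, assuming the standard colon-separated layout (arn:partition:service:region:account-id:resource). `parse_arn` is a hypothetical helper, not part of PyAthena or boto3, and the account ID below is a placeholder because the original message elides it:

```python
def parse_arn(arn: str) -> dict:
    """Split an ARN of the form arn:partition:service:region:account-id:resource."""
    # maxsplit=5 keeps any colons inside the resource part intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    keys = ("prefix", "partition", "service", "region", "account_id", "resource")
    return dict(zip(keys, parts))


# Placeholder account ID (assumption); the real one is elided in the error.
arn = "arn:aws:glue:eu-west-3:123456789012:catalog"
info = parse_arn(arn)
print(info["service"], info["region"])
```

If the region printed here is not the one where the Athena database lives, the connection's region_name is probably worth double-checking.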

So I added the following policy statements in IAM:

    {
        "Sid": "VisualEditor2",
        "Effect": "Allow",
        "Action": [
            "athena:StartQueryExecution",
            "athena:GetQueryResultsStream",
            "athena:GetQueryResults",
            "athena:DeleteNamedQuery",
            "athena:GetNamedQuery",
            "athena:*",
            "athena:ListQueryExecutions",
            "athena:ListNamedQueries",
            "athena:CreateNamedQuery",
            "athena:StopQueryExecution",
            "athena:GetQueryExecution",
            "athena:BatchGetNamedQuery",
            "athena:BatchGetQueryExecution"
        ],
        "Resource": "*"
    },
    {
        "Sid": "VisualEditor3",
        "Effect": "Allow",
        "Action": [
            "glue:GetTable",
            "glue:GetTables",
            "glue:GetDatabase"
        ],
        "Resource": [
            "arn:aws:glue:eu-west-3:<my-account-client-ID>:catalog",
            "arn:aws:glue:eu-west-3:<my-account-client-ID>:database/*",
            "arn:aws:glue:eu-west-3:<my-account-client-ID>:table/*/*"
        ]
    }

Now I get this error message:

      cursor.execute("select * from master")
    File "/home/ubuntu/.local/lib/python3.6/site-packages/pyathena/util.py", line 28, in _wrapper
      return wrapped(*args, **kwargs)
    File "/home/ubuntu/.local/lib/python3.6/site-packages/pyathena/cursor.py", line 57, in execute
      raise OperationalError(query_execution.state_change_reason)
    pyathena.error.OperationalError: SYNTAX_ERROR: line 1:15: Schema default does not exist

1 Answer

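The problem is the SELECT statement: if you don't specify a database, the query runs against the database named default, and if no such database exists in your environment, it fails. You should specify both your database and your table: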

cursor.execute("SELECT * FROM <YOUR_DATABASE>.<YOUR_TABLE>")

Alternatively, you can specify the database name (i.e. the schema name) with a parameter to the cursor function:

cursor = connect(aws_access_key_id='YOUR_ACCESS_KEY_ID',
             aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
             s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
             region_name='us-west-2').cursor(schema_name='<YOUR_DATABASE>')
cursor.execute("SELECT * FROM <YOUR_TABLE>")

If you do either of these, you should no longer get this error.
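To illustrate the first option, here is a minimal sketch that builds the fully qualified query string before handing it to the cursor. `quote_identifier` and `select_all` are hypothetical helpers (not part of PyAthena), and the database name `my_database` is a placeholder. The sketch assumes Athena's convention of double-quoting identifiers in SELECT statements:

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def select_all(database: str, table: str) -> str:
    """Build a SELECT * query with an explicit database-qualified table name."""
    return f"SELECT * FROM {quote_identifier(database)}.{quote_identifier(table)}"


# "my_database" is a placeholder; "master" is the table from the question.
query = select_all("my_database", "master")
print(query)
# With a PyAthena cursor, the string would then be passed as usual:
# cursor.execute(query)
```

Qualifying the table this way means the query no longer depends on which database the connection treats as its default.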

Answered 2019-12-03T13:45:48.560