I am trying to run some heavy ETL in Aurora Serverless PostgreSQL using the AWS rds-data API.

According to the AWS RDS Data API documentation: "By default, a call times out if it's not finished processing within 45 seconds. However, you can continue running a SQL statement if the call times out by using the continueAfterTimeout parameter."

I see that the boto3 rds-data client supports a continueAfterTimeout parameter (boolean). I can use this flag in a transaction like this:

import boto3

rds_client = boto3.client('rds-data')

def execute_transaction_query(sql, transaction_id):
    print(sql)
    response = rds_client.execute_statement(
        secretArn=AURORA_DB_CREDS_ARN,
        resourceArn=AURORA_DB_ARN,
        database=AURORA_DB_NAME,
        sql=sql,
        transactionId=transaction_id,
        continueAfterTimeout=True,  # boolean flag to continue after timeout, in theory
    )
    return response

However, the query still fails after 45 seconds with this error:

An error occurred (StatementTimeoutException) when calling the ExecuteStatement operation: Request timed out

1 Answer

OK, so the reason the rds-data call fails is that continueAfterTimeout=True does not mean the boto3 call will not fail. It only means that the SQL query will keep running on the database.

So what you need to do when running ETL through rds-data is execute the statement in a try/except block:

import time

import botocore.exceptions

try:
    response = rds_client.execute_statement(
        secretArn=AURORA_DB_CREDS_ARN,
        resourceArn=AURORA_DB_ARN,
        database=AURORA_DB_NAME,
        sql=sql,
        transactionId=transaction_id,
        continueAfterTimeout=True,
    )
except botocore.exceptions.ClientError as error:
    # The Data API call fails automatically after 45 seconds, but the statement continues in the db
    # https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html
    if error.response['Error']['Code'] == 'StatementTimeoutException':
        print('QUERY TIMEDOUT AFTER MAX 45 SECONDS. THIS IS FINE')
        # arbitrary wait in case the commit transaction fails with timeout
        time.sleep(60)
    else:
        # re-raise anything that is not a timeout, keeping the original exception type
        raise
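
For reuse across several ETL statements, the same pattern can be wrapped in a small helper. This is only a sketch under a few assumptions: the names `is_statement_timeout` and `execute_tolerating_timeout` are made up for illustration, the client is passed in rather than created here, and the code catches a broad `Exception` and inspects its `response` attribute (the same `error.response['Error']['Code']` lookup used above) so that it does not depend on importing botocore. The 60-second wait is as arbitrary as in the snippet above.

```python
import time


def is_statement_timeout(error):
    """Return True if the error carries the StatementTimeoutException code."""
    # botocore's ClientError exposes the service error as error.response['Error']['Code'];
    # other exceptions have no such attribute and are treated as non-timeouts.
    response = getattr(error, "response", None) or {}
    return response.get("Error", {}).get("Code") == "StatementTimeoutException"


def execute_tolerating_timeout(client, wait_seconds=60, sleep=time.sleep, **params):
    """Run execute_statement with continueAfterTimeout=True.

    Returns the response, or None if the call timed out while the
    statement was left running on the database.
    """
    try:
        return client.execute_statement(continueAfterTimeout=True, **params)
    except Exception as error:
        if not is_statement_timeout(error):
            raise  # anything other than a timeout is a real failure
        # Arbitrary pause before the caller moves on (e.g. to commit).
        sleep(wait_seconds)
        return None
```

It would be called as `execute_tolerating_timeout(rds_client, secretArn=..., resourceArn=..., database=..., sql=sql, transactionId=transaction_id)`. A `None` return only says that the call timed out, not that the statement has finished, so the caller still has to decide how long to wait before committing.
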
answered 2020-12-14T11:49:29.037