
I have a fairly simple AWS Lambda function that connects to an Amazon Keyspaces (for Apache Cassandra) database. The Python code works, but from time to time I get the error below. How can I fix this intermittent behavior? My guess is that the cluster needs additional settings at initialization, for example set_max_connections_per_host. I would appreciate any help.

ERROR:

('Unable to complete the operation against any hosts', {<Host: X.XXX.XX.XXX:XXXX eu-central-1>: ConnectionShutdown('Connection to X.XXX.XX.XXX:XXXX was closed')})

lambda_function.py:

import sessions


cassandra_db_session = None
cassandra_db_username = 'your-username'
cassandra_db_password = 'your-password'
cassandra_db_endpoints = ['your-endpoint']
cassandra_db_port = 9142


def lambda_handler(event, context):
    global cassandra_db_session
    if not cassandra_db_session:
        cassandra_db_session = sessions.create_cassandra_session(
            cassandra_db_username,
            cassandra_db_password,
            cassandra_db_endpoints,
            cassandra_db_port
        )
    result = cassandra_db_session.execute('select * from "your-keyspace"."your-table";')
    return 'ok'

sessions.py:

from ssl import SSLContext
from ssl import CERT_REQUIRED
from ssl import PROTOCOL_TLSv1_2
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.policies import DCAwareRoundRobinPolicy


def create_cassandra_session(db_username, db_password, db_endpoints, db_port):
    ssl_context = SSLContext(PROTOCOL_TLSv1_2)
    ssl_context.load_verify_locations('your-path/AmazonRootCA1.pem')
    ssl_context.verify_mode = CERT_REQUIRED
    auth_provider = PlainTextAuthProvider(username=db_username, password=db_password)
    cluster = Cluster(
        db_endpoints,
        ssl_context=ssl_context,
        auth_provider=auth_provider,
        port=db_port,
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='eu-central-1'),
        protocol_version=4,
        connect_timeout=60
    )
    session = cluster.connect()
    return session

2 Answers


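Setting the maximum number of connections on the client side does not make much sense, because an AWS Lambda function is effectively "dead" between runs. For the same reason, it is advisable to disable driver heartbeats (with idle_heartbeat_interval = 0), since no activity takes place until the function is next invoked.

This would not necessarily cause the problem you are seeing, but there is a good chance that the driver is reusing a connection after it has been closed on the server side.

Because there is little public documentation on how AWS Keyspaces works internally, it is hard to know what is happening on the cluster. I have always suspected that AWS Keyspaces is a CQL-like API engine in front of DynamoDB, so quirks like the one you are seeing are hard to track down, since doing so requires knowledge that is only available inside AWS.

FWIW, the DataStax drivers are not tested against AWS Keyspaces.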
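
A minimal sketch of how the two points above (no heartbeats, and a cached connection that the server may already have closed) could be handled in code. The names ReconnectingSession and cluster_options are hypothetical helpers, not part of the driver, and retrying once on a fresh session is an assumption about what helps here, not a confirmed fix for Keyspaces. The retry is only appropriate for idempotent statements such as SELECT.

```python
class ReconnectingSession:
    """Caches a session and rebuilds it once if a query hits a dead connection.

    Hypothetical helper: `session_factory` is any callable that returns a
    connected session (e.g. the question's create_cassandra_session wrapped
    in a lambda), and `retry_on` is the exception type (or tuple of types)
    that should be treated as "the cached connection is stale".
    """

    def __init__(self, session_factory, retry_on):
        self._factory = session_factory
        self._retry_on = retry_on
        self._session = None  # created lazily, reused on warm invocations

    def _discard(self):
        # Drop the cached session; ignore errors while closing a dead one.
        old, self._session = self._session, None
        if old is not None:
            try:
                old.shutdown()
            except Exception:
                pass

    def execute(self, query, *args, **kwargs):
        if self._session is None:
            self._session = self._factory()
        try:
            return self._session.execute(query, *args, **kwargs)
        except self._retry_on:
            # Assume the connection was closed while the function was idle:
            # rebuild the session and retry exactly once.
            self._discard()
            self._session = self._factory()
            return self._session.execute(query, *args, **kwargs)


def cluster_options(**overrides):
    """Keyword arguments for cassandra.cluster.Cluster with heartbeats off."""
    options = {
        "protocol_version": 4,
        "connect_timeout": 60,
        # 0 disables the driver's idle heartbeats, as suggested above.
        "idle_heartbeat_interval": 0,
    }
    options.update(overrides)
    return options
```

In the Lambda itself, the options would be passed as Cluster(db_endpoints, **cluster_options(ssl_context=..., auth_provider=..., port=9142)), and the wrapper would be created once at module level with retry_on=NoHostAvailable (imported from cassandra.cluster), since that is the exception type whose message appears in the question.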

Answered 2020-10-15T03:59:06.723

This is the biggest problem I see:

result = cassandra_db_session.execute('select * from "your-keyspace"."your-table";')

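The code looks fine, but I don't see a WHERE clause. So if there is a lot of data, a single node (chosen as the coordinator) has to build the result set while pulling data from all the other nodes. Since that leads to (un)predictably poor performance, it could explain why it sometimes works and sometimes doesn't.

Pro tip: All queries in Cassandra should have a WHERE clause.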
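
As a rough illustration of the tip above, the full-table scan could be replaced with a lookup restricted by partition key through a prepared statement. The column name id is an assumption (the question does not show the table schema), and prepare_lookup and fetch_by_id are hypothetical helper names.

```python
# Assumed schema: "id" is the table's partition key (not shown in the question).
LOOKUP_CQL = 'SELECT * FROM "your-keyspace"."your-table" WHERE id = ?'


def prepare_lookup(session):
    """Prepare the statement once per session and reuse the result."""
    return session.prepare(LOOKUP_CQL)


def fetch_by_id(session, prepared_lookup, row_id):
    """Return the rows for one partition key instead of scanning the table."""
    # Bound values are passed as a sequence matching the "?" placeholders.
    return list(session.execute(prepared_lookup, (row_id,)))
```

Inside lambda_handler this would be called as fetch_by_id(cassandra_db_session, prepared, some_id), with prepared created right after the session is first connected so that it is reused across warm invocations.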

Answered 2020-10-14T17:43:34.283