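I'm using Ruby 1.9.3 in an app server running on an AWS EC2 instance. My Postgres database runs on a separate EC2 instance, but both instances are in the same security group. When my Ruby code connects to the database, it uses the Sequel ORM gem (http://sequel.rubyforge.org/).

I have configured the Postgres 9.1.4 DB so that it correctly accepts connections from the app server instance.

However, every now and then I notice in the app server's logs that it fails to connect to the Postgres DB instance, and I see error messages like these: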

PG::Error: could not receive data from server: Connection timed out

or

PG::Error: connection not open

So I went to the Postgres DB instance and looked at /var/log/postgresql/postgresql-9.1-main.log, where I see a bunch of messages like these:

2012-11-07 08:15:17 UTC LOG:  could not receive data from client: Connection timed out
2012-11-07 08:15:17 UTC LOG:  unexpected EOF on client connection

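I have searched online, including Stack Overflow, and made sure that SSL is not enabled on my PostgreSQL server (my postgresql.conf file has "ssl=off").

At this point, I'm not sure what exactly is wrong in the Postgres configuration. I don't want to tinker with the max connections or timeout values on my production server without a well-justified reason.

The app server can connect to the DB most of the time; the problem only shows up intermittently.

On the Ruby side, this is the error trace for "connection not open" when making a Postgres call: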

PG::Error: connection not open
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:145:in `async_exec'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:145:in `block in execute_query'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/logging.rb:33:in `log_yield'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:145:in `execute_query'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:132:in `block in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:111:in `check_disconnect_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:132:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:372:in `_execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block (2 levels) in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:379:in `check_database_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `block in synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/connection_pool/threaded.rb:105:in `hold'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/dataset/actions.rb:744:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:483:in `fetch_rows'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:785:in `primary_key_lookup'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:124:in `[]'

Similarly, this is the trace for "could not receive data from server":

PG::Error: could not receive data from server: Connection timed out
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:124:in `block'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:124:in `ensure in check_disconnect_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:124:in `check_disconnect_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:132:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:372:in `_execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block (2 levels) in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:379:in `check_database_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `block in synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/connection_pool/threaded.rb:105:in `hold'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/dataset/actions.rb:744:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:483:in `fetch_rows'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:785:in `primary_key_lookup'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:124:in `[]'

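I have observed that if I run both the app server and the Postgres DB on the same instance, there are no connection problems, at least not so far. Maybe Postgres is less forgiving of non-local DB connections?

Please let me know what I might be missing. I'd appreciate it!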

1 Answer

The usual explanation for this is a connectivity problem between the two hosts.

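Alternatively, if it isn't connectivity, it could be a protocol synchronization problem. It looks like both ends might be trying to read from the socket, with neither trying to write. So perhaps the client expects the server to send a response while the server expects the client to send data.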

If it's intermittent and infrequent, this will be hard to debug, because you can't really just run tcpdump and analyze the capture.
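One low-cost way to gather evidence for or against the connectivity explanation, without a continuous packet capture, is a small probe that periodically opens a TCP connection to the database port and logs the outcome. The following is only a sketch under some assumptions: it needs Ruby 2.0 or later (newer than the 1.9.3 in the question) for the connect_timeout option of Socket.tcp, the helper names probe_tcp and format_probe are made up for illustration, and the host name in the usage comment is a placeholder. It only tests whether new connections can be established, so it will not catch an established connection that is silently dropped while idle.

```ruby
require 'socket'
require 'time'

# Try to open one TCP connection and report how long the attempt took.
# Returns { ok: true/false, seconds: Float } plus :error on failure.
def probe_tcp(host, port, timeout = 5)
  started = Time.now
  socket = Socket.tcp(host, port, connect_timeout: timeout)
  socket.close
  { ok: true, seconds: Time.now - started }
rescue StandardError => e
  { ok: false, seconds: Time.now - started, error: e.class.name }
end

# Render one probe result as a timestamped log line.
def format_probe(result, now = Time.now)
  status = result[:ok] ? 'ok' : "FAIL #{result[:error]}"
  format('%s %s %.3fs', now.utc.iso8601, status, result[:seconds])
end

# Hypothetical usage (host name and interval are placeholders):
#   loop do
#     puts format_probe(probe_tcp('db.internal.example', 5432))
#     sleep 30
#   end
```

If the probe's failures line up in time with the PG::Error entries in the app log, that points toward the network rather than toward Postgres or the driver.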

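I'd add more logging on the server side: log_statement = 'all' and a log_line_prefix that shows the client IP, the backend start time, and the backend pid. That way you can start trying to match these failures up with the session activity that happened before each one, and determine whether it is a particular client, a particular job, or truly just random.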
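As a rough illustration of how such a prefix could be used, the sketch below counts the two failure messages from the question per client address. The exact prefix format is an assumption chosen here for easy parsing (it is shown in the leading comment, using the %t, %p, %h and %s escapes), and the names LINE, FAILURE, parse_line and failures_by_client are hypothetical helpers, not part of any library.

```ruby
# Assumed postgresql.conf settings (an illustration, not the only option):
#   log_statement   = 'all'
#   log_line_prefix = '%t [%p] client=%h start=%s '
# %t = log timestamp, %p = backend pid, %h = client host,
# %s = backend (session) start time.

LINE = /\A(?<time>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d \S+) \[(?<pid>\d+)\] client=(?<client>\S*) start=(?<start>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d \S+) (?<level>[A-Z0-9]+):\s+(?<message>.*)\z/
FAILURE = /could not receive data from client|unexpected EOF on client connection/

# Parse one log line into a hash, or return nil if it does not match
# the assumed prefix (for example, continuation lines of a long statement).
def parse_line(line)
  m = LINE.match(line.chomp)
  return nil unless m
  { time: m[:time], pid: m[:pid].to_i, client: m[:client],
    session_start: m[:start], level: m[:level], message: m[:message] }
end

# Count failure messages per client address.
def failures_by_client(lines)
  counts = Hash.new(0)
  lines.each do |line|
    entry = parse_line(line)
    next unless entry && entry[:message] =~ FAILURE
    counts[entry[:client]] += 1
  end
  counts
end

# Hypothetical usage:
#   path = '/var/log/postgresql/postgresql-9.1-main.log'
#   p failures_by_client(File.foreach(path))
```

Keep in mind that log_statement = 'all' can produce a large volume of log output on a busy production server, so it is probably best enabled only while investigating.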

Does this Sequel ORM gem use libpq under the hood, or its own implementation of the PostgreSQL protocol? If it's the latter, that would be a likely culprit.

Update: It looks like it can use the pg gem (libpq-based), the postgres gem, or possibly postgres-pr (whatever that is). It prefers pg if it is installed.

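Since you already appear to be using the pg gem, you will probably need to do some diagnostic work to track down where the problem shows up (particular queries, particular clients, and so on) and try to find a way to reproduce it. PostgreSQL's csvlog format may be useful here, because it makes the logs easier to load and analyze.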
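A minimal sketch of that kind of analysis with Ruby's standard csv library follows. It assumes csvlog has been enabled (see the leading comment) and that the first fourteen columns follow the layout documented for PostgreSQL 9.1; that column order is worth checking against the documentation for the server version in use. The helper names csvlog_entries and sessions_with_failures are hypothetical.

```ruby
require 'csv'

# Assumed postgresql.conf settings for CSV logging:
#   logging_collector = on
#   log_destination   = 'csvlog'

# First columns of a csvlog row, in the order assumed for PostgreSQL 9.1.
COLUMNS = %w[
  log_time user_name database_name process_id connection_from
  session_id session_line_num command_tag session_start_time
  virtual_transaction_id transaction_id error_severity
  sql_state_code message
].freeze

CLIENT_FAILURE = /could not receive data from client|unexpected EOF on client connection/

# Turn csvlog text (a String or an open IO) into an array of hashes.
# Columns beyond those named in COLUMNS are ignored.
def csvlog_entries(io)
  CSV.new(io).map { |row| Hash[COLUMNS.zip(row)] }
end

# Group entries by session and keep only the sessions that logged a
# client failure, so the activity leading up to it can be inspected.
def sessions_with_failures(entries)
  entries.group_by { |e| e['session_id'] }.select do |_id, rows|
    rows.any? { |e| e['message'].to_s =~ CLIENT_FAILURE }
  end
end

# Hypothetical usage (the file name is a placeholder):
#   File.open('postgresql.csv') do |f|
#     sessions_with_failures(csvlog_entries(f)).each do |id, rows|
#       puts "#{id} from #{rows.first['connection_from']}"
#       rows.each { |e| puts "  #{e['log_time']} #{e['message']}" }
#     end
#   end
```

Grouping by session_id rather than by pid avoids mixing up unrelated sessions when the operating system reuses a process id.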

Answered 2012-11-08T02:43:39.757