
I have a Postgres BDR cluster with 3 nodes in the "Ready" state and 3 nodes in the "Parted" state.

If I run SELECT * FROM bdr.bdr_nodes, I get the following:

-[ RECORD 1 ]------+-------------------------
node_sysid         | 6153716379158074503
node_timeline      | 1
node_dboid         | 16385
node_status        | r
node_name          | node3
node_local_dsn     | host=x.x.x.241 [...]
node_init_from_dsn | host=x.x.x.47 [...]
-[ RECORD 2 ]------+-------------------------
node_sysid         | 6153716914784688297
node_timeline      | 1
node_dboid         | 16385
node_status        | r
node_name          | node2
node_local_dsn     | host=x.x.x.5 [...]
node_init_from_dsn | host=x.x.x.47 [...]
-[ RECORD 3 ]------+-------------------------
node_sysid         | 6170758438846557459
node_timeline      | 1
node_dboid         | 16384
node_status        | r
node_name          | node4
node_local_dsn     | host=x.x.x.128 [...]
node_init_from_dsn | host=x.x.x.47 [...]
-[ RECORD 4 ]------+-------------------------
node_sysid         | 6153716402564903569
node_timeline      | 1
node_dboid         | 16385
node_status        | k
node_name          | node1
node_local_dsn     | host=x.x.x.47 [...]
node_init_from_dsn | 
-[ RECORD 5 ]------+-------------------------
node_sysid         | 6170830020100809103
node_timeline      | 1
node_dboid         | 16385
node_status        | k
node_name          | node6
node_local_dsn     | host=x.x.x.48 [...]
node_init_from_dsn | host=x.x.x.241 [...]
-[ RECORD 6 ]------+-------------------------
node_sysid         | 6170839982079996801
node_timeline      | 1
node_dboid         | 16385
node_status        | c
node_name          | node8
node_local_dsn     | host=x.x.x.142 [...]
node_init_from_dsn | host=x.x.x.241 [...]
-[ RECORD 7 ]------+-------------------------
node_sysid         | 6170833985333433816
node_timeline      | 1
node_dboid         | 16385
node_status        | k
node_name          | node7
node_local_dsn     | host=x.x.x.48 [...]
node_init_from_dsn | host=x.x.x.241 [...]

I am trying to join node8, but the join never completes. The error is as follows:

d= p=5521 a=ERROR:  08006: could not connect to the primary server: could not connect to server: Connection timed out
        Is the server running on host "x.x.x.48" and accepting
        TCP/IP connections on port 5432?
d= p=5521 a=DETAIL:  Connection string is 'host=x.x.x.48 [...]'

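The error means that it is trying to connect to a node that has been killed or removed. Why does it try to connect to a killed or removed node, and how can I resolve this?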

The following command was used to join node8:

SELECT bdr.bdr_group_join(
      local_node_name := 'node8',
      node_external_dsn := 'host=x.x.x.142 [...]',
      join_using_dsn := 'host=x.x.x.241 [...]'
);

BDR was installed with the following command (Debian Wheezy):

curl -sSL https://manageacloud.com/api/cm/configuration/postgresql-bdr/debian/manageacloud-production-script.sh | bash

Contents of bdr.bdr_connections:

-[ RECORD 1 ]----------+---------------------
conn_sysid             | 6170839982079996801
conn_timeline          | 1
conn_dboid             | 16385
conn_origin_sysid      | 0
conn_origin_timeline   | 0
conn_origin_dboid      | 0
conn_is_unidirectional | f
conn_dsn               | host=x.x.x.142 [...]
conn_apply_delay       | 
conn_replication_sets  | {default}
-[ RECORD 2 ]----------+----------------------
conn_sysid             | 6153716402564903569
conn_timeline          | 1
conn_dboid             | 16385
conn_origin_sysid      | 0
conn_origin_timeline   | 0
conn_origin_dboid      | 0
conn_is_unidirectional | f
conn_dsn               | host=x.x.x.47 [...]
conn_apply_delay       | 
conn_replication_sets  | {default}
-[ RECORD 3 ]----------+-----------------------
conn_sysid             | 6153716379158074503
conn_timeline          | 1
conn_dboid             | 16385
conn_origin_sysid      | 0
conn_origin_timeline   | 0
conn_origin_dboid      | 0
conn_is_unidirectional | f
conn_dsn               | host=x.x.x.241 [...]
conn_apply_delay       | 
conn_replication_sets  | {default}
-[ RECORD 4 ]----------+-----------------------
conn_sysid             | 6153716914784688297
conn_timeline          | 1
conn_dboid             | 16385
conn_origin_sysid      | 0
conn_origin_timeline   | 0
conn_origin_dboid      | 0
conn_is_unidirectional | f
conn_dsn               | host=x.x.x.5 [...]
conn_apply_delay       | 
conn_replication_sets  | {default}
-[ RECORD 5 ]----------+-----------------------
conn_sysid             | 6170758438846557459
conn_timeline          | 1
conn_dboid             | 16384
conn_origin_sysid      | 0
conn_origin_timeline   | 0
conn_origin_dboid      | 0
conn_is_unidirectional | f
conn_dsn               | host=x.x.x.128 [...]
conn_apply_delay       | 
conn_replication_sets  | {default}

Version:

# SELECT bdr.bdr_version();
    bdr_version    
-------------------
 0.9.1-2015-05-26-
(1 row)

1 Answer


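This is a bug in BDR. I have just fixed it in my local copy of the tree and, once I have tested it locally, will push the change to bdr-plugin/next and bdr-plugin/REL0_9_STABLE for inclusion in 0.9.3.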

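The problem is that, during node join, we were not filtering bdr.bdr_connections rows by the node's state in bdr.bdr_nodes (the node_status column in the output above) when creating slots on the peers. As a result, the joining node also tries to connect to peers that have already been parted.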

To work around this in 0.9.2 and earlier, it is safe to delete any bdr.bdr_connections entries that have no corresponding bdr.bdr_nodes entry, or whose bdr.bdr_nodes entry has state = 'k'.
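The workaround above could be sketched roughly as follows. This is only an illustration, not an official procedure: it assumes that a connection row matches a node row on (sysid, timeline, dboid), based on the column names shown in the question, and that the "state" mentioned above is the node_status column. The first statement only previews the candidate rows, so check its output before running the DELETE.

```sql
BEGIN;

-- Preview: connection rows with no matching bdr.bdr_nodes row,
-- or whose node is marked 'k'.
-- Assumption: (sysid, timeline, dboid) identifies a node in both tables.
SELECT c.conn_sysid, c.conn_timeline, c.conn_dboid, c.conn_dsn,
       n.node_name, n.node_status
FROM bdr.bdr_connections c
LEFT JOIN bdr.bdr_nodes n
       ON n.node_sysid    = c.conn_sysid
      AND n.node_timeline = c.conn_timeline
      AND n.node_dboid    = c.conn_dboid
WHERE n.node_sysid IS NULL
   OR n.node_status = 'k';

-- Delete the same rows: keep a connection only if a matching node
-- exists whose status is something other than 'k'.
DELETE FROM bdr.bdr_connections c
WHERE NOT EXISTS (
        SELECT 1
        FROM bdr.bdr_nodes n
        WHERE n.node_sysid    = c.conn_sysid
          AND n.node_timeline = c.conn_timeline
          AND n.node_dboid    = c.conn_dboid
          AND n.node_status IS DISTINCT FROM 'k'
      );

-- Run COMMIT; to keep the change, or ROLLBACK; to abandon it.
```

With the tables shown in the question, the preview on that node would list the x.x.x.47 (node1) connection, since node1 has node_status = 'k'. The failing connection to x.x.x.48 does not appear in that listing, so the same check may need to be run on the other nodes as well.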

Answered 2015-07-13T06:29:08.297