I have three CockroachDB nodes: two are on DigitalOcean (one in SF and one in NY), and the third is a server in TX. I followed the Manual Deployment documentation. Our local node initiated, and then our remote nodes came back with:

*
* WARNING: The server appears to be unable to contact the other nodes in the cluster. Please try
*
* - starting the other nodes, if you haven't already
* - double-checking that the '--join' and '--host' flags are set up correctly
* - not using the '--background' flag.
*
* If problems persist, please see https://www.cockroachlabs.com/docs/v1.0/cluster-setup-troubleshooting.html.
*

I ran nmap from our NY server against our TX node, and the port was open. I then ran cockroach start with --logtostderr and noticed that it's trying to connect to a local IP (10.10.11.12 in the log below), even when I tell it to --join REMOTEIP:PORT.

I171019 14:17:10.234575 12 cli/start.go:503  starting cockroach node
I171019 14:17:10.237272 12 storage/engine/rocksdb.go:411  opening rocksdb instance at "/root/cockroach-data/local"
W171019 14:17:10.251456 12 gossip/gossip.go:1241  [n?] no incoming or outgoing connections
I171019 14:17:10.251638 12 storage/engine/rocksdb.go:411  opening rocksdb instance at "/root/cockroach-data"
I171019 14:17:10.258098 12 server/config.go:528  [n?] 1 storage engine initialized
I171019 14:17:10.258271 12 server/config.go:530  [n?] RocksDB cache size: 500 MiB
I171019 14:17:10.258347 12 server/config.go:530  [n?] store 0: RocksDB, max size 0 B, max open file limit 10000
I171019 14:17:10.259025 12 server/server.go:837  [n?] no stores bootstrapped and --join flag specified, awaiting init command.
I171019 14:17:10.401973 21 gossip/client.go:129  [n?] started gossip client to 24.153.192.101:26257
I171019 14:17:10.454957 12 storage/stores.go:303  [n?] read 0 node addresses from persistent storage
I171019 14:17:10.455140 12 storage/stores.go:322  [n?] wrote 1 node addresses to persistent storage
I171019 14:17:10.455209 12 server/node.go:606  [n?] connecting to gossip network to verify cluster ID...
I171019 14:17:10.455268 12 server/node.go:631  [n?] node connected via gossip and verified as part of cluster "270f9533-45ef-4ff6-850d-da3160e9b5a6"
I171019 14:17:30.456253 70 vendor/google.golang.org/grpc/grpclog/grpclog.go:75  grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 10.10.11.12:26257: i/o timeout"; Reconnecting to {10.10.11.12:26257 <nil>}
W171019 14:17:40.454698 79 server/server.go:878  The server appears to be unable to contact the other nodes in the cluster. Please try

Did I set up the local node's hostname incorrectly? The troubleshooting documentation is not very helpful here. I even tried changing the TX host to the local IP, and that did not resolve the issue.


EDIT:

Our firewall was causing the communication issues. Once that was resolved, our TX node also required the --advertise-host parameter.

1 Answer

By default, a cockroach node advertises the value of its --host flag as its own address. On a private network this works fine, because that address is usually resolvable and reachable by every node on the network.
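As an illustration, the address the remote nodes were dialing in the question's log, 10.10.11.12, falls in the RFC 1918 private range. The sketch below uses Go's net.IP.IsPrivate (Go 1.17 or later) to flag such addresses. The helper name isPrivate is hypothetical. A private or loopback address is reachable only from inside its own network, so it is a poor candidate for what a node advertises to peers in other data centers.

```go
package main

import (
	"fmt"
	"net"
)

// isPrivate (hypothetical helper) reports whether host is an IP literal in a
// private (RFC 1918 / RFC 4193) or loopback range. Requires Go 1.17+.
func isPrivate(host string) (bool, error) {
	ip := net.ParseIP(host)
	if ip == nil {
		return false, fmt.Errorf("%q is not an IP literal", host)
	}
	return ip.IsPrivate() || ip.IsLoopback(), nil
}

func main() {
	// Both addresses appear in the log from the question.
	for _, h := range []string{"10.10.11.12", "24.153.192.101"} {
		private, err := isPrivate(h)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s private=%v\n", h, private)
	}
	// prints:
	// 10.10.11.12 private=true
	// 24.153.192.101 private=false
}
```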

However, when nodes are on separate networks, you may need to tell each node its public IP address using --advertise-host.
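The rule described above can be modeled in a few lines. The function below is a hypothetical sketch of that behavior, not CockroachDB's actual code. The advertised address is the --advertise-host value when that flag is set, and otherwise the --host value. 203.0.113.5 is a placeholder documentation address standing in for a node's real public IP.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// advertisedAddr is a hypothetical model of which address a node tells its
// peers to dial: advertiseHost if non-empty, otherwise host.
func advertisedAddr(host, advertiseHost string, port int) string {
	h := host
	if advertiseHost != "" {
		h = advertiseHost
	}
	// JoinHostPort adds the brackets IPv6 literals need.
	return net.JoinHostPort(h, strconv.Itoa(port))
}

func main() {
	// No --advertise-host: peers are told to dial the private --host address.
	fmt.Println(advertisedAddr("10.10.11.12", "", 26257)) // 10.10.11.12:26257
	// --advertise-host set to a public IP (placeholder value).
	fmt.Println(advertisedAddr("10.10.11.12", "203.0.113.5", 26257)) // 203.0.113.5:26257
}
```

With this model, a node that binds to a private --host but sets --advertise-host to its public IP still tells peers in other networks an address they can reach.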

You can find more details about that in the cluster troubleshooting docs: https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#networking-troubleshooting

answered 2017-10-19T14:51:03.790