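I have a 1-node C* 2.0.4 cluster running, and nodetool status shows a healthy cluster.
I then installed OpsCenter 4.0.3 on a separate machine in the same network using "sudo yum install opscenter-free".
In the opscenterd.conf file, I set interface = 'public IP of the OpsCenter server' and started the OpsCenter server.
I can then see the OpsCenter web page, where I clicked "Use Existing Cluster".
On the Add Cluster screen, I entered the rpc_address of the 1-node Cassandra cluster. OpsCenter accepted it and displayed the cluster name correctly on the next page.
However, none of the graphs in OpsCenter load, and I see the error: 0 of 0 agents connected. I also see a blinking red X with a plug icon at the top.
The firewall is currently turned off on both the OpsCenter machine and the C* node (both run CentOS).
How can I get OpsCenter to connect to the C* node properly?
Here is what the OpsCenter log shows (note: I replaced the IP with A.B.C.D):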
2014-01-30 06:43:37+0000 [Dog] WARN: Unable to collect datacenter, rack information: Failed query to http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D : Connection was refused by other side: 111: Connection refused.
2014-01-30 06:45:37+0000 [Dog] WARN: HTTP request http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D failed: Connection was refused by other side: 111: Connection refused.
2014-01-30 06:45:37+0000 [Dog] WARN: Unable to collect datacenter, rack information: Failed query to http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D : Connection was refused by other side: 111: Connection refused.
On the Cassandra node, everything looks healthy:
[root@cassandra01 ~]# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN A.B.C.D 158.27 KB 256 100.0% bc560cd6-a20d-4b36-99ca-ed477dc939b5 rack1
However, I cannot curl the URL that OpsCenter is trying to reach:
[root@cassandra01 ~]# curl http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D
curl: (7) couldn't connect to host
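The same reachability check can be scripted for every port involved. This is only a minimal sketch: `port_open` and `check_node` are hypothetical helper names, and the port list is simply the set of node-side ports that appear in the logs in this post (61621 for the agent HTTP API, 9160 for Thrift, 7199 for JMX).

```python
import socket

# Node-side ports taken from the logs in this post (not an exhaustive list).
PORTS = {"agent HTTP API": 61621, "Thrift": 9160, "JMX": 7199}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name-resolution failures.
        return False


def check_node(host: str) -> dict:
    """Map each known service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in PORTS.items()}
```

Calling `check_node("A.B.C.D")` with the node's real address, from both the node itself and the OpsCenter machine, would show whether anything is listening on the address being probed. A `False` only means the TCP connection failed; it does not say why.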
SSL is turned off in the opscenterd.conf file (the default setting).
Here is what I see at the URL http://<public IP of OpsCenter>:8888/Dog/nodes
[{"load": null, "has_jna": false, "vnodes": true, "devices": {"saved_caches": null, "commitlog": null, "other": null, "data": null}, "task_progress": {}, "node_ip": "A.B.C.D", "network_interfaces": null, "ec2": {}, "node_version": {}, "dc": null, "node_name": null, "num_procs": null, "streaming": {}, "token": "5743408169174478324", "data_held": null, "mode": "unknown", "rpc_ip": "10.183.132.141", "partitions": {"saved_caches": null, "commitlog": null, "other": null, "data": null}, "os": null, "rack": null, "last_seen": 0}]
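To pick out the relevant fields from that response, here is a small sketch. The function name `unreported_nodes` is hypothetical, and treating `last_seen` of 0 or a `mode` of "unknown" as "no agent data received for this node" is my assumption based on the output above, not documented behavior.

```python
import json


def unreported_nodes(payload: str) -> list:
    """Return node_ip values whose entry looks like it never received agent data.

    Assumption: last_seen == 0 or mode == "unknown" means no agent has reported.
    """
    nodes = json.loads(payload)
    return [
        node.get("node_ip")
        for node in nodes
        if node.get("last_seen") == 0 or node.get("mode") == "unknown"
    ]
```

Fed the JSON above, this would list A.B.C.D, which matches the "0 of 0 agents connected" message.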
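Any ideas on how to fix this?
Note that rpc_server_type is set to sync in Cassandra's YAML file.
Update:
I also tried installing the OpsCenter agent manually on the C* node using "yum install datastax-agent", and then edited the address.yaml file with the following settings: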
stomp_interface: 'public ip of machine opscenterd is running on (public IP)'
local_interface: 'listen_address in cassandra.yaml (public IP)'
agent_rpc_interface: 'rpc address in cassandra.yaml (private IP network)'
agent_rpc_broadcast_address: 'private network IP, same network at rpc address'
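I tried a few different settings for the address.yaml file, but none of them worked. For example, I tried setting only stomp_interface and removing the other 3 lines. That did not work. I also tried setting only stomp_interface and local_interface, but that did not work either.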
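Since I changed address.yaml by hand several times, a quick check for misspelled keys seemed worthwhile. This is a minimal sketch that assumes the file contains only flat `key: value` lines. `KNOWN_KEYS` holds just the four options used in this post, not the full set the agent accepts, and both function names are hypothetical.

```python
# Only the four options used in this post; the agent accepts others as well.
KNOWN_KEYS = {
    "stomp_interface",
    "local_interface",
    "agent_rpc_interface",
    "agent_rpc_broadcast_address",
}


def parse_flat_yaml(text: str) -> dict:
    """Parse flat 'key: value' lines, skipping blank lines and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def unknown_keys(settings: dict) -> list:
    """Return keys that are not in KNOWN_KEYS, e.g. a misspelled option."""
    return sorted(key for key in settings if key not in KNOWN_KEYS)
```

For example, a file containing a hypothetical misspelling such as `stop_interface: 1.2.3.4` would be reported by `unknown_keys`. An unknown key is not necessarily wrong; it only means it is outside the four listed here.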
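Now, when I start the DataStax agent using "service datastax-agent start", the Cassandra service suddenly crashes:
[root@cassandra01 ~]# sudo service cassandra status
cassandra dead but pid file exists
When the C* service crashes, the OpsCenter agent stays up and running. If I stop the agent service and start the C* service again (sudo service cassandra start), C* starts back up successfully and nodetool status shows a healthy 1-node cluster. But as soon as I start the agent service, the C* service crashes again. Every setting I tried in the address.yaml file leads to the same behavior.
Ideally, I would rather not install the agent manually and would prefer to push the agent installer from the OpsCenter GUI to the C* node. Since that did not work, I tried installing the agent manually and connecting it to OpsCenter, but unfortunately that did not work either.
Sometimes, when the Cassandra service has crashed, I also see this on the Cassandra node:
[root@cassandra01 ~]# sudo service cassandra stop
log4j:WARN No appenders could be found for logger (org.eclipse.jetty.util.log).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Usage: cassandra start|stop|status|restart|reload
Here is what the Cassandra node's log4j-server.properties contains: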
log4j.rootLogger=INFO,stdout,R
# stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d{HH:mm:ss,SSS} %m%n
# rolling log file
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.maxFileSize=20MB
log4j.appender.R.maxBackupIndex=50
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
# Edit the next line to point to your logs directory
log4j.appender.R.File=/var/log/cassandra/system.log
# Application logging options
#log4j.logger.org.apache.cassandra=DEBUG
#log4j.logger.org.apache.cassandra.db=DEBUG
#log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG
# Adding this to avoid thrift logging disconnect errors.
log4j.logger.org.apache.thrift.server.TNonblockingServer=ERROR
Finally, here is what the agent.log of the OpsCenter agent running on the Cassandra node shows:
nohup: ignoring input
Starting DataStax agent monitor datastax_agent_monitor
INFO [main] 2014-01-30 08:24:59,104 Loading conf files: /var/lib/datastax-agent/conf/address.yaml
INFO [main] 2014-01-30 08:24:59,261 Java vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_25
INFO [main] 2014-01-30 08:24:59,546 Default config values: {:rollups300_ttl 2419200, :settings_cf "settings", :agent_rpc_interface "10.183.132.141", :my_channel_prefix "/agent", :poll_period 60, :kerberos_hostname nil, :storage_dc nil, :thrift_conn_timeout 10000, :thrift_max_frame_size 15728640, :rollups60_ttl 604800, :stomp_port 61620, :shorttime_interval 10, :longtime_interval 300, :private-conf-props ["initial_token" "listen_address" "broadcast_address" "rpc_address"], :thrift_port 9160, :async_retry_timeout 5, :agent-conf-group "global-cluster-agent-group", :jmx_host "127.0.0.1", :ec2_metadata_api_host "169.254.169.254", :metrics_enabled 1, :async_queue_size 5000, :autodiscovery_interval 120, :rollups7200_ttl 31536000, :autodiscovery_enabled true, :thrift_ssl_truststore nil, :rollup_snapshot_period 300, :is_package true, :monitor_command "/usr/share/datastax-agent/bin/datastax_agent_monitor", :thrift_socket_timeout 5000, :cassandra_log_location "/var/log/cassandra/system.log", :local_interface "23.253.64.169", :jmx_port 7199, :jmx_metrics_threadpool_size 4, :use_ssl 0, :rollups86400_ttl -1, :nodedetails_threadpool_size 3, :api_port 61621, :kerberos_service nil, :kerberos_client_principal nil, :jmx_thread_pool_size 5, :production 1, :stomp_interface "166.78.186.184", :storage_keyspace "OpsCenter", :rollup_snapshot_threshold 300, :thrift_ssl_truststore_type "JKS", :realtime_interval 5}
INFO [main] 2014-01-30 08:24:59,554 Waiting for the config from OpsCenter
INFO [main] 2014-01-30 08:24:59,559 Using 23.253.64.169 as the cassandra broadcast address
INFO [main] 2014-01-30 08:24:59,568 New JMX connection (127.0.0.1:7199)
ERROR [main] 2014-01-30 08:25:00,019 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
java.net.ConnectException: Connection refused]
INFO [main] 2014-01-30 08:25:00,414 cassandra RPC address is nil
INFO [main] 2014-01-30 08:25:00,418 agent RPC broadcast address is 10.183.132.141
INFO [main] 2014-01-30 08:25:00,474 Clearing ssl.truststore
INFO [main] 2014-01-30 08:25:00,475 Clearing ssl.truststore.password
INFO [main] 2014-01-30 08:25:00,476 Setting ssl.store.type to JKS
INFO [main] 2014-01-30 08:25:00,477 Clearing kerberos.service.principal.name
INFO [main] 2014-01-30 08:25:00,480 Clearing kerberos.principal
INFO [main] 2014-01-30 08:25:00,480 Clearing kerberos.useTicketCache
INFO [main] 2014-01-30 08:25:00,481 Clearing kerberos.ticketCache
INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.useKeyTab
INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.keyTab
INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.renewTGT
INFO [main] 2014-01-30 08:25:00,488 Clearing kerberos.debug
INFO [main] 2014-01-30 08:25:00,495 Starting Stomp
INFO [main] 2014-01-30 08:25:00,495 SSL communication is disabled
INFO [main] 2014-01-30 08:25:00,495 Creating stomp connection to 166.78.186.184:61620
INFO [thrift-init] 2014-01-30 08:25:00,521 Connecting to Cassandra cluster: 23.253.64.169 (port 9160)
INFO [StompConnection receiver] 2014-01-30 08:25:00,536 Reconnecting in 0s.
INFO [StompConnection receiver] 2014-01-30 08:25:00,561 Connected to 166.78.186.184:61620
INFO [thrift-init] 2014-01-30 08:25:00,619 Downed Host Retry service started with queue size -1 and retry delay 10s
INFO [thrift-init] 2014-01-30 08:25:00,662 Registering JMX me.prettyprint.cassandra.service_Agent Cluster:ServiceType=hector,MonitorType=hector
INFO [main] 2014-01-30 08:25:00,732 Starting Jetty server: {:port 61621, :host "10.183.132.141", :ssl? false, :join? false}
ERROR [thrift-init] 2014-01-30 08:25:00,885 MARK HOST AS DOWN TRIGGERED for host 23.253.64.169(23.253.64.169):9160
ERROR [thrift-init] 2014-01-30 08:25:00,886 Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}; IsActive?: true; Active: 0; Blocked: 1; Idle: 0; NumBeforeExhausted: 1
INFO [thrift-init] 2014-01-30 08:25:00,887 Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}
INFO [thrift-init] 2014-01-30 08:25:00,901 Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}
INFO [thrift-init] 2014-01-30 08:25:00,902 Host detected as down was added to retry queue: 23.253.64.169(23.253.64.169):9160
WARN [thrift-init] 2014-01-30 08:25:00,914 Could not fullfill request on this host null
WARN [Hector.me.prettyprint.cassandra.connection.CassandraHostRetryService-1] 2014-01-30 08:25:00,910 Downed 23.253.64.169(23.253.64.169):9160 host still appears to be down: Unable to open transport to 23.253.64.169(23.253.64.169):9160 , java.net.ConnectException: Connection refused
WARN [thrift-init] 2014-01-30 08:25:00,926 Exception:
me.prettyprint.hector.api.exceptions.HectorTransportException: Unable to open transport to 23.253.64.169(23.253.64.169):9160 , java.net.ConnectException: Connection refused
at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:180)
at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:38)
at me.prettyprint.cassandra.connection.ConcurrentHClientPool.createClient(ConcurrentHClientPool.java:162)
at me.prettyprint.cassandra.connection.ConcurrentHClientPool.borrowClient(ConcurrentHClientPool.java:94)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:250)
at me.prettyprint.cassandra.service.AbstractCluster.describeClusterName(AbstractCluster.java:155)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:298)
at clj_hector.core$cluster_name.invoke(core.clj:40)
at opsagent.cassandra$setup_cassandra$f__353__auto____900$fn__920.invoke(cassandra.clj:360)
at opsagent.cassandra$setup_cassandra$f__353__auto____900.invoke(cassandra.clj:358)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:174)
... 16 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
... 18 more
ERROR [thrift-init] 2014-01-30 08:25:00,965 Error when performing thrift operation:
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:395)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:249)
at me.prettyprint.cassandra.service.AbstractCluster.describeClusterName(AbstractCluster.java:155)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:298)
at clj_hector.core$cluster_name.invoke(core.clj:40)
at opsagent.cassandra$setup_cassandra$f__353__auto____900$fn__920.invoke(cassandra.clj:360)
at opsagent.cassandra$setup_cassandra$f__353__auto____900.invoke(cassandra.clj:358)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Unknown Source)
INFO [StompConnection receiver] 2014-01-30 08:25:01,024 Got new config from OpsCenter: {:kerberos_use_keytab true, :rollups300_ttl 2419200, :kerberos_use_ticket_cache true, :rollups60_ttl 604800, :thrift_port 9160, :ec2_metadata_api_host "169.254.169.254", :metrics_enabled 1, :rollups7200_ttl 31536000, :thrift_ssl_truststore nil, :metrics_ignored_column_families "", :cassandra_log_location "/var/log/cassandra/system.log", :thrift_rpc_interface "10.183.132.141", :thrift_ssl_truststore_password nil, :jmx_port 7199, :provisioning 0, :use_ssl 0, :kerberos_debug false, :rollups86400_ttl -1, :api_port "61621", :storage_keyspace "OpsCenter", :kerberos_renew_tgt true, :metrics_ignored_solr_cores "", :thrift_ssl_truststore_type "JKS", :metrics_ignored_keyspaces "system, system_traces, system_auth, dse_auth, OpsCenter", :rollup_subscriptions [], :cassandra_install_location ""}
INFO [StompConnection receiver] 2014-01-30 08:25:01,030 Starting up agent collection.
INFO [StompConnection receiver] 2014-01-30 08:25:01,040 New JMX connection (127.0.0.1:7199)
ERROR [StompConnection receiver] 2014-01-30 08:25:01,073 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
java.net.ConnectException: Connection refused]
INFO [Jetty] 2014-01-30 08:25:01,160 Jetty server started
INFO [StompConnection receiver] 2014-01-30 08:25:01,188 Starting OS metric collectors (Linux)
INFO [StompConnection receiver] 2014-01-30 08:25:01,199 Starting Cassandra JMX metric collectors
INFO [install-location-finder] 2014-01-30 08:25:01,250 New JMX connection (127.0.0.1:7199)
INFO [StompConnection receiver] 2014-01-30 08:25:01,252 New JMX connection (127.0.0.1:7199)
ERROR [install-location-finder] 2014-01-30 08:25:01,261 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
java.net.ConnectException: Connection refused]
java.net.ConnectException: Connection refused]
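Because the log is long, a short script that pulls out only the ERROR entries makes the pattern easier to see. This is a minimal sketch that assumes each entry starts with `LEVEL [thread] timestamp message` on a single unwrapped line, as in the excerpt above. `LINE_RE` and `summarize` are hypothetical names, and stack-trace or continuation lines are skipped.

```python
import re

# Matches e.g.: ERROR [thrift-init] 2014-01-30 08:25:00,885 MARK HOST AS DOWN ...
LINE_RE = re.compile(
    r"^\s*(INFO|WARN|ERROR)\s+\[(.+?)\]\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(.*)$"
)


def summarize(log_text: str):
    """Return (count of entries per level, list of (thread, message) for ERRORs)."""
    counts = {}
    errors = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # stack-trace frames and continuation lines
        level, thread, _timestamp, message = match.groups()
        counts[level] = counts.get(level, 0) + 1
        if level == "ERROR":
            errors.append((thread, message))
    return counts, errors
```

Run over the log above, the ERROR entries fall into two groups: JMX connections to 127.0.0.1:7199 being refused, and Thrift connections to 23.253.64.169:9160 being refused.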