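I have a 1-node C* 2.0.4 cluster running, and nodetool status shows a healthy cluster.

I then installed OpsCenter 4.0.3 on another machine on the same network using "sudo yum install opscenter-free".

In opscenterd.conf, I set interface = 'public IP of the OpsCenter server' and started the OpsCenter server.

I can then see the OpsCenter web page and click Use Existing Cluster.

On the Add Cluster screen, I entered the rpc_address of the 1-node Cassandra cluster. OpsCenter accepted it and displayed the cluster name correctly on the next page.

However, none of the graphs in OpsCenter load, and I see the error: 0 of 0 agents connected. I also see a flashing red X over a plug icon at the top.

The firewalls on both the OpsCenter machine and the C* node are currently turned off in CentOS.

How can I get OpsCenter to connect to the C* node properly?

Here is what the OpsCenter log shows (note: I replaced the IP with A.B.C.D):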

2014-01-30 06:43:37+0000 [Dog]  WARN: Unable to collect datacenter, rack information: Failed query to http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D : Connection was refused by other side: 111: Connection refused.
2014-01-30 06:45:37+0000 [Dog]  WARN: HTTP request http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D failed: Connection was refused by other side: 111: Connection refused.
2014-01-30 06:45:37+0000 [Dog]  WARN: Unable to collect datacenter, rack information: Failed query to http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D : Connection was refused by other side: 111: Connection refused.

On the Cassandra node, everything looks healthy:

[root@cassandra01 ~]# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  A.B.C.D  158.27 KB  256     100.0%  bc560cd6-a20d-4b36-99ca-ed477dc939b5  rack1

However, I cannot curl the URL that OpsCenter is trying to reach:

[root@cassandra01 ~]# curl http://A.B.C.D:61621/cluster/topology?node_ip=A.B.C.D
curl: (7) couldn't connect to host
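A refused connection on port 61621 means nothing accepted the TCP connection there, which is consistent with 0 agents being connected. As a minimal sketch (Python is my choice here, not something from the original post), a plain TCP connect can confirm whether a given port is open. The function name `port_is_open` and the default timeout are illustrative assumptions.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; it only tests reachability,
    not whether the service behind the port is healthy.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the host cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("A.B.C.D", 61621)` with the node's real address in place of A.B.C.D should return False for as long as the curl command above fails.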

SSL is turned off in opscenterd.conf (the default setting).

Here is what I see at the URL http://<OpsCenter public IP>:8888/Dog/nodes

[{"load": null, "has_jna": false, "vnodes": true, "devices": {"saved_caches": null, "commitlog": null, "other": null, "data": null}, "task_progress": {}, "node_ip": "A.B.C.D", "network_interfaces": null, "ec2": {}, "node_version": {}, "dc": null, "node_name": null, "num_procs": null, "streaming": {}, "token": "5743408169174478324", "data_held": null, "mode": "unknown", "rpc_ip": "10.183.132.141", "partitions": {"saved_caches": null, "commitlog": null, "other": null, "data": null}, "os": null, "rack": null, "last_seen": 0}]

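Any ideas on how to resolve this?

Note that in Cassandra's YAML file, rpc_server_type is set to sync.


UPDATE:

I also tried installing the OpsCenter agent manually on the C* node with "yum install datastax-agent" and then edited the address.yaml file with the following settings: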

stomp_interface: 'public ip of machine opscenterd is running on (public IP)'
local_interface: 'listen_address in cassandra.yaml (public IP)'
agent_rpc_interface: 'rpc address in cassandra.yaml (private IP network)'
agent_rpc_broadcast_address: 'private network IP, same network at rpc address'

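I tried a few different settings for the address.yaml file, but none of them worked. For example, I tried setting only stomp_interface and removing the other 3 lines. That did not work. I also tried setting only stomp_interface and local_interface, but that did not work either.

Now, when I start the DataStax agent with "service datastax-agent start", the Cassandra service suddenly crashes:

[root@cassandra01 ~]# sudo service cassandra status
cassandra dead but pid file exists

The OpsCenter agent keeps running when the C* service crashes. If I stop the agent service and start the C* service again (sudo service cassandra status), C* comes back up successfully and nodetool status shows a healthy 1-node cluster. But as soon as I start the agent service, the C* service crashes again. All of the different settings I tried in the address.yaml file result in the same behavior.

Ideally, I would rather not install the agent manually and would just push its installer from the OpsCenter GUI to the C* node. Since that did not work, I tried installing the agent manually and connecting it to OpsCenter, but unfortunately that did not work either.

Sometimes, when the Cassandra service crashes, I also see this on the Cassandra node:

[root@cassandra01 ~]# sudo service cassandra stop
log4j:WARN No appenders could be found for logger (org.eclipse.jetty.util.log).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Usage: cassandra start|stop|status|restart|reload

Here is what the Cassandra node's log4j-server.properties contains: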

log4j.rootLogger=INFO,stdout,R

# stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d{HH:mm:ss,SSS} %m%n

# rolling log file
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.maxFileSize=20MB
log4j.appender.R.maxBackupIndex=50
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
# Edit the next line to point to your logs directory
log4j.appender.R.File=/var/log/cassandra/system.log

# Application logging options
#log4j.logger.org.apache.cassandra=DEBUG
#log4j.logger.org.apache.cassandra.db=DEBUG
#log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG

# Adding this to avoid thrift logging disconnect errors.
log4j.logger.org.apache.thrift.server.TNonblockingServer=ERROR

Finally, here is what agent.log for the OpsCenter agent running on the Cassandra node shows:

nohup: ignoring input
Starting DataStax agent monitor datastax_agent_monitor
 INFO [main] 2014-01-30 08:24:59,104 Loading conf files: /var/lib/datastax-agent/conf/address.yaml
 INFO [main] 2014-01-30 08:24:59,261 Java vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_25
 INFO [main] 2014-01-30 08:24:59,546 Default config values: {:rollups300_ttl 2419200, :settings_cf "settings", :agent_rpc_interface "10.183.132.141", :my_channel_prefix "/agent", :poll_period 60, :kerberos_hostname nil, :storage_dc nil, :thrift_conn_timeout 10000, :thrift_max_frame_size 15728640, :rollups60_ttl 604800, :stomp_port 61620, :shorttime_interval 10, :longtime_interval 300, :private-conf-props ["initial_token" "listen_address" "broadcast_address" "rpc_address"], :thrift_port 9160, :async_retry_timeout 5, :agent-conf-group "global-cluster-agent-group", :jmx_host "127.0.0.1", :ec2_metadata_api_host "169.254.169.254", :metrics_enabled 1, :async_queue_size 5000, :autodiscovery_interval 120, :rollups7200_ttl 31536000, :autodiscovery_enabled true, :thrift_ssl_truststore nil, :rollup_snapshot_period 300, :is_package true, :monitor_command "/usr/share/datastax-agent/bin/datastax_agent_monitor", :thrift_socket_timeout 5000, :cassandra_log_location "/var/log/cassandra/system.log", :local_interface "23.253.64.169", :jmx_port 7199, :jmx_metrics_threadpool_size 4, :use_ssl 0, :rollups86400_ttl -1, :nodedetails_threadpool_size 3, :api_port 61621, :kerberos_service nil, :kerberos_client_principal nil, :jmx_thread_pool_size 5, :production 1, :stomp_interface "166.78.186.184", :storage_keyspace "OpsCenter", :rollup_snapshot_threshold 300, :thrift_ssl_truststore_type "JKS", :realtime_interval 5}
 INFO [main] 2014-01-30 08:24:59,554 Waiting for the config from OpsCenter
 INFO [main] 2014-01-30 08:24:59,559 Using 23.253.64.169 as the cassandra broadcast address
 INFO [main] 2014-01-30 08:24:59,568 New JMX connection (127.0.0.1:7199)
ERROR [main] 2014-01-30 08:25:00,019 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
        java.net.ConnectException: Connection refused]
 INFO [main] 2014-01-30 08:25:00,414 cassandra RPC address is  nil
 INFO [main] 2014-01-30 08:25:00,418 agent RPC broadcast address is  10.183.132.141
 INFO [main] 2014-01-30 08:25:00,474 Clearing ssl.truststore
 INFO [main] 2014-01-30 08:25:00,475 Clearing ssl.truststore.password
 INFO [main] 2014-01-30 08:25:00,476 Setting ssl.store.type to JKS
 INFO [main] 2014-01-30 08:25:00,477 Clearing kerberos.service.principal.name
 INFO [main] 2014-01-30 08:25:00,480 Clearing kerberos.principal
 INFO [main] 2014-01-30 08:25:00,480 Clearing kerberos.useTicketCache
 INFO [main] 2014-01-30 08:25:00,481 Clearing kerberos.ticketCache
 INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.useKeyTab
 INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.keyTab
 INFO [main] 2014-01-30 08:25:00,487 Clearing kerberos.renewTGT
 INFO [main] 2014-01-30 08:25:00,488 Clearing kerberos.debug
 INFO [main] 2014-01-30 08:25:00,495 Starting Stomp
 INFO [main] 2014-01-30 08:25:00,495 SSL communication is disabled
 INFO [main] 2014-01-30 08:25:00,495 Creating stomp connection to 166.78.186.184:61620
 INFO [thrift-init] 2014-01-30 08:25:00,521 Connecting to Cassandra cluster: 23.253.64.169 (port 9160)
 INFO [StompConnection receiver] 2014-01-30 08:25:00,536 Reconnecting in 0s.
 INFO [StompConnection receiver] 2014-01-30 08:25:00,561 Connected to 166.78.186.184:61620
 INFO [thrift-init] 2014-01-30 08:25:00,619 Downed Host Retry service started with queue size -1 and retry delay 10s
 INFO [thrift-init] 2014-01-30 08:25:00,662 Registering JMX me.prettyprint.cassandra.service_Agent Cluster:ServiceType=hector,MonitorType=hector
 INFO [main] 2014-01-30 08:25:00,732 Starting Jetty server: {:port 61621, :host "10.183.132.141", :ssl? false, :join? false}
ERROR [thrift-init] 2014-01-30 08:25:00,885 MARK HOST AS DOWN TRIGGERED for host 23.253.64.169(23.253.64.169):9160
ERROR [thrift-init] 2014-01-30 08:25:00,886 Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}; IsActive?: true; Active: 0; Blocked: 1; Idle: 0; NumBeforeExhausted: 1
 INFO [thrift-init] 2014-01-30 08:25:00,887 Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}
 INFO [thrift-init] 2014-01-30 08:25:00,901 Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{23.253.64.169(23.253.64.169):9160}
 INFO [thrift-init] 2014-01-30 08:25:00,902 Host detected as down was added to retry queue: 23.253.64.169(23.253.64.169):9160
 WARN [thrift-init] 2014-01-30 08:25:00,914 Could not fullfill request on this host null
 WARN [Hector.me.prettyprint.cassandra.connection.CassandraHostRetryService-1] 2014-01-30 08:25:00,910 Downed 23.253.64.169(23.253.64.169):9160 host still appears to be down: Unable to open transport to 23.253.64.169(23.253.64.169):9160 , java.net.ConnectException: Connection refused
 WARN [thrift-init] 2014-01-30 08:25:00,926 Exception:
me.prettyprint.hector.api.exceptions.HectorTransportException: Unable to open transport to 23.253.64.169(23.253.64.169):9160 , java.net.ConnectException: Connection refused
        at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:180)
        at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:38)
        at me.prettyprint.cassandra.connection.ConcurrentHClientPool.createClient(ConcurrentHClientPool.java:162)
        at me.prettyprint.cassandra.connection.ConcurrentHClientPool.borrowClient(ConcurrentHClientPool.java:94)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:250)
        at me.prettyprint.cassandra.service.AbstractCluster.describeClusterName(AbstractCluster.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
        at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:298)
        at clj_hector.core$cluster_name.invoke(core.clj:40)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900$fn__920.invoke(cassandra.clj:360)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900.invoke(cassandra.clj:358)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
        at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
        at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
        at me.prettyprint.cassandra.connection.client.HThriftClient.open(HThriftClient.java:174)
        ... 16 more
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
        ... 18 more
ERROR [thrift-init] 2014-01-30 08:25:00,965 Error when performing thrift operation:
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
        at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:395)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:249)
        at me.prettyprint.cassandra.service.AbstractCluster.describeClusterName(AbstractCluster.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
        at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:298)
        at clj_hector.core$cluster_name.invoke(core.clj:40)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900$fn__920.invoke(cassandra.clj:360)
        at opsagent.cassandra$setup_cassandra$f__353__auto____900.invoke(cassandra.clj:358)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Unknown Source)
 INFO [StompConnection receiver] 2014-01-30 08:25:01,024 Got new config from OpsCenter: {:kerberos_use_keytab true, :rollups300_ttl 2419200, :kerberos_use_ticket_cache true, :rollups60_ttl 604800, :thrift_port 9160, :ec2_metadata_api_host "169.254.169.254", :metrics_enabled 1, :rollups7200_ttl 31536000, :thrift_ssl_truststore nil, :metrics_ignored_column_families "", :cassandra_log_location "/var/log/cassandra/system.log", :thrift_rpc_interface "10.183.132.141", :thrift_ssl_truststore_password nil, :jmx_port 7199, :provisioning 0, :use_ssl 0, :kerberos_debug false, :rollups86400_ttl -1, :api_port "61621", :storage_keyspace "OpsCenter", :kerberos_renew_tgt true, :metrics_ignored_solr_cores "", :thrift_ssl_truststore_type "JKS", :metrics_ignored_keyspaces "system, system_traces, system_auth, dse_auth, OpsCenter", :rollup_subscriptions [], :cassandra_install_location ""}
 INFO [StompConnection receiver] 2014-01-30 08:25:01,030 Starting up agent collection.
 INFO [StompConnection receiver] 2014-01-30 08:25:01,040 New JMX connection (127.0.0.1:7199)
ERROR [StompConnection receiver] 2014-01-30 08:25:01,073 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
        java.net.ConnectException: Connection refused]
 INFO [Jetty] 2014-01-30 08:25:01,160 Jetty server started
 INFO [StompConnection receiver] 2014-01-30 08:25:01,188 Starting OS metric collectors (Linux)
 INFO [StompConnection receiver] 2014-01-30 08:25:01,199 Starting Cassandra JMX metric collectors
 INFO [install-location-finder] 2014-01-30 08:25:01,250 New JMX connection (127.0.0.1:7199)
 INFO [StompConnection receiver] 2014-01-30 08:25:01,252 New JMX connection (127.0.0.1:7199)
ERROR [install-location-finder] 2014-01-30 08:25:01,261 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
        java.net.ConnectException: Connection refused]
        java.net.ConnectException: Connection refused]
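
The log is long, so a quick summary by level and thread can help show which connections are failing. The following is only a sketch in Python: the regular expression assumes the `LEVEL [thread] timestamp message` layout seen in agent.log above, and `summarize` is a hypothetical helper, not part of the DataStax agent.

```python
import re
from collections import Counter

# Assumed layout: optional padding, level, [thread], timestamp, message.
LINE_RE = re.compile(
    r"^\s*(INFO|WARN|ERROR) \[([^\]]+)\] "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (.*)$"
)


def summarize(lines):
    """Count entries per level and collect (thread, message) for ERRORs."""
    levels = Counter()
    errors = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # stack-trace and continuation lines are skipped
        level, thread, _timestamp, message = match.groups()
        levels[level] += 1
        if level == "ERROR":
            errors.append((thread, message))
    return levels, errors


# A few lines taken from the log above, shortened for the example.
sample = [
    " INFO [main] 2014-01-30 08:24:59,568 New JMX connection (127.0.0.1:7199)",
    "ERROR [main] 2014-01-30 08:25:00,019 Error connecting via JMX: java.io.IOException",
    "        java.net.ConnectException: Connection refused]",
    "ERROR [thrift-init] 2014-01-30 08:25:00,885 MARK HOST AS DOWN TRIGGERED for host 23.253.64.169(23.253.64.169):9160",
]
levels, errors = summarize(sample)
for thread, message in errors:
    print(thread, "->", message)
```

In this log every ERROR is a refused connection to JMX on 127.0.0.1:7199 or to Thrift on 23.253.64.169:9160, which matches the Cassandra process being down while the agent is running.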
2 Answers

This forum post appears to cover some of the problems that can occur with this setup:

http://www.datastax.com/support-forums/topic/opscenter-agent-not-connecting-to-opscenter

answered 2014-01-30T07:14:20.273
There is an out-of-memory error in /var/log/messages:

Jan 30 20:06:39 hostname kernel: Out of memory: Kill process 2900 (java) score 788 or sacrifice child
Jan 30 20:06:39 hostname kernel: Killed process 2900, UID 0, (java) total-vm:1383360kB, anon-rss:717176kB, file-rss:113316kB
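
To check whether the same thing is happening on another node, the kernel messages can be scanned for OOM-killer entries. This is a rough sketch in Python; the pattern is based only on the two lines quoted above, and `find_oom_kills` is a made-up helper name.

```python
import re

# Pattern taken from the "Out of memory: Kill process ..." line above.
OOM_RE = re.compile(r"Out of memory: Kill process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return a list of (pid, process_name) for each OOM-killer entry."""
    hits = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


sample = [
    "Jan 30 20:06:39 hostname kernel: Out of memory: Kill process 2900 (java) score 788 or sacrifice child",
    "Jan 30 20:06:39 hostname kernel: Killed process 2900, UID 0, (java) total-vm:1383360kB, anon-rss:717176kB, file-rss:113316kB",
]
kills = find_oom_kills(sample)
print(kills)
```

On a real node the lines would come from reading /var/log/messages, which normally requires root.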

I am using the same setup as you. The cassandra-env.sh script uses

system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`

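However, on this system (Linux hostname 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 18:37:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux), free -m reports free memory in the fourth column (the second column is total memory). Changing the above to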

system_memory_in_mb=`free -m | awk '/Mem:/ {print $4}'`

and saving allowed Cassandra to start without running out of memory.
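
To make the column difference concrete: awk counts the "Mem:" label as field 1, so $2 is the "total" column and $4 is the "free" column. Here is a small Python sketch of the same lookup. The sample output and its numbers are invented for illustration and are not from my machine; `parse_free_mem` is a hypothetical helper.

```python
def parse_free_mem(output: str) -> dict:
    """Map the column names of `free -m` to the values on the Mem: row."""
    lines = output.strip().splitlines()
    header = lines[0].split()  # e.g. total, used, free, ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "Mem:":
            # fields[1] is awk's $2, fields[3] is awk's $4
            return dict(zip(header, (int(value) for value in fields[1:])))
    raise ValueError("no Mem: row found")


# Hypothetical `free -m` output in the older procps layout.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:           993        920         73          0         10        150
-/+ buffers/cache:        760        233
Swap:            0          0          0
"""
mem = parse_free_mem(SAMPLE)
print(mem["total"], mem["free"])  # prints "993 73"
```

With $2 the heap is sized from total memory (993 MB in this made-up sample), and with $4 from what is currently free (73 MB), so the second choice produces a much smaller heap on a box that is already busy.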

answered 2014-01-30T20:21:24.097