
OpsCenter version: 5.1.0 and DSE version: 4.6.0

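Creating a brand-new cluster directly from OpsCenter produces the error below. With the same settings it occasionally works, but about 95% of the time it fails with the same error. OpsCenter runs on its own machine but shares the same security group as the cluster instances. For good measure, I have opened all TCP ports to all IPs. Here is the error stack trace from opscenterd.log: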

2015-03-19 10:06:12+0000 [] INFO: Starting provisioning process

2015-03-19 10:06:12+0000 [] INFO: Beginning install phase of cluster provisioning

2015-03-19 10:06:13+0000 [] WARN: HTTP request http://10.xxx:61621/alive? failed: Connection was refused by other side: 111: Connection refused.

2015-03-19 10:06:13+0000 [] INFO: Beginning install of OpsCenter agent to 54.xxx

2015-03-19 10:06:26+0000 [] WARN: HTTP request http://10.xxx:61621/alive? failed: Connection was refused by other side: 111: Connection refused.

2015-03-19 10:06:31+0000 [] INFO: Agent for ip 10.xxx is version None

2015-03-19 10:06:31+0000 [] INFO: Agent for ip 10.xxx is version u'5.1.0'

2015-03-19 10:07:23+0000 [] INFO: Successfully installed agent and dse on node 10.xxx

2015-03-19 10:07:23+0000 [] INFO: Beginning 'stop' phase of cluster provisioning

2015-03-19 10:07:25+0000 [] WARN: Marking request '10.xxx: /ops/stop' (f6708fa2-b45f-42b4-b992-90a82b460ac7) as failed: /usr/sbin/service dse stop failed

    exit status: 1
    stdout:
    log_daemon_msg is a shell function
    Cassandra 2.0 and later require Java 7 or later.

2015-03-19 10:07:25+0000 [] ERROR: Failed to stop node 10.xxx: /usr/sbin/service dse stop failed

    exit status: 1
    stdout:
    log_daemon_msg is a shell function
    Cassandra 2.0 and later require Java 7 or later.

2015-03-19 10:07:25+0000 [] WARN: Marking request 'stop phase' (0b6fcb6b-96ba-404e-a484-b4b6b167b309) as failed: Failed to stop node 10.xxx: /usr/sbin/service dse stop failed

    exit status: 1
    stdout:
    log_daemon_msg is a shell function
    Cassandra 2.0 and later require Java 7 or later.

2015-03-19 10:07:25+0000 [] ERROR: Stop phase failed: Failed to stop node 10.xxx: /usr/sbin/service dse stop failed

    exit status: 1
    stdout:
    log_daemon_msg is a shell function
    Cassandra 2.0 and later require Java 7 or later.

2015-03-19 10:07:25+0000 [] WARN: Marking request 'provision' (daf1c15d-92e3-40b0-83ca-34d548ea835b) as failed: Stop phase failed: Failed to stop node 10.xxx: /usr/sbin/service dse stop failed

    exit status: 1
    stdout:
    log_daemon_msg is a shell function
    Cassandra 2.0 and later require Java 7 or later.

2015-03-19 10:07:25+0000 [] ERROR: 2015-03-19 10:07:25+0000 [] ERROR: Cluster provisioning failed: Exception: Stop phase failed: Failed to stop node 10.xxx: /usr/sbin/service dse stop failed

    exit status: 1
    stdout:
    log_daemon_msg is a shell function
    Cassandra 2.0 and later require Java 7 or later.

2015-03-19 10:07:25+0000 [] ERROR: Failed to provision cluster: Cluster provisioning failed: Exception: Stop phase failed: Failed to stop node 10.xxx: /usr/sbin/service dse stop failed

    exit status: 1
    stdout:
    log_daemon_msg is a shell function
    Cassandra 2.0 and later require Java 7 or later.

2015-03-19 10:07:25+0000 [] WARN: Marking request 28c021fd-d21a-4fed-bb5c-a4fe17d362e0 as failed: Cluster provisioning failed: Exception: Stop phase failed: Failed to stop node 10.xxx: /usr/sbin/service dse stop failed

    exit status: 1
    stdout:
    log_daemon_msg is a shell function
    Cassandra 2.0 and later require Java 7 or later.

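2015-03-19 10:07:41+0000 [] WARN: Unable to find a matching cluster for node with IP [u'fe80:0:0:0:2000:aff:feeb:31c7%2', u'10.xxx', u'0:0:0:0:0:0:0:1%1', u'127.0.0.1']; the message was [u'5.1.0', u'/1947480708/conf']. This usually indicates that an OpsCenter agent is still running on an old node that was decommissioned or is part of a cluster that OpsCenter is no longer monitoring.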

Any help is appreciated! Thanks in advance, Harsha


OpsCenter developer here. I make the OpsCenter provisioning features go zoom (or splat occasionally, as you've seen). It is with sadness and shame that I must tell you that you're hitting a bug.

The DataStax AMI version 2.4 used by OpsCenter provisioning (https://github.com/riptano/ComboAMI/tree/2.4) does quite a bit of work at boot time via startup scripts. One of those tasks is to set up some gpg repository keys used to validate packages. Intermittently that process can fail, breaking package installs and leading to the series of errors that you're seeing. This failure is intermittent and has become much more frequent recently. If you check /home/ubuntu/datastax-ami/ami.log you should see the gpg key failures that start the rest of the failure chain.
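To check a node for this, a minimal sketch that pulls GPG-related lines out of the AMI boot log might look like the following. It assumes the key failures contain the string "gpg" (in any case) on the same line; the exact wording of those log lines is not given here, and the helper name `find_gpg_errors` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list GPG-related lines from the DataStax AMI boot log.
# Assumption: the key failures contain "gpg" (any case) on the same line.

AMI_LOG_DEFAULT=/home/ubuntu/datastax-ami/ami.log

# find_gpg_errors [LOGFILE]  (hypothetical helper name)
# Prints matching lines prefixed with their line numbers.
# Exit status: 0 = matches found, 1 = no matches, 2 = log not readable.
find_gpg_errors() {
  local log=${1:-$AMI_LOG_DEFAULT}
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  # -n: prefix each match with its line number; -i: case-insensitive
  grep -n -i 'gpg' "$log"
}

# Example, run on a node after boot:
#   find_gpg_errors
```

The line numbers make it easy to open ami.log at the first failure and read the surrounding boot output.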

Unfortunately, this error is pretty far down the technology stack and is difficult to work around manually. If you just need to provision a single cluster, you can retry until you get a good run. Otherwise your best bet is to manually launch the instances and use local provisioning to deploy dse/dsc to their private IP addresses:

  • Launch instances using ami-ada2b6c4 (assuming you're in us-east-1)
    • Make sure to add the instances to the OpsCenterSecurity group.
    • Make sure you have the private half of the keypair you use (you'll need it during local provisioning)
    • On the instance data page, hit the advanced pulldown and add the following userdata as text "--raidonly --java7"
  • Do a local-provisioning run against the private IPs
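For the launch step, a rough command-line equivalent of the console steps above, assuming the AWS CLI is installed and configured, could look like this. The AMI ID, region, security group name and userdata come from the list; the key pair name, instance count and instance type are hypothetical placeholders. The AMI ID dates from 2015 and may no longer be available. The function name `launch_dse_instances` is also hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the manual launch step using the AWS CLI.
# From the answer: AMI ami-ada2b6c4, region us-east-1, security group
# OpsCenterSecurity, userdata "--raidonly --java7".
# Hypothetical placeholders: key pair name, instance count, instance type.

# launch_dse_instances KEY_NAME [COUNT] [INSTANCE_TYPE]
# With DRY_RUN=1 the arguments are printed one per line and nothing is launched.
launch_dse_instances() {
  local key_name=${1:-} count=${2:-3} instance_type=${3:-m3.large}
  if [ -z "$key_name" ]; then
    echo "usage: launch_dse_instances KEY_NAME [COUNT] [INSTANCE_TYPE]" >&2
    return 1
  fi
  # The '=' form keeps the CLI from parsing "--raidonly" as one of its own options.
  local -a cmd=(
    aws ec2 run-instances
    --region us-east-1
    --image-id ami-ada2b6c4
    --count "$count"
    --instance-type "$instance_type"
    --key-name "$key_name"
    --security-groups OpsCenterSecurity
    "--user-data=--raidonly --java7"
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[@]}"
  else
    "${cmd[@]}"
  fi
}
```

`DRY_RUN=1 launch_dse_instances my-keypair 3` prints the argument list; drop `DRY_RUN=1` to actually launch. `--security-groups` takes group names, which suits EC2-Classic or a default VPC; in a non-default VPC you would pass `--security-group-ids` instead. Keep the matching private key for the local-provisioning run.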

Not a super-simple workaround. I wish your experience with OpsCenter this time around was more awesome. The good news is I'm on this bug and it will be fixed in an upcoming point release.

Edit: No longer necessary to manually remove /etc/security/limits.d/cassandra.conf

answered 2015-03-20T20:02:32.647

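If it is only complaining about Java, you're better off installing Java 7; DataStax requires the Oracle JDK and JRE. Your nodes may already have Java 7 alongside another version, with Java 7 not set as the default. To change this: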

    sudo update-java-alternatives -s java-7-oracle

You can script this command to run over ssh, so you don't have to log in to each node.
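A minimal sketch of such a script follows. It assumes key-based SSH as a user with passwordless sudo (`ubuntu` is only a placeholder default), and that the `java-7-oracle` alternative is already installed on each node, since `update-java-alternatives` only switches between installed JVMs. The function names are hypothetical. Note that `java -version` writes to stderr, hence the `2>&1`.

```bash
#!/usr/bin/env bash
# Sketch: make Oracle Java 7 the default on every node over SSH.
# Assumptions: key-based SSH, passwordless sudo, and java-7-oracle already
# installed on each node. SSH_USER defaults to a placeholder value.

SSH_USER=${SSH_USER:-ubuntu}

# java_major_version LINE: parse the first line of `java -version` output.
#   'java version "1.7.0_75"'  -> 7   (legacy 1.x numbering)
#   'openjdk version "11.0.2"' -> 11
# Prints an empty line if no version string is found.
java_major_version() {
  local ver
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case $ver in
    1.*) ver=${ver#1.} ;;
  esac
  printf '%s\n' "${ver%%[._]*}"
}

# run_on_nodes CMD NODE...: run CMD on each node.
# With DRY_RUN=1 the ssh invocations are only printed.
run_on_nodes() {
  local cmd=$1 node
  shift
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'ssh %s@%s %s\n' "$SSH_USER" "$node" "$cmd"
    else
      ssh "${SSH_USER}@${node}" "$cmd"
    fi
  done
}

# ensure_java7 NODE...: switch nodes whose default Java is older than 7
# (or missing). The version check always connects; DRY_RUN affects the switch only.
ensure_java7() {
  local node line major
  for node in "$@"; do
    # java -version writes to stderr, so redirect it before taking line 1
    line=$(ssh "${SSH_USER}@${node}" 'java -version 2>&1 | head -n 1')
    major=$(java_major_version "$line")
    if [ "${major:-0}" -lt 7 ]; then
      echo "$node: default Java is ${major:-missing}; switching to java-7-oracle"
      run_on_nodes 'sudo update-java-alternatives -s java-7-oracle' "$node"
    fi
  done
}
```

Usage would be along the lines of `ensure_java7 10.0.0.11 10.0.0.12` with your nodes' private IPs (these addresses are placeholders). Checking the version first means nodes that already default to Java 7 or later are left alone.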

answered 2015-03-19T10:25:25.963