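I set up two nodes for OpenStack.

The first node runs the management services, such as nova-api, nova-scheduler, glance ... The second node runs the network and compute services.

When I check nova-manage service list, all services show as up.

When I reboot the management node (node 1), compute loses its connection.

When compute tries to connect to the management node again, it logs this error in the compute log: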

2013-01-21 20:49:28 TRACE nova.manager Traceback (most recent call last):
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib/python2.6/site-packages/nova/manager.py", line 155, in periodic_tasks
2013-01-21 20:49:28 TRACE nova.manager     task(self, context)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2244, in _heal_instance_info_cache
2013-01-21 20:49:28 TRACE nova.manager     context, self.host)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib/python2.6/site-packages/nova/db/api.py", line 594, in instance_get_all_by_host
2013-01-21 20:49:28 TRACE nova.manager     return IMPL.instance_get_all_by_host(context, host)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 103, in wrapper
2013-01-21 20:49:28 TRACE nova.manager     return f(*args, **kwargs)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 1582, in instance_get_all_by_host
2013-01-21 20:49:28 TRACE nova.manager     return _instance_get_all_query(context).filter_by(host=host).all()
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 1922, in all
2013-01-21 20:49:28 TRACE nova.manager     return list(self)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 2032, in __iter__
2013-01-21 20:49:28 TRACE nova.manager     return self._execute_and_instances(context)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 2047, in _execute_and_instances
2013-01-21 20:49:28 TRACE nova.manager     result = conn.execute(querycontext.statement, self._params)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1399, in execute
2013-01-21 20:49:28 TRACE nova.manager     params)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1532, in _execute_clauseelement
2013-01-21 20:49:28 TRACE nova.manager     compiled_sql, distilled_params
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1640, in _execute_context
2013-01-21 20:49:28 TRACE nova.manager     context)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1633, in _execute_context
2013-01-21 20:49:28 TRACE nova.manager     context)
2013-01-21 20:49:28 TRACE nova.manager   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.3-py2.6-linux-x86_64.egg/sqlalchemy/engine/default.py", line 330, in do_execute
2013-01-21 20:49:28 TRACE nova.manager     cursor.execute(statement, parameters)
2013-01-21 20:49:28 TRACE nova.manager OperationalError: (OperationalError) socket not open

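Restarting the compute and network services fixes the problem, but until I restart compute or network, the error keeps appearing.

This is what I see when I check, on the compute node, the sockets open to the controller: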

[root@compute ~]# ps -ef | grep compute
nova     30859     1 27 18:51 ?        00:00:03 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --logfile /var/log/nova/compute.log
root     30996 30807  0 18:51 pts/0    00:00:00 grep compute

[root@compute ~]# netstat -p | grep 30859
tcp        0      0 compute:56988        controller:postgres     ESTABLISHED 30859/python
tcp        0      0 compute:37869        controller:amqps        ESTABLISHED 30859/python
tcp        0      0 compute:37871        controller:amqps        ESTABLISHED 30859/python
unix  3      [ ]         STREAM     CONNECTED     3588759 30859/python

Compute has sockets open to two controller services: postgres and amqps. After I run reboot now on the controller, I check the sockets to the controller again:

[root@compute ~]# netstat -p | grep 30859
tcp      208      0 compute:56988        controller:postgres     CLOSE_WAIT  30859/python
unix  3      [ ]         STREAM     CONNECTED     3590103 30859/python
unix  3      [ ]         STREAM     CONNECTED     3588759 30859/python

Here the postgres socket has been closed (it shows CLOSE_WAIT), and the amqps sockets are gone.
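For anyone who wants to watch for this condition automatically, here is a minimal sketch that scans netstat -p output for sockets owned by a given PID that are stuck in a given state. The function name find_sockets_in_state is my own, and the column layout is assumed to match the output pasted above.

```python
def find_sockets_in_state(netstat_output, pid, state="CLOSE_WAIT"):
    """Return (local, remote) address pairs for TCP sockets of `pid` in `state`."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed TCP row layout (as in the output above):
        # proto recv-q send-q local remote state pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # skips unix sockets, headers, and blank lines
        owner_pid = fields[6].split("/", 1)[0]
        if owner_pid == str(pid) and fields[5] == state:
            matches.append((fields[3], fields[4]))
    return matches


# Sample copied from the netstat output taken after the controller reboot.
SAMPLE = """\
tcp      208      0 compute:56988        controller:postgres     CLOSE_WAIT  30859/python
unix  3      [ ]         STREAM     CONNECTED     3590103 30859/python
unix  3      [ ]         STREAM     CONNECTED     3588759 30859/python
"""

print(find_sockets_in_state(SAMPLE, 30859))
# [('compute:56988', 'controller:postgres')]
```

A socket that stays in CLOSE_WAIT means the remote side closed the connection but the local process has not closed its own end yet, which matches the "socket not open" error in the traceback.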

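Once all services are back up on the controller, I run the same command to check the sockets connected to the controller, and I get the same result.

Why doesn't compute create a new socket to postgres?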

1 Answer

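As Matt Joyce pointed out above, the socket error you're getting comes from nova-compute trying to contact the database you've configured in nova.conf. Earlier in the log you can see all the values the service was configured with. Look for the string "Full set of FLAGS"; that will at least hint at the configuration in effect. It hides the actual value of "sql_connection" in the log output (since that value usually has a password embedded), but it may help explain what's happening there.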
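If the log is long, a small script can pull out that section. This is only a sketch: lines_after_marker is a name I made up, the number of lines to show is arbitrary, and the sample lines below are placeholders rather than real nova output.

```python
def lines_after_marker(log_lines, marker="Full set of FLAGS", count=5):
    """Return the first line containing `marker` plus up to `count` lines after it."""
    for index, line in enumerate(log_lines):
        if marker in line:
            return log_lines[index:index + 1 + count]
    return []  # marker not found


# Placeholder lines for illustration only; a real log will look different.
sample_log = [
    "placeholder line before the dump",
    "placeholder prefix Full set of FLAGS:",
    "placeholder flag line 1",
    "placeholder flag line 2",
    "placeholder line after the dump",
]

for line in lines_after_marker(sample_log, count=2):
    print(line)
```

To run it against the real file, read the log path passed to nova-compute (in the question that is /var/log/nova/compute.log) with open(path).read().splitlines() and pass the result in.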

From what I'm reading of your question, the nova-compute log file shows this error until you restart the service. Do I read correctly that it works after that?

Assuming that's correct, is there something that configures nova after the base packages are installed? For example, a run of chef, puppet, or the like that adds configuration details after the service has already started up with an incorrect configuration?
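One rough way to test that theory is to compare the modification time of the config file with the time the service started. The sketch below assumes you already have the service start time as an epoch timestamp; modified_after is a hypothetical helper, and the demo uses a temporary file rather than the real /etc/nova/nova.conf.

```python
import os
import tempfile


def modified_after(path, started_at):
    """Return True if `path` was last modified after epoch time `started_at`."""
    return os.path.getmtime(path) > started_at


if __name__ == "__main__":
    # Demo on a temporary file with a fixed modification time of 1000.
    with tempfile.NamedTemporaryFile() as tmp:
        os.utime(tmp.name, (1000, 1000))
        print(modified_after(tmp.name, 500))   # True: file changed after "start"
        print(modified_after(tmp.name, 2000))  # False: file is older than "start"
```

On the compute node you could get the start time of the nova-compute process from ps -o lstart= -p followed by its PID, convert it to an epoch timestamp, and call modified_after("/etc/nova/nova.conf", that_timestamp). A True result would suggest the file was rewritten after the service read it.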

Answered 2013-01-26T19:25:11.437