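We have been running an OpenStack environment for the past two and a half years and, apart from a few minor issues along the way, have had almost no downtime. Recently we have been trying to add new hardware to the stack as nova-compute nodes to give our VMs more CPU cores and RAM. Unfortunately, for some reason the installation is not going smoothly.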
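We run Xenial/Queens, using Juju and MAAS for deployment and provisioning. We were running Xenial/Pike until we upgraded in December. We are starting to suspect that the upgrade to Queens is the cause of the problem, because we were able to add new hardware before the upgrade. We even removed one of our existing machines that was acting as a nova-compute node and tried to add it back to the stack, and it now shows the same problem as our new hardware.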
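The root cause seems to be related to the neutron-openvswitch application. When we install the nova-compute charm through Juju, everything appears to go well until the (automatic) installation and configuration of the subordinate neutron-openvswitch charm. The logs show that at some point during the installation the connection on our OpenStack management network (10.10.30.0/24 on eno1) is lost. We can force the installation to get further by adding a second connection on eno2 (a different, external network). However, the connection on eno1 is still lost, and the compute service cannot communicate with the rest of the stack.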
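Comparing with the other, working compute nodes in the stack, it looks like the management bridge (br-eno1) is not being created by the neutron-openvswitch charm. Some part of the process appears to take eno1 down in preparation for creating the bridge but then fails. This leaves the machine unable to communicate with the rest of the stack on that interface.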
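None of our configuration has changed since the upgrade to Queens. Perhaps, though, the Pike -> Queens upgrade brought deprecations or changes to the default configuration that we are not aware of? We have read the release notes but cannot find anything that would explain this behaviour.
Any help would be greatly appreciated. I have included below some log file excerpts that I think are relevant, and I can provide anything else that might be needed. Thanks in advance!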
Broken server ifconfig:
eno1      Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet addr:10.10.30.101 Bcast:10.10.30.255 Mask:255.255.255.0
          inet6 addr: fe80::4ed9:8fff:fec5:2e3/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
          RX packets:487314 errors:0 dropped:0 overruns:0 frame:0
          TX packets:91955 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:255807482 (255.8 MB) TX bytes:6693026 (6.6 MB)
          Interrupt:17

eno2      Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet addr:10.189.134.103 Bcast:10.189.134.255 Mask:255.255.255.0
          inet6 addr: fe80::4ed9:8fff:fec5:2e4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:195386 errors:0 dropped:0 overruns:0 frame:0
          TX packets:89021 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:29175518 (29.1 MB) TX bytes:37673375 (37.6 MB)
          Interrupt:18

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:181496 errors:0 dropped:0 overruns:0 frame:0
          TX packets:181496 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:22574807 (22.5 MB) TX bytes:22574807 (22.5 MB)

lxdbr0    Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet6 addr: fe80::1/64 Scope:Link
          inet6 addr: fe80::b8c2:36ff:fe60:de08/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:650 (650.0 B)
Broken server ovs-vsctl show:
fc878983-8ae5-479f-999f-d809f5a2ba8f
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-data
        Port "eno1"
            Interface "eno1"
        Port br-data
            Interface br-data
                type: internal
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.9.5"
Working server ifconfig:
br-eno1   Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet addr:10.10.30.117 Bcast:10.10.30.255 Mask:255.255.255.0
          inet6 addr: fe80::1a66:daff:fe55:6bdc/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:9552045918 errors:0 dropped:4 overruns:0 frame:0
          TX packets:8731602524 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:25169343655058 (25.1 TB) TX bytes:20302362419370 (20.3 TB)

eno1      Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet6 addr: fe80::1a66:daff:fe55:6bdc/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
          RX packets:27433132917 errors:0 dropped:821138 overruns:0 frame:0
          TX packets:25763792601 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:31217303277897 (31.2 TB) TX bytes:26547305328673 (26.5 TB)
          Interrupt:18

eno2      Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet addr:10.189.134.118 Bcast:10.189.134.255 Mask:255.255.255.0
          inet6 addr: fe80::1a66:daff:fe55:6bdd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:23432963 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2858920977 (2.8 GB) TX bytes:2404 (2.4 KB)
          Interrupt:19

eno3      Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
          Interrupt:19

eno4      Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
          Interrupt:16

gre_sys   Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet6 addr: fe80::d061:36ff:fecd:3bdf/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:65000 Metric:1
          RX packets:1247735590 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1053172217 errors:0 dropped:8 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:934609315304 (934.6 GB) TX bytes:1138575443474 (1.1 TB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:874404497 errors:0 dropped:0 overruns:0 frame:0
          TX packets:874404497 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:1422560696594 (1.4 TB) TX bytes:1422560696594 (1.4 TB)

lxdbr0    Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet addr:10.0.216.1 Bcast:0.0.0.0 Mask:255.255.255.0
          inet6 addr: fe80::d83b:4eff:fedb:7be0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:750 (750.0 B)

qbr267cccc8-45 Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          UP BROADCAST RUNNING MULTICAST MTU:1458 Metric:1
          RX packets:257167 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8981790 (8.9 MB) TX bytes:0 (0.0 B)

.
.
.
.

tap267cccc8-45 Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet6 addr: fe80::fc16:3eff:fede:d180/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1458 Metric:1
          RX packets:4801309 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6300403 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12100707022 (12.1 GB) TX bytes:3222243030 (3.2 GB)

.
.
.
.

vethWY9OQC Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
          inet6 addr: fe80::fc50:b6ff:fe7a:2584/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:533168318 errors:0 dropped:0 overruns:0 frame:0
          TX packets:468982413 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:191221371188 (191.2 GB) TX bytes:227602758832 (227.6 GB)
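The two ifconfig dumps above can be compared mechanically with a small script. Below is a minimal Python sketch that parses the net-tools ifconfig format shown above. It flags interfaces that are in promiscuous mode while still carrying an IPv4 address. PROMISC is used here only as a heuristic for "this interface is a port of some bridge (Linux or Open vSwitch)". The helper names `parse_ifconfig` and `promisc_with_ipv4` are hypothetical; they are our own, not part of Juju, MAAS or the charms.

```python
import re

# Hypothetical helpers (not part of Juju, MAAS or the OpenStack charms) for
# comparing net-tools `ifconfig` dumps like the ones pasted above.
HEADER = re.compile(r"^(\S+)\s+Link encap:")
IPV4 = re.compile(r"inet addr:(\d{1,3}(?:\.\d{1,3}){3})")


def parse_ifconfig(text):
    """Return {interface: {"ipv4": str or None, "flags": set of str}}."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            interfaces[current] = {"ipv4": None, "flags": set()}
            continue
        if current is None:
            continue
        addr = IPV4.search(line)
        if addr:
            interfaces[current]["ipv4"] = addr.group(1)
        elif "MTU:" in line:
            # Flags line, e.g. "UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1":
            # every token before the MTU field is a flag.
            flags = set()
            for token in line.split():
                if token.startswith("MTU:"):
                    break
                flags.add(token)
            interfaces[current]["flags"] = flags
    return interfaces


def promisc_with_ipv4(interfaces):
    """Interfaces in promiscuous mode (typically bridge ports) that still hold an IPv4 address."""
    return sorted(name for name, info in interfaces.items()
                  if "PROMISC" in info["flags"] and info["ipv4"])


# Trimmed excerpts of the broken and working dumps.
BROKEN = """\
eno1 Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
inet addr:10.10.30.101 Bcast:10.10.30.255 Mask:255.255.255.0
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
eno2 Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
inet addr:10.189.134.103 Bcast:10.189.134.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
"""

WORKING = """\
br-eno1 Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
inet addr:10.10.30.117 Bcast:10.10.30.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eno1 Link encap:Ethernet HWaddr FF:FF:FF:FF:FF:FF (redacted)
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
"""

if __name__ == "__main__":
    print("broken: ", promisc_with_ipv4(parse_ifconfig(BROKEN)))   # ['eno1']
    print("working:", promisc_with_ipv4(parse_ifconfig(WORKING)))  # []
```

Run against these excerpts, it flags eno1 on the broken server. On the working server the 10.10.30.117 address sits on br-eno1 and eno1 itself has no IPv4 address, so nothing is flagged.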
Working server ovs-vsctl show:
be5c20fd-46ef-4991-8dc3-3860944308e5
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-data
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "eno1"
            Interface "eno1"
                error: "could not add network device eno1 to ofproto (Device or resource busy)"
        Port "eno2"
            Interface "eno2"
        Port br-data
            Interface br-data
                type: internal
        Port phy-br-data
            Interface phy-br-data
                type: patch
                options: {peer=int-br-data}
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "gre-0a0a1e7f"
            Interface "gre-0a0a1e7f"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="10.10.30.117", out_key=flow, remote_ip="10.10.30.127"}
        Port "gre-0a0a1e74"
            Interface "gre-0a0a1e74"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="10.10.30.117", out_key=flow, remote_ip="10.10.30.116"}
        Port "gre-0a0a1e76"
            Interface "gre-0a0a1e76"
                type: gre
                options: {df_default="true", in_key=flow, local_ip="10.10.30.117", out_key=flow, remote_ip="10.10.30.118"}
        Port br-tun
            Interface br-tun
                type: internal
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "qvo5560dd35-7e"
            tag: 2
            Interface "qvo5560dd35-7e"
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "qvo97c660e7-e3"
            tag: 1
            Interface "qvo97c660e7-e3"
        Port "qvo44aeabe3-de"
            tag: 1
            Interface "qvo44aeabe3-de"
        Port "qvo267cccc8-45"
            tag: 1
            Interface "qvo267cccc8-45"
        Port "qvofdf0ce36-50"
            tag: 2
            Interface "qvofdf0ce36-50"
        Port "qvof193baf6-c0"
            tag: 1
            Interface "qvof193baf6-c0"
        Port "qvod9facd45-41"
            tag: 1
            Interface "qvod9facd45-41"
        Port "qvoeeab657c-df"
            tag: 1
            Interface "qvoeeab657c-df"
        Port "qvodd4b9252-e5"
            tag: 1
            Interface "qvodd4b9252-e5"
        Port br-int
            Interface br-int
                type: internal
        Port "qvoc841a7f1-25"
            tag: 2
            Interface "qvoc841a7f1-25"
        Port "qvod6b38e4c-a1"
            tag: 2
            Interface "qvod6b38e4c-a1"
        Port int-br-data
            Interface int-br-data
                type: patch
                options: {peer=phy-br-data}
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.9.2"
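One detail in the two `ovs-vsctl show` dumps is easy to miss by eye. On the working server, OVS reports an `error:` for eno1 in br-data ("Device or resource busy"). On the broken server, eno1 sits in br-data with no error at all. Below is a minimal Python sketch that pulls the ports of each bridge and any interface errors out of that text so the dumps can be diffed. `parse_ovs_show` is a hypothetical helper of ours, not an Open vSwitch API. It assumes only the line prefixes (`Bridge`, `Port`, `Interface`, `error:`) visible in the output above.

```python
# Hypothetical helper (not part of Open vSwitch or the charms) that summarises
# `ovs-vsctl show` text: the ports of each bridge plus any per-interface errors.


def parse_ovs_show(text):
    """Return (bridges, errors): {bridge: [ports]} and {interface: error message}."""
    bridges = {}
    errors = {}
    bridge = None
    interface = None
    for raw in text.splitlines():
        line = raw.strip()  # tolerate output with or without its usual indentation
        if line.startswith("Bridge "):
            bridge = line.split(None, 1)[1].strip('"')
            bridges[bridge] = []
            interface = None
        elif line.startswith("Port ") and bridge is not None:
            bridges[bridge].append(line.split(None, 1)[1].strip('"'))
            interface = None
        elif line.startswith("Interface "):
            interface = line.split(None, 1)[1].strip('"')
        elif line.startswith("error:") and interface is not None:
            # e.g. error: "could not add network device eno1 to ofproto (...)"
            errors[interface] = line.split(":", 1)[1].strip().strip('"')
    return bridges, errors


# br-data excerpts from the broken and working servers.
BROKEN_OVS = """\
    Bridge br-data
        Port "eno1"
            Interface "eno1"
        Port br-data
            Interface br-data
                type: internal
"""

WORKING_OVS = """\
    Bridge br-data
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "eno1"
            Interface "eno1"
                error: "could not add network device eno1 to ofproto (Device or resource busy)"
        Port "eno2"
            Interface "eno2"
        Port br-data
            Interface br-data
                type: internal
"""

if __name__ == "__main__":
    for name, dump in (("broken", BROKEN_OVS), ("working", WORKING_OVS)):
        bridges, errors = parse_ovs_show(dump)
        print(name, bridges, errors)
```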
Broken server /var/log/juju/unit-neutron-openvswitch.log. These are the last few lines before the machine loses connectivity on the management network (eno1):
2020-05-26 18:08:02 DEBUG config-changed net.netfilter.nf_conntrack_max = 1000000
2020-05-26 18:08:02 DEBUG config-changed net.ipv4.neigh.default.gc_thresh2 = 28672
2020-05-26 18:08:02 DEBUG config-changed net.ipv6.neigh.default.gc_thresh1 = 128
2020-05-26 18:08:02 DEBUG config-changed net.nf_conntrack_max = 1000000
2020-05-26 18:08:02 DEBUG config-changed sysctl: setting key "net.netfilter.nf_conntrack_buckets"
2020-05-26 18:08:02 DEBUG config-changed net.ipv4.neigh.default.gc_thresh3 = 32768
2020-05-26 18:08:02 DEBUG config-changed net.ipv4.neigh.default.gc_thresh1 = 128
2020-05-26 18:08:02 DEBUG config-changed net.ipv6.neigh.default.gc_thresh2 = 28672
2020-05-26 18:08:02 DEBUG config-changed net.ipv6.neigh.default.gc_thresh3 = 32768
2020-05-26 18:08:02 DEBUG config-changed active
2020-05-26 18:08:03 INFO juju-log Creating bridge br-int
2020-05-26 18:08:03 INFO juju-log Creating bridge br-ex
2020-05-26 18:08:03 WARNING juju-log Support for use of upstream ``apt_pkg`` module in conjunctionwith charm-helpers is deprecated since 2019-06-25
2020-05-26 18:08:03 INFO juju-log Creating bridge br-data
2020-05-26 18:08:03 DEBUG juju-log Interface eno1 is not a Linux bridge
2020-05-26 18:08:03 INFO juju-log Adding port eno1 to bridge br-data
2020-05-26 18:08:03 DEBUG config-changed Failed to restart os-charm-phy-nic-mtu.service: Unit os-charm-phy-nic-mtu.service not found.
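Putting the charm's network actions on one timeline with the moment connectivity drops makes the sequence easier to follow. The last network change logged above is adding eno1 to br-data at 18:08:03. The Juju API health ping is then reported as timed out at 18:08:53. Below is a minimal Python sketch that extracts those events from a unit log. `network_events` is a hypothetical helper, not a Juju tool. Its keyword list is our own assumption and only covers the messages seen in this post.

```python
import re
from datetime import datetime

# Hypothetical helper (not part of Juju): extract bridge/port actions and
# timeouts from a Juju unit log so they can be lined up in time.
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (\S+) (.*)$")
# Assumed keywords, chosen to match only the messages quoted in this post.
NETWORK_EVENT = re.compile(r"Creating bridge|Adding port|is not a Linux bridge|timed out")


def network_events(log_text):
    """Return [(timestamp, level, message)] for bridge/port actions and timeouts."""
    events = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and NETWORK_EVENT.search(match.group(4)):
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(2), match.group(4)))
    return events


# Lines copied from the log excerpts in this post.
LOG = """\
2020-05-26 18:08:03 INFO juju-log Creating bridge br-int
2020-05-26 18:08:03 INFO juju-log Creating bridge br-ex
2020-05-26 18:08:03 INFO juju-log Creating bridge br-data
2020-05-26 18:08:03 DEBUG juju-log Interface eno1 is not a Linux bridge
2020-05-26 18:08:03 INFO juju-log Adding port eno1 to bridge br-data
2020-05-26 18:08:03 DEBUG config-changed Failed to restart os-charm-phy-nic-mtu.service: Unit os-charm-phy-nic-mtu.service not found.
2020-05-26 18:08:53 ERROR juju.api monitor.go:59 health ping timed out after 30s
"""

if __name__ == "__main__":
    events = network_events(LOG)
    for stamp, level, message in events:
        print(stamp.time(), level, message)
    gap = events[-1][0] - events[-2][0]
    print("seconds from last port change to API timeout:", gap.total_seconds())
```

The timeout is only reported after the ping has already waited 30 s. The measured 50 s gap therefore overstates how long the unit stayed connected after eno1 was added to br-data.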
After that, we see the following (only accessible on site or via the eno2 connection):
2020-05-26 18:08:53 ERROR juju.api monitor.go:59 health ping timed out after 30s
2020-05-26 18:08:53 ERROR juju.worker.dependency engine.go:551 "api-caller" manifold worker returned unexpected error: api connection broken unexpectedly
2020-05-26 18:08:53 INFO juju-log Loaded template from templates/queens/openvswitch_agent.ini
2020-05-26 18:08:53 INFO juju-log Rendering from template: /etc/neutron/plugins/ml2/openvswitch_agent.ini
2020-05-26 18:08:53 INFO juju-log Wrote template /etc/neutron/plugins/ml2/openvswitch_agent.ini.
2020-05-26 18:08:54 DEBUG juju-log Generating template context for amqp
2020-05-26 18:08:54 DEBUG config-changed Traceback (most recent call last):
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/config-changed", line 266, in <module>
2020-05-26 18:08:54 DEBUG config-changed main()
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/config-changed", line 259, in main
2020-05-26 18:08:54 DEBUG config-changed hooks.execute(sys.argv)
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/core/hookenv.py", line 914, in execute
2020-05-26 18:08:54 DEBUG config-changed self._hooks[hook_name]()
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1568, in wrapped_f
2020-05-26 18:08:54 DEBUG config-changed stopstart, restart_functions)
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/core/host.py", line 741, in restart_on_change_helper
2020-05-26 18:08:54 DEBUG config-changed r = lambda_f()
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1567, in <lambda>
2020-05-26 18:08:54 DEBUG config-changed (lambda: f(*args, **kwargs)), __restart_map_cache['cache'],
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/config-changed", line 150, in config_changed
2020-05-26 18:08:54 DEBUG config-changed CONFIGS.write_all()
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/contrib/openstack/templating.py", line 334, in write_all
2020-05-26 18:08:54 DEBUG config-changed [self.write(k) for k in six.iterkeys(self.templates)]
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/contrib/openstack/templating.py", line 334, in <listcomp>
2020-05-26 18:08:54 DEBUG config-changed [self.write(k) for k in six.iterkeys(self.templates)]
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/contrib/openstack/templating.py", line 321, in write
2020-05-26 18:08:54 DEBUG config-changed _out = self.render(config_file)
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/contrib/openstack/templating.py", line 281, in render
2020-05-26 18:08:54 DEBUG config-changed ctxt = ostmpl.context()
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/contrib/openstack/templating.py", line 112, in context
2020-05-26 18:08:54 DEBUG config-changed _ctxt = context()
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/neutron_ovs_context.py", line 633, in __call__
2020-05-26 18:08:54 DEBUG config-changed host_ip = get_relation_ip('neutron-plugin')
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/contrib/network/ip.py", line 583, in get_relation_ip
2020-05-26 18:08:54 DEBUG config-changed address = network_get_primary_address(interface)
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/core/hookenv.py", line 1043, in inner_translate_exc2
2020-05-26 18:08:54 DEBUG config-changed return f(*args, **kwargs)
2020-05-26 18:08:54 DEBUG config-changed File "/var/lib/juju/agents/unit-neutron-openvswitch-43/charm/hooks/charmhelpers/core/hookenv.py", line 1239, in network_get_primary_address
2020-05-26 18:08:54 DEBUG config-changed stderr=subprocess.STDOUT).decode('UTF-8').strip()
2020-05-26 18:08:54 DEBUG config-changed File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
2020-05-26 18:08:54 DEBUG config-changed **kwargs).stdout
2020-05-26 18:08:54 DEBUG config-changed File "/usr/lib/python3.5/subprocess.py", line 708, in run
2020-05-26 18:08:54 DEBUG config-changed output=stdout, stderr=stderr)
2020-05-26 18:08:54 DEBUG config-changed subprocess.CalledProcessError: Command '['network-get', '--primary-address', 'neutron-plugin']' returned non-zero exit status 1
2020-05-26 18:08:54 ERROR juju.worker.uniter.operation runhook.go:113 hook "config-changed" failed: exit status 1
2020-05-26 18:09:13 INFO juju-log Registered config file: /etc/neutron/neutron.conf
2020-05-26 18:09:13 INFO juju-log Registered config file: /etc/neutron/plugins/ml2/openvswitch_agent.ini
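The traceback ends with the config-changed hook dying on an uncaught `CalledProcessError` from `network-get --primary-address neutron-plugin`. This happens one second after the API connection is reported broken, and the traceback shows only the exit status. Below is a minimal Python sketch of the same lookup that also keeps the tool's own output. `try_primary_address` is a hypothetical debugging helper, not the charmhelpers function. `network-get` is a Juju hook tool, so this only works in a hook context. Running it through `juju run --unit neutron-openvswitch/43 ...` on Juju 2.x is an assumption, with the unit number taken from the agent path in the traceback.

```python
import subprocess


def try_primary_address(binding):
    """Return (address, None) on success or (None, diagnostic) on failure.

    Hypothetical debugging helper mirroring the failing call in the traceback:
    `network-get --primary-address <binding>`.
    """
    cmd = ["network-get", "--primary-address", binding]
    try:
        # Merge stderr into stdout (as the charm-helpers call does), but keep
        # the text so it can be printed instead of a bare exit status.
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT)
    except FileNotFoundError:
        return None, "network-get not found; is this running in a Juju hook context?"
    output = proc.stdout.decode("utf-8", "replace").strip()
    if proc.returncode != 0:
        return None, "exit status {}: {}".format(proc.returncode, output)
    return output, None


if __name__ == "__main__":
    address, problem = try_primary_address("neutron-plugin")
    print(address if address else problem)
```

The sketch avoids f-strings and `capture_output` so that it also runs on the Python 3.5 shipped with Xenial, which is the interpreter shown in the traceback.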