During installation on a single-node Ubuntu 16.04 host (IP 192.168.240.14), I ran into this error:

TASK [network : Ensuring that the calico.yaml file exist] **********************
changed: [localhost]

TASK [network : include] *******************************************************

TASK [network : include] *******************************************************

TASK [network : include] *******************************************************
included: /installer/playbook/roles/network/tasks/calico.yaml for localhost

TASK [network : Enabling calico] ***********************************************
changed: [localhost]

TASK [network : Waiting for configuring calico service] ************************
ok: [localhost -> 192.168.240.14] => (item=192.168.240.14)

TASK [network : Waiting for configuring calico node to node mesh] **************
FAILED - RETRYING: Waiting for configuring calico node to node mesh (100 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (99 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (98 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (97 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (96 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (95 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (94 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (93 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (92 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (91 retries left).

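I have read that Calico's node-to-node mesh can be disabled, but because Calico was installed through ICP, the calicoctl command is not recognized. I also cannot find an option in config.yaml to disable this setting.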

So far I have tried to disable it by downloading calicoctl separately and running it, but it cannot connect to the cluster:

user@user:~/Desktop/calicoctl$ ./calicoctl config set nodeToNodeMesh off
Error executing command: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:2379: getsockopt: connection refused

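I am not sure whether this is because it dials the loopback address instead of 192.168.240.14, or something else. I also do not know whether disabling the mesh would actually fix the installation problem.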
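
On the loopback question: calicoctl 1.x (the release line that accepts `config set nodeToNodeMesh`) reads its etcd address from the `ETCD_ENDPOINTS` environment variable and falls back to `http://127.0.0.1:2379` when the variable is unset, which matches the address in the error. Below is a minimal sketch of pointing it at the node instead. The helper name `etcd_endpoint` is hypothetical, and the scheme and port that ICP uses for etcd are not shown anywhere in this post, so the values are placeholders to adjust.

```shell
#!/bin/sh
# etcd_endpoint: build an endpoint URL from host, port and scheme.
# Port and scheme default to 2379 and http; both are assumptions here.
etcd_endpoint() {
    printf '%s://%s:%s\n' "${3:-http}" "$1" "${2:-2379}"
}

# Export the variable calicoctl 1.x reads instead of its loopback default.
ETCD_ENDPOINTS="$(etcd_endpoint 192.168.240.14 2379 http)"
export ETCD_ENDPOINTS
echo "calicoctl will use: $ETCD_ENDPOINTS"

# If etcd requires TLS, calicoctl 1.x also reads ETCD_CA_CERT_FILE,
# ETCD_CERT_FILE and ETCD_KEY_FILE (paths depend on the installation).
# Then retry: ./calicoctl config set nodeToNodeMesh off
```

This only changes where calicoctl connects; whether turning the mesh off helps the installer is a separate question.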

I am not very experienced with this and would appreciate any help!

Edit:

I ran the installation again with ICP 2.1.0.1 and hit the same error, but this time it gave up after 10 retries with the following error message:

TASK [network : Enabling calico] ***********************************************
changed: [localhost]

TASK [network : Waiting for configuring calico service] ************************
ok: [localhost -> 192.168.240.14] => (item=192.168.240.14)
FAILED - RETRYING: Waiting for configuring calico node to node mesh (10 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (9 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (8 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (7 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (6 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (5 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (4 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (3 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (2 retries left).
FAILED - RETRYING: Waiting for configuring calico node to node mesh (1 retries left).

TASK [network : Waiting for configuring calico node to node mesh] **************
fatal: [localhost]: FAILED! => {"attempts": 10, "changed": true, "cmd": "kubectl get pods --show-all --namespace=kube-system |grep configure-calico-mesh", "delta": "0:00:01.343071", "end": "2018-06-20 08:12:28.433186", "failed": true, "rc": 0, "start": "2018-06-20 08:12:27.090115", "stderr": "", "stderr_lines": [], "stdout": "configure-calico-mesh-9f756                 0/1       Pending   0          5m", "stdout_lines": ["configure-calico-mesh-9f756                 0/1       Pending   0          5m"]}

PLAY RECAP *********************************************************************
192.168.240.14             : ok=168  changed=54   unreachable=0    failed=0   
localhost                  : ok=81   changed=16   unreachable=0    failed=1   

Playbook run took 0 days, 0 hours, 19 minutes, 8 seconds

user@user:/opt/ibm-cloud-private-ce-2.1.0.1/cluster$ 
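
The fatal message shows what the task actually checks: it greps the `kubectl get pods` output for the `configure-calico-mesh` pod, and that pod is still `Pending` after five minutes, so the retries run out before it is ever scheduled. Below is a small sketch for pulling the status out of that output and then asking Kubernetes why the pod is pending. The helper name `pod_status` is hypothetical, and the live-cluster commands are left commented out because they need a reachable API server.

```shell
#!/bin/sh
# pod_status: read `kubectl get pods` output on stdin and print the STATUS
# column (third field) of every pod whose name starts with the given prefix.
pod_status() {
    awk -v prefix="$1" 'index($1, prefix) == 1 { print $3 }'
}

# Sample line copied from the installer's fatal message.
sample='configure-calico-mesh-9f756                 0/1       Pending   0          5m'
echo "$sample" | pod_status configure-calico-mesh

# Against a live cluster (pod name taken from the log; yours will differ):
# kubectl get pods --namespace=kube-system | pod_status configure-calico-mesh
# kubectl describe pod configure-calico-mesh-9f756 --namespace=kube-system
```

The Events section at the end of the `kubectl describe pod` output usually states why the pod could not be scheduled.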

I don't understand why localhost is suddenly included in the setup steps, since I only specified my IP address in the hosts file:

[master]
192.168.240.14 ansible_user="user" ansible_ssh_pass="6CEd29CN" ansible_become=true ansible_become_pass="6CEd29CN" ansible_port="22" ansible_ssh_common_args="-oPubkeyAuthentication=no" 

[worker]
192.168.240.14 ansible_user="user" ansible_ssh_pass="6CEd29CN" ansible_become=true ansible_become_pass="6CEd29CN" ansible_port="22" ansible_ssh_common_args="-oPubkeyAuthentication=no" 

[proxy]
192.168.240.14 ansible_user="user" ansible_ssh_pass="6CEd29CN" ansible_become=true ansible_become_pass="6CEd29CN" ansible_port="22" ansible_ssh_common_args="-oPubkeyAuthentication=no" 

#[management]
#4.4.4.4

#[va]
#5.5.5.5

My config.yaml file looks like this:

# Licensed Materials - Property of IBM
# IBM Cloud private
# @ Copyright IBM Corp. 2017 All Rights Reserved
# US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

---

###### docker0: 172.17.0.1
###### eth0: 192.168.240.14

## Network Settings
#network_type: calico
# network_helm_chart_path: < helm chart path >

## Network in IPv4 CIDR format
network_cidr: 10.1.0.0/16

## Kubernetes Settings
service_cluster_ip_range: 10.0.0.1/24

## Makes the Kubelet start if swap is enabled on the node. Remove
## this if your production env wants to disable swap.
kubelet_extra_args: ["--fail-swap-on=false"]

# cluster_domain: cluster.local
# cluster_name: mycluster
cluster_CA_domain: "mydomain.icp"
# cluster_zone: "myzone"
# cluster_region: "myregion"

## Etcd Settings
#etcd_extra_args: ["--grpc-keepalive-timeout=0", "--grpc-keepalive-interval=0", #"--snapshot-count=10000"]

## General Settings
# wait_for_timeout: 600
# docker_api_timeout: 100

## Advanced Settings
default_admin_user: user
default_admin_password: 6CEd29CN
# ansible_user: <username>
# ansible_become: true
# ansible_become_password: <password>

## Kubernetes Settings
# kube_apiserver_extra_args: []
# kube_controller_manager_extra_args: []
# kube_proxy_extra_args: []
# kube_scheduler_extra_args: []

## Enable Kubernetes Audit Log
# auditlog_enabled: false

## GlusterFS Settings
# glusterfs: false

## GlusterFS Storage Settings
# storage:
#  - kind: glusterfs
#    nodes:
#      - ip: <worker_node_m_IP_address>
#        device: <link path>/<symlink of device aaa>,<link path>/<symlink of device bbb>
#      - ip: <worker_node_n_IP_address>
#        device: <link path>/<symlink of device ccc>
#      - ip: <worker_node_o_IP_address>
#        device: <link path>/<symlink of device ddd>
#    storage_class:
#      name:
#      default: false
#      volumetype: replicate:3

## Network Settings
## Calico Network Settings
### calico_ipip_enabled: true
calico_ipip_enabled: false
calico_tunnel_mtu: 1430
calico_ip_autodetection_method: interface=eth0



## IPSec mesh Settings
## If user wants to configure IPSec mesh, the following parameters
## should be configured through config.yaml
ipsec_mesh:
   enable: false
#   interface: <interface name on which IPsec will be enabled>
#   subnets: []
#   exclude_ips: "<list of IP addresses separated by a comma>"

kube_apiserver_insecure_port: 8080
kube_apiserver_secure_port: 8001

## External loadbalancer IP or domain
## Or floating IP in OpenStack environment
# cluster_lb_address: none

## External loadbalancer IP or domain
## Or floating IP in OpenStack environment
# proxy_lb_address: none

## Install in firewall enabled mode
firewall_enabled: false

## Allow loopback dns server in cluster nodes
loopback_dns: true

## High Availability Settings
# vip_manager: etcd

## High Availability Settings for master nodes
# vip_iface: eth0
# cluster_vip: 127.0.1.1

## High Availability Settings for Proxy nodes
# proxy_vip_iface: eth0
# proxy_vip: 127.0.1.1

## Federation cluster Settings
# federation_enabled: false
# federation_cluster: federation-cluster
# federation_domain: cluster.federation
# federation_apiserver_extra_args: []
# federation_controllermanager_extra_args: []
# federation_external_policy_engine_enabled: false

## vSphere cloud provider Settings
## If user wants to configure vSphere as cloud provider, vsphere_conf
## parameters should be configured through config.yaml
# kubelet_nodename: hostname
# cloud_provider: vsphere
# vsphere_conf:
#    user: <vCenter username for vSphere cloud provider>
#    password: <password for vCenter user>
#    server: <vCenter server IP or FQDN>
#    port: [vCenter Server Port; default: 443]
#    insecure_flag: [set to 1 if vCenter uses a self-signed certificate]
#    datacenter: <datacenter name on which Node VMs are deployed>
#    datastore: <default datastore to be used for provisioning volumes>
#    working_dir: <vCenter VM folder path in which node VMs are located>

## Disabled Management Services Settings
## You can disable the following management services: ["service-catalog", "metering", "monitoring", "istio", "vulnerability-advisor", "custom-metrics-adapter"]
#disabled_management_services: ["istio", "vulnerability-advisor", "custom-metrics-adapter"]
disabled_management_services: ["service-catalog", "metering", "monitoring", "istio", "vulnerability-advisor", "custom-metrics-adapter"]


## Docker Settings
# docker_env: []
# docker_extra_args: []
## The maximum size of the log before it is rolled
# docker_log_max_size: 50m
## The maximum number of log files that can be present
# docker_log_max_file: 10
## Install/upgrade docker version
# docker_version: 17.12.1
## ICP install docker automatically
# install_docker: true

## Ingress Controller Settings
## You can add your ingress controller configuration, and the allowed configuration can refer to
## https://github.com/kubernetes/ingress-nginx/blob/nginx-0.9.0/docs/user-guide/configmap.md#configuration-options
# ingress_controller:
#   disable-access-log: 'true'

## Clean metrics indices in Elasticsearch older than this number of days
# metrics_max_age: 1

## Clean application log indices in Elasticsearch older than this number of days
# logs_maxage: 1

## Uncomment the line below to install Kibana as a managed service.
kibana_install: true


# STARTING_CLOUDANT

# cloudant:
#   namespace: kube-system
#   pullPolicy: IfNotPresent
#   pvPath: /opt/ibm/cfc/cloudant
#   database:
#     password: fdrreedfddfreeedffde
#     federatorCommand: hostname
#     federationIdentifier: "-0"
#     readinessProbePeriodSeconds: 2
#     readinessProbeInitialDelaySeconds: 90

# END_CLOUDANT

1 Answer

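I ran into a similar problem when deploying with Ansible on an Ubuntu server. As a user noted in Kubernetes issue 43156: "we should not inherit nameserver 127.x.x.x in pod resolv.conf from the node, as the node localhost will not be accessible from the pod."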

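If your /etc/resolv.conf lists a localhost IP as a nameserver, I suggest replacing it with the node's IP. If you are on Ubuntu, for example, also opt out of NetworkManager so that it does not set the file back after a reboot: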

systemctl disable --now systemd-resolved.service
cp /etc/resolv.conf /etc/resolv.conf.bkp
echo "nameserver <Node's_IP>" > /etc/resolv.conf
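
Before changing anything, it may help to confirm that the node's resolver file really points at a loopback address. Below is a minimal sketch of such a check. The helper name `loopback_nameservers` is hypothetical, and it only reads the file it is given.

```shell
#!/bin/sh
# loopback_nameservers: print every nameserver address in the given
# resolv.conf-style file that lies in 127.0.0.0/8.
loopback_nameservers() {
    awk '$1 == "nameserver" && $2 ~ /^127\./ { print $2 }' "$1"
}

# Demo on a throwaway file so nothing on the system is touched.
demo_file="$(mktemp)"
printf 'nameserver 127.0.1.1\nnameserver 8.8.8.8\nsearch example.local\n' > "$demo_file"
loopback_nameservers "$demo_file"
rm -f "$demo_file"

# On the node itself: loopback_nameservers /etc/resolv.conf
```

If the function prints nothing for /etc/resolv.conf, the node is not handing a loopback resolver to its pods and this fix probably does not apply.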

More details on opting out of NetworkManager are available at the following link:

Answered 2019-01-05T03:24:19.010