docker - Rancher、Kubernetes 和 StorageOS：持久存储、卷挂载问题？

Question

我需要帮助解决一个问题。我已经在 kubernetes 之上安装了 storagesos 集群。我能够创建卷。但是当我创建 PVC 并将它们安装到我的 K8 pod 时，pod 卡在容器创建状态。从我可以从日志中收集到的是：

从卷映射到主机 /var/lib/storageos/volumes/mypvcid 的卷目录/文件夹不存在
storageos 集群容器无法在主机 VM（AWS 上的 RHEL）上创建这些目录/文件夹。

请帮助我，我已经被困了好几天了。谢谢...这里是日志：https ://imgur.com/a/VcggCJn

StorageOS 日志 -

    > kubectl logs storageos-dpj6q
time="2018-07-11T18:38:30Z" level=info msg="starting server" address=172.31.14.99 cluster= hostname=ip-172-31-14-99 id=4dd922fe-d998-c67a-122e-1ea755ec61e9 join=d732967d-93c4-4018-8e0e-a9170065e8e2 labels="map[]" module=command version="StorageOS 1.0.0 (f8915fa), built: 2018-05-25T190132Z"
time="2018-07-11T18:38:30Z" level=info msg="starting api server" action=create category=server endpoint="0.0.0.0:5705" module=cp
time="2018-07-11T18:38:30Z" level=info msg="starting embedded etcd server" action=create category=etcd cluster_id= initialised=true members= module=cp
time="2018-07-11T18:38:30Z" level=info msg="by using this product, you are agreeing to the terms of the StorageOS Ltd. End User Subscription Agreement (EUSA) found at: https://eusa.storageos.com" module=command
time="2018-07-11T18:38:30Z" level=info msg="started temporary docker volume plugin api while control plane starts"
time="2018-07-11T18:38:38Z" level=info msg="embedded etcd server started successfuly" action=create category=etcd module=cp
time="2018-07-11T18:38:38Z" level=info msg="new messaging routes found, [re]starting nats" action=update category=nats module=cp
time="2018-07-11T18:38:39Z" level=info msg="connected to store" action=wait address="http://127.0.0.1:5706" backend=embedded category=etcd module=cp
time="2018-07-11T18:38:39Z" level=info msg="temporary docker volume plugin api shutdown"
time="2018-07-11T18:38:39Z" level=info msg="leader election process started" action=election category=leader module=ha
time="2018-07-11T18:38:39Z" level=info msg="docker plugin api started, exporting filesystems at /var/lib/storageos/filesystems"
time="2018-07-11T18:38:39Z" level=error msg="startup force unmount failed" device_dir=/var/lib/storageos/volumes error="exit status 32" module=supervisor output="umount: /var/lib/storageos/volumes: not mounted"
time="2018-07-11T18:38:39Z" level=info msg="reaper started"
time="2018-07-11T18:38:39Z" level=info msg="dataplane notifications server starting..." path="unix:///var/run/storageos/dataplane-notifications.sock"
time="2018-07-11T18:38:39Z" level=info msg="syncer started"
time="2018-07-11T18:38:39Z" level=info msg="module=\"storageos-stats\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="StorageOS Volume Presentation level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"storageos-fs-director\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="fuse: mountpoint is not empty" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="fuse: if you are sure this is safe, use the 'nonempty' mount option" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="Error creating FUSE channel! level=fatal" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"server\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"client\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="StorageOS DirectFS v1 server (server v0.1 protocol v1.3) start level=info" module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="StorageOS DIRECTOR category=director level=info" category=director module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="module=\"storageos-director\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" category=director module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="CACHE: configure ram of 256MB exceeds maximum recommended of 66 MiB category=corecache level=warn" category=corecache module=supervisor
time="2018-07-11T18:38:40Z" level=info msg="CACHE: configured cache ram value is 256MiB, using 66MiB category=corecache level=warn" category=corecache module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="StorageOS DirectFS v1 client (server v0.1 protocol v1.3) start category=clinit level=info" category=clinit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="StorageOS RDB plugin category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="=> dir: /var/lib/storageos/data category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="=> databases: 1 category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="=> nodes: 1 category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="module=\"storageos-rdbplugin\", release=\"1.0.0\", buildId=\"release/1.0.0-rc1_5\", scmId=\"15e59658ac5bccb957b74cb96b2cf44dc175f3b2\" level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:41Z" level=info msg="Initialising rdb databases category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:42Z" level=info msg="rdb peristing db at /var/lib/storageos/data/db1 category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:42Z" level=info msg="Ready category=rdbplginit level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:44Z" level=info msg="started watching volumes" module=watcher
time="2018-07-11T18:38:44Z" level=info msg="started watching nodes" module=watcher
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '3'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '0'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '1'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg="starting dataplane state controller worker '2'" module=statesync
time="2018-07-11T18:38:44Z" level=info msg=started action=watch category=client key=storageos/nodes module=store
time="2018-07-11T18:38:44Z" level=info msg=started action=watch category=client key=storageos/volumes module=store
time="2018-07-11T18:38:44Z" level=info msg="starting api servers" action=start category=http module=cp
time="2018-07-11T18:38:44Z" level=info msg="server running" module=command
time="2018-07-11T18:38:44Z" level=error msg="watch cancelled" action=watch category=client error="etcdserver: mvcc: required revision has been compacted" key=storageos/volumes module=store
time="2018-07-11T18:38:44Z" level=warning msg="volume watcher: got received error 'watch stopped'" module=watcher
time="2018-07-11T18:38:46Z" level=info msg="config change: will add server map entry [72389@172.31.29.31:5703] level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="configuration changes applied level=info" category=rdbplginit module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="[72389@172.31.29.31:5703]:: attempt to establish channel category=clconmon level=info" category=clconmon module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="[72389@172.31.29.31:5703]: connection established (fd=13) category=clconn level=info" category=clconn module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="void FsConfig::presentation_lun_notify_receiver(FsConfig::FsConfigStore *, ConfigStore::Event<uint32_t, FsConfigPresentation>): Failed to stat backing store file '/var/lib/storageos/volumes/bst-197928' category=config level=warn" category=config module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="void FsConfig::presentation_lun_notify_receiver(FsConfig::FsConfigStore *, ConfigStore::Event<uint32_t, FsConfigPresentation>): Failed to stat backing store file '/var/lib/storageos/volumes/bst-212792' category=config level=warn" category=config module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="virtual bool FsConfig::PresentationEventSemantics::Validate(event_type): Not adding pr_filename '2560c04f-995b-2ad1-bc03-edd9df29b57c' for volume 56980 - already exists forvolume 56980 category=fscfg level=warn" category=fscfg module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="validator 'device_validator' rejected Event{type CREATE} category=libcfg level=warn" category=libcfg module=supervisor
time="2018-07-11T18:38:46Z" level=error msg="filesystem client: presentation create failed" action=create error="<nil>" module=statesync reason="Create refused by validator" volume_uuid=2560c04f-995b-2ad1-bc03-edd9df29b57c
time="2018-07-11T18:38:46Z" level=info msg="virtual bool FsConfig::PresentationEventSemantics::Validate(event_type): Not adding pr_filename 'f536c52c-daf3-4d3f-ccba-2f4c1db4aea1' for volume 72389 - already exists forvolume 72389 category=fscfg level=warn" category=fscfg module=supervisor
time="2018-07-11T18:38:46Z" level=info msg="validator 'device_validator' rejected Event{type CREATE} category=libcfg level=warn" category=libcfg module=supervisor
time="2018-07-11T18:38:46Z" level=error msg="filesystem client: presentation create failed" action=create error="<nil>" module=statesync reason="Create refused by validator" volume_uuid=f536c52c-daf3-4d3f-ccba-2f4c1db4aea1
time="2018-07-11T18:38:47Z" level=info msg="Accepted connection from 172.31.29.31:65228 (fd=8) level=info" category=libcfg module=supervisor
time="2018-07-11T18:38:47Z" level=info msg="0x0000DE94@172.31.29.31:65228 connected level=info" category=libcfg module=supervisor
time="2018-07-11T19:04:36Z" level=error msg="got err while waiting for volume creation" action=create category=volume error="no available nodes for replica found, error: no node matched filtering constraints, filters: anti_affinity_filter" module=cp namespace=default volume=vol1
time="2018-07-11T19:04:58Z" level=error msg="got err while waiting for volume creation" action=create category=volume error="no node matched filtering constraints, filters: node_selector_filter" module=cp namespace=default volume=vol2
time="2018-07-11T19:05:33Z" level=error msg="got err while waiting for volume creation" action=create category=volume error="no node matched filtering constraints, filters: " module=cp namespace=default volume=vol3

PSQL Pod 日志：

kubectl describe pod postgres-6569dcb44d-fj8k8
Name:           postgres-6569dcb44d-fj8k8
Namespace:      default
Node:           ip-172-31-28-148/172.31.28.148
Start Time:     Wed, 11 Jul 2018 18:39:52 +0000
Labels:         pod-template-hash=2125876008
                workload.user.cattle.io/workloadselector=deployment-default-postgres
Annotations:    field.cattle.io/ports=[[{"containerPort":5432,"dnsName":"postgres-nodeport","kind":"NodePort","name":"5432tcp300321","protocol":"TCP","sourcePort":30040}]]
                field.cattle.io/publicEndpoints=[{"addresses":["172.31.13.77"],"port":30032,"protocol":"TCP","serviceName":"default:postgres-nodeport","allNodes":true}]
Status:         Pending
IP:
Controlled By:  ReplicaSet/postgres-6569dcb44d
Containers:
  postgres:
    Container ID:
    Image:          postgres
    Image ID:
    Port:           5432/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/postgresql/data from pgsql-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-r6kl9 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  pgsql-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pgsql-data
    ReadOnly:   false
  default-token-r6kl9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-r6kl9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                 From                       Message
  ----     ------       ----                ----                       -------
  Warning  FailedMount  7m (x330 over 11h)  kubelet, ip-172-31-28-148  Unable to mount volumes for pod "postgres-6569dcb44d-fj8k8_default(cc9e5947-8539-11e8-abcc-0ad368ad109e)": timeout expired waiting for volumes toattach or mount for pod "default"/"postgres-6569dcb44d-fj8k8". list of unmounted volumes=[pgsql-data]. list of unattached volumes=[pgsql-data default-token-r6kl9]
  Warning  FailedMount  2m (x344 over 11h)  kubelet, ip-172-31-28-148  MountVolume.SetUp failed for volume "pvc-7a4ee536-851c-11e8-90d5-0ad368ad109e" : stat /var/lib/storageos/volumes/f536c52c-daf3-4d3f-ccba-2f4c1db4aea1: no such file or directory

> kubectl get sc
NAME      PROVISIONER
fast      kubernetes.io/storageos
>  kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                STORAGECLASS      REASON    AGE
pvc-4ac4fc7c-8471-11e8-90d5-0ad368ad109e   10Gi       RWO            Delete           Released   default/volx         nfs-provisioner             1d
pvc-7a4ee536-851c-11e8-90d5-0ad368ad109e   20Gi       RWO            Delete           Bound      default/pgsql-data   fast                        15h
pvc-bc643fc6-8519-11e8-90d5-0ad368ad109e   10Gi       RWO,RWX        Delete           Bound      default/sos3         fast                        15h
> kubectl get pvc
NAME         STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pgsql-data   Bound     pvc-7a4ee536-851c-11e8-90d5-0ad368ad109e   20Gi       RWO            fast           15h
sos3         Bound     pvc-bc643fc6-8519-11e8-90d5-0ad368ad109e   10Gi       RWO,RWX        fast           15h

更新 1：在为 k8 设置标志后，如下面的答案所指出的。

创建 pod 时我仍然遇到同样的错误。未在节点/主机 VM ( RHEL ) 上创建 pvc 目录。日志与上面的“kubectl describe pod”相同

运行 Storageos pod 的节点：

[root@ip-172-31-24-214 storageos]# pwd
/var/lib/storageos

[root@ip-172-31-24-214 storageos]# ls -l
total 0
drwxr-xr-x. 3 root root 17 Jul 13 10:07 data
drwxr-xr-x. 2 root root  6 Jul 13 10:07 filesystems
drwxr-xr-x. 3 root root 22 Jul 13 10:07 kv
drwxr--r--. 2 root root 27 Jul 13 10:06 logs
drwxr-x---. 2 root root 16 Jul 13 10:06 state
drwxr-xr-x. 2 root root  6 Jul 13 10:07 volumes

[root@ip-172-31-24-214 storageos]# cd volumes/
[root@ip-172-31-24-214 volumes]# ls -l
total 0

score 1 · Accepted Answer

看起来您的安装中未启用 MountPropagation。docker 和 Kubernetes（k8s 1.10 默认启用）都需要进行相应的设置。

如果您的所有节点的 /var/lib/storageos/volumes 中有任何文件和设备，您可以发布吗？

查看以下文档链接，了解 docker bit、第 2 点和第 3 点的 k8s 功能门。 https://docs.storageos.com/docs/platforms/kubernetes/install/

您必须更改 api-controller 清单以添加功能门标志。

Kubernetes 1.10 默认启用了该功能门。但是，我不知道 Rancher 是如何部署配置的。如果 kubelet 在系统中运行，您必须在 systemd kubelet.service 文件中添加功能门。如果它在容器中运行，则必须在其配置中添加具有共享挂载标志的卷 --volume=/var/lib/storageos:/var/lib/storageos:rshared。helm chart 具有变量 cluster.sharedDir，您可以为包含的 kubelet 定义该变量，它将使用 kubelet 卷目录来托管 StorageOS 的设备文件。

score 0 · Accepted Answer

Rancher，尤其是在使用 rke 时，通过 tha cluster.yml 文件部署其配置（有关更多详细信息，请参阅 RKE 文档）。

要添加该功能门，您必须以下列方式调整该 cluster.yml 并重新运行rke up --config cluster.yml：

# Necessary for password protected SSH Keys:
ssh_agent_auth: true

# Nodes Definitions
nodes:
  - address: 10.70.60.3  # hostname or IP to access nodes
    user: docker # root user (usually 'root')
    role: [controlplane,etcd,worker] # K8s roles for node
    ssh_key_path: ~/.ssh/id_rsa # path to PEM file
...

services:
  kube-apiserver:
    extra_args:
      feature-gates: "PersistentLocalVolumes=true,VolumeScheduling=true"
  kubelet:
    extra_args:
      feature-gates: "PersistentLocalVolumes=true,VolumeScheduling=true"

# Default versions
system_images:
    kubernetes: rancher/hyperkube:v1.10.5-rancher1
...

希望这可以帮助。

干杯，达米安

docker - Rancher、Kubernetes 和 StorageOS：持久存储、卷挂载问题？

2 回答 2

Related

Reference