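I am running a cpu-hog experiment on my pod and it fails with Fail Step: failed in chaos injection phase. I don't see any logs that explain why it failed. Any help is appreciated. The experiment, service account, and result resources all seem to be created fine, but the result shows a failure. I was not able to capture logs while the job (runner) was in progress.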

Reference: the cpu-hog experiment YAMLs I am using are here

k logs litmus-8548bd-skvbt -n litmus

{"level":"info","ts":1607551992.9267251,"logger":"controller_chaosengine","msg":"Reconciling ChaosEngine","Request.Namespace":"sbs-svs","Request.Name":"sbs-abc-server-cpu-hog-chaos"}
{"level":"info","ts":1607551993.3839076,"logger":"controller_chaosengine","msg":"Reconciling ChaosEngine","Request.Namespace":"sbs-svs","Request.Name":"sbs-abc-server-cpu-hog-chaos"}
{"level":"info","ts":1607551993.4021606,"logger":"controller_chaosengine","msg":"Reconciling ChaosEngine","Request.Namespace":"sbs-svs","Request.Name":"sbs-abc-server-cpu-hog-chaos"}

k describe chaosresult sbs-abc-server-cpu-hog-chaos-pod-cpu-hog

Name:         sbs-abc-server-cpu-hog-chaos-pod-cpu-hog
Namespace:    sbs-svs
Labels:       app.kubernetes.io/component=experiment-job
              app.kubernetes.io/part-of=litmus
              app.kubernetes.io/version=1.9.1
              chaosUID=c36498b4-16f8-4b2f-93ca-601d5c72bb56
              controller-uid=8a7be18b-8eef-4190-afda-2d24cef0fcbf
              job-name=pod-cpu-hog-7iq6o6
              name=sbs-abc-server-cpu-hog-chaos-pod-cpu-hog
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2020-12-09T19:36:46Z
  Generation:          2
  Managed Fields:
    API Version:  litmuschaos.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:app.kubernetes.io/component:
          f:app.kubernetes.io/part-of:
          f:app.kubernetes.io/version:
          f:chaosUID:
          f:controller-uid:
          f:job-name:
          f:name:
      f:spec:
        .:
        f:engine:
        f:experiment:
      f:status:
        .:
        f:experimentstatus:
          .:
          f:failStep:
          f:phase:
          f:verdict:
    Manager:         experiments
    Operation:       Update
    Time:            2020-12-09T19:37:50Z
  Resource Version:  32768765
  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/sbs-svs/chaosresults/sbs-abc-server-cpu-hog-chaos-pod-cpu-hog
  UID:               38b0ad59-e153-4d6a-a099-ee3dad2579df
Spec:
  Engine:      sbs-abc-server-cpu-hog-chaos
  Experiment:  pod-cpu-hog
Status:
  Experimentstatus:
    Fail Step:  failed in chaos injection phase
    Phase:      Completed
    Verdict:    Fail
Events:         <none>

1 Answer

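On the distribution I am using, the kill container command did not work properly. The following command worked for me. Update the env variable in the engine YAML: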

- name: CHAOS_KILL_COMMAND
  value: "kill $(find /proc -name exe -lname '*/md5sum' 2>&1 | grep -v 'Permission denied' | awk -F/ '{print $(NF-1)}' |  head -n 1)"
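
To make it easier to see what that command does, here is a small sketch of its text-processing stages run against made-up input instead of a live /proc. In the real command, find lists /proc/<pid>/exe symlinks whose target ends in md5sum, and 2>&1 merges find's error messages into the same stream so that grep can drop them. The function name first_pid and the sample PIDs are hypothetical, used only for illustration.

```shell
# Hypothetical helper: the filtering stages of CHAOS_KILL_COMMAND,
# reading find-style output from stdin instead of scanning /proc.
first_pid() {
  # 1. drop permission errors that 2>&1 merged into the stream
  # 2. split each path on "/" and print the second-to-last field (the PID)
  # 3. keep only the first match
  grep -v 'Permission denied' | awk -F/ '{print $(NF-1)}' | head -n 1
}

# Made-up sample of what find might emit inside the target container
printf '%s\n' \
  "find: /proc/1/exe: Permission denied" \
  "/proc/4242/exe" \
  "/proc/4243/exe" | first_pid
# prints 4242
```

The real command then passes that PID to kill. If the find stage matches nothing in your container (for example, because the stress process is not md5sum or find is unavailable in the image), kill receives no argument and the injection step can fail, so it may be worth running the find part by hand inside the target container first.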
Answered 2020-12-11T13:25:47.830