In my cluster I use Weave Flux together with its Flux helm-operator to manage the cluster the GitOps way.

However, when I change a chart in the Flux git repository, I often run into the following error message:

ts=2019-09-25T11:54:37.604506452Z caller=chartsync.go:328 component=chartsync
warning="unable to proceed with release" 
resource=mychart:helmrelease/mychart release=mychart
err="release requires a rollback before it can be upgraded (FAILED)"

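I'm not sure what this means in Helm, but in any case I'm not supposed to run any helm commands myself, because the releases are managed by Flux. So what is the right way to handle this error in production?

(Other than deleting the release and waiting for Flux to recreate it.)

A well-explained answer would be much appreciated, thanks.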

1 Answer

Let's dig into the helm-operator code.

The warning "unable to proceed with release" is logged after the call to GetUpgradableRelease:

    // GetUpgradableRelease returns a release if the current state of it
    // allows an upgrade, a descriptive error if it is not allowed, or
    // nil if the release does not exist.

The error "release requires a rollback before it can be upgraded" is returned when the release has the status Status_FAILED (see release.go#89).
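
The contract described in that comment can be illustrated with a minimal sketch. This is not the helm-operator source: the Status and Release types and the upgradableRelease function are hypothetical stand-ins, treating DEPLOYED as the upgradable state is an assumption, and the message in the default branch is made up for illustration. Only the FAILED message is taken from the log above.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for Helm's release status codes.
type Status string

const (
	StatusDeployed       Status = "DEPLOYED"
	StatusFailed         Status = "FAILED"
	StatusPendingUpgrade Status = "PENDING_UPGRADE"
)

// Release is a hypothetical, minimal release record.
type Release struct {
	Name   string
	Status Status
}

// upgradableRelease returns the release if its state allows an upgrade,
// a descriptive error if it does not, or nil if the release does not exist.
func upgradableRelease(rel *Release) (*Release, error) {
	if rel == nil {
		// No such release: nothing to upgrade, and no error either.
		return nil, nil
	}
	switch rel.Status {
	case StatusDeployed:
		// Assumption: a deployed (healthy) release may be upgraded.
		return rel, nil
	case StatusFailed:
		// Same wording as the error seen in the operator log.
		return nil, fmt.Errorf("release requires a rollback before it can be upgraded (%s)", rel.Status)
	default:
		// Illustrative message only; any other state also blocks the upgrade here.
		return nil, fmt.Errorf("release state does not allow an upgrade (%s)", rel.Status)
	}
}
```

Calling upgradableRelease with a release whose status is StatusFailed yields a nil release and the error from the log, which is why the operator stops with "unable to proceed with release" instead of upgrading.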

An UNHEALTHY state blocks the release

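As the Flux developers mention in #2265, there is no way to roll forward a release that is in an UNHEALTHY state:

"This is not a bug, but I can see where your expectation is coming from.

Flux will only move healthy releases forward. One of the reasons for this is to ensure we do not end up in a loop of failure. The --force flag is therefore not intended to force the upgrade of an unhealthy resource (you should use the rollback feature for that); it was developed to make it possible to upgrade charts with, for example, backwards-incompatible changes (such as changes to immutable fields, which require the resource to be removed first, see #1760)."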

Conclusion: forceUpgrade is honored, but it cannot be used to force the upgrade of a release that is in an UNHEALTHY state.

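Rollback

As the developer suggests, you should use the rollback feature.

From time to time a release made by the Helm operator may fail. It is possible to automate the rollback of a failed release by setting .spec.rollback.enable to true on the HelmRelease resource.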

Note: a successful rollback of a Helm chart containing a StatefulSet resource is known to be tricky, and one of the main reasons automated rollbacks are not enabled by default for all HelmReleases. Verify a manual rollback of your Helm chart does not cause any problems before enabling it.

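When enabled, the Helm operator will detect a faulty upgrade and perform a rollback. It will not attempt a new upgrade unless it detects a change in the values and/or the chart.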

apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
# metadata: ...
spec:
  # Listed values are the defaults.
  rollback:
    # If set, will perform rollbacks for this release.
    enable: false
    # If set, will force resource update through delete/recreate if
    # needed.
    force: false
    # Prevent hooks from running during rollback.
    disableHooks: false
    # Time in seconds to wait for any individual Kubernetes operation.
    timeout: 300
    # If set, will wait until all Pods, PVCs, Services, and minimum
    # number of Pods of a Deployment are in a ready state before
    # marking the release as successful. It will wait for as long
    # as the set timeout.
    wait: false
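
The retry rule described above (no new upgrade after a rollback until the values and/or the chart change) can be pictured with a small sketch. This is purely an illustration and makes no claim about how the Helm operator detects changes internally: the fingerprint and shouldAttemptUpgrade functions are hypothetical, and hashing the chart version together with the values is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// fingerprint hashes the chart version and the values into one string,
// so that a change in either produces a different result.
// Hypothetical helper, not part of the Helm operator.
func fingerprint(chartVersion, values string) string {
	// The NUL separator keeps ("1.2", "0x") distinct from ("1.20", "x").
	sum := sha256.Sum256([]byte(chartVersion + "\x00" + values))
	return hex.EncodeToString(sum[:])
}

// shouldAttemptUpgrade reports whether a new upgrade should be tried.
// lastFailed is the fingerprint of the upgrade that was rolled back,
// or "" if no failure has been recorded; current is the fingerprint
// of what is in git now.
func shouldAttemptUpgrade(lastFailed, current string) bool {
	return lastFailed != current
}
```

With this rule, re-syncing the same broken chart and values does nothing, while pushing a fix to either one changes the fingerprint and allows the next upgrade attempt.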

answered 2019-09-25T14:29:10.887