OpenShift Etcd / ETCD-511

Impact of cluster-backup failed in upgrade from 4.14.7 to 4.15.0-rc.0


    • Type: Spike
    • Resolution: Done
    • Priority: Critical
    • Sprint: ETCD Sprint 247

      We're asking the following questions to evaluate whether or not OCPBUGS-26214 warrants changing update recommendations from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid recommending an update which introduces new risk or reduces cluster functionality in any way. In the absence of a declared update risk (the status quo), there is some risk that the existing fleet updates into the at-risk releases. Depending on the bug and estimated risk, leaving the update risk undeclared may be acceptable.

      Sample answers are provided to give more context and the ImpactStatementRequested label has been added to OCPBUGS-26214. When responding, please move this ticket to Code Review. The expectation is that the assignee answers these questions.

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      Clusters running any release from 4.14.3 up to, but not including, 4.14.9 cannot upgrade to 4.15.x.
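As a rough aid for triage, the affected range can be checked with a small shell function. This is only a sketch: the function name is_affected is hypothetical, it expects a plain x.y.z version string (a suffix such as -rc.0 is ignored), and the optional lookup at the end assumes a logged-in oc session with read access to the ClusterVersion resource.

```shell
#!/bin/sh
# Hypothetical helper: prints "affected" for 4.14.z with 3 <= z < 9,
# otherwise "not-affected". Expects input of the form x.y.z.
is_affected() {
  v="$1"
  major="${v%%.*}"            # text before the first dot
  rest="${v#*.}"
  minor="${rest%%.*}"         # text between the first and second dot
  rest="${rest#*.}"
  patch="${rest%%[!0-9]*}"    # leading digits of the patch component
  if [ "$major" = "4" ] && [ "$minor" = "14" ] && [ -n "$patch" ] \
     && [ "$patch" -ge 3 ] && [ "$patch" -lt 9 ]; then
    echo "affected"
  else
    echo "not-affected"
  fi
}

# Optional: check the cluster you are logged in to (skipped if oc is absent).
if command -v oc >/dev/null 2>&1; then
  current="$(oc get clusterversion version -o jsonpath='{.status.desired.version}' 2>/dev/null)"
  if [ -n "$current" ]; then
    echo "$current: $(is_affected "$current")"
  fi
fi
```

For example, is_affected 4.14.7 prints "affected", while is_affected 4.14.9 prints "not-affected".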

      Which types of clusters?

      I've verified this on GCP (IPI); reports from others indicate it also occurs on vSphere (IPI). I'm verifying more platforms as we speak.

      Verified:

      • GCP (IPI)
      • vSphere (IPI)
      • AWS (IPI)
      • ...

      What is the impact? Is it serious enough to warrant removing update recommendations?

      Customers are unable to upgrade to 4.15.x. The cluster-etcd-operator does not set a condition that is required for the upgrade to proceed.

      How involved is remediation?

      The fix is already merged and is about to ship with the next z-stream update, 4.14.9. A possible remediation is to manually add the required controller condition to the status of the "etcd" ClusterOperator.

      You can do this with a simple oc status patch:

      oc patch ClusterOperator/etcd --subresource status --type='merge' -p "$(cat <<- EOF
      status:
        conditions:
        - type: RecentBackup
          reason: "manual override"
          message: "nothing to see here"
          lastTransitionTime: '2024-01-11T12:00:00Z'
          status: "True"
      EOF
      )"
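To see whether the RecentBackup condition is present, before or after applying the patch, the condition status can be read back from the ClusterOperator. This is a sketch rather than an official procedure: the helper name describe_recent_backup is hypothetical, and the lookup assumes a logged-in oc session.

```shell
#!/bin/sh
# Hypothetical helper: turns the raw status string of the RecentBackup
# condition into a short description.
describe_recent_backup() {
  case "$1" in
    True) echo "RecentBackup is set to True" ;;
    "")   echo "RecentBackup condition is absent" ;;
    *)    echo "RecentBackup is present but not True ($1)" ;;
  esac
}

# Read the condition from the etcd ClusterOperator (skipped if oc is absent).
if command -v oc >/dev/null 2>&1; then
  status="$(oc get clusteroperator etcd \
    -o jsonpath='{.status.conditions[?(@.type=="RecentBackup")].status}' 2>/dev/null)"
  describe_recent_backup "$status"
fi
```

An empty result from the jsonpath query means no condition of that type exists on the resource.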
      
      

      Is this a regression?

      No, this (removal of old backups) is new functionality that was introduced in an attempt to remove the upgrade-backup controller with 4.15.

            tjungblu@redhat.com Thomas Jungblut
            afri@afri.cz Petr Muller