  1. Machine Config Operator
  2. MCO-1257

Impact statement request for OCPBUGS-37534 4.12 -> 4.13 upgrade using IPI on Azure does not work


    • MCO Sprint 257

      Impact statement for OCPBUGS-38295.

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      Updates into 4.13.46 until OCPBUGS-38295 lands a fix.

      • 4.13.46's OCPBUGS-37160 exposed Azure clusters born in 4.(y<12) or 4.12.(z<54) to this issue.
      • 4.12.54's OCPBUGS-30823 protects clusters born in 4.12.(z>=54) from this issue.
      • 4.14.32's OCPBUGS-36356 (recently picked back to 4.13.z with OCPBUGS-38295) protects even born-in-old clusters from the issue.

      That leaves 4.13.46 as the only exposed release.

      Which types of clusters?

      Azure clusters born in 4.(y<12) or 4.12.(z<54).
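
      The exposure conditions above can be sketched as a small predicate. This is only an illustration, not how the update service evaluates risk: the function names, the plain "X.Y.Z" version strings, and the platform string are assumptions made for this example, and 4.13.46 is treated as the only exposed target because that is what this statement says.

```python
# Hypothetical helper names; nothing here is part of the MCO or the update service.

EXPOSED_TARGET = (4, 13, 46)  # the only exposed release, per this impact statement


def parse_version(text):
    """Parse a plain 'X.Y.Z' string into a tuple of ints (no pre-release suffixes)."""
    major, minor, patch = text.split(".")
    return int(major), int(minor), int(patch)


def born_in_old_release(born):
    """True for clusters born in 4.(y<12) or 4.12.(z<54)."""
    major, minor, patch = born
    if major != 4:
        return False
    return minor < 12 or (minor == 12 and patch < 54)


def is_exposed(platform, born_version, target_version):
    """True when an update into target_version matches the exposure described above."""
    return (
        platform.lower() == "azure"
        and born_in_old_release(parse_version(born_version))
        and parse_version(target_version) == EXPOSED_TARGET
    )
```

      For example, is_exposed("Azure", "4.12.53", "4.13.46") is true, while the same call with a born-in version of "4.12.54" is false.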

      What is the impact? Is it serious enough to warrant removing update recommendations?

      As the machine-config operator reboots nodes into the 4.13.46 configuration, systemd detects a dependency loop among units and disables one of them. With that unit disabled, CRI-O and the kubelet fail to run, and the node never returns to Ready=True unless the cluster admin can SSH in or use the serial console to recover the systemd units.

      How involved is remediation?

      Updating to a release with OCPBUGS-38295 will avoid the issue for other nodes, but SSH recovery or replacement may be the only options for already-impacted nodes.

      Is this a regression?

      Yes. See the which-updates answer above for the multi-patch exposure story.

            jerzhang@redhat.com Yu Qi Zhang
            dhurta@redhat.com David Hurta