OpenShift SDN / SDN-4191

Impact assessment for OCPBUGS-21668: ovnkube-master is in CrashloopBackOff state after upgrading cluster to OpenShift v4.13


    • Type: Spike
    • Resolution: Done
    • Priority: Critical

      We're asking the following questions to evaluate whether or not OCPBUGS-21668 warrants changing update recommendations from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid recommending an update which introduces new risk or reduces cluster functionality in any way. In the absence of a declared update risk (the status quo), there is some risk that the existing fleet updates into the at-risk releases. Depending on the bug and estimated risk, leaving the update risk undeclared may be acceptable.

      Sample answers are provided to give more context and the ImpactStatementRequested label has been added to OCPBUGS-#. When responding, please move this ticket to Code Review. The expectation is that the assignee answers these questions.

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      Clusters running OCP with the OVNKubernetes CNI that were at 4.6.all or 4.7.all will hit this panic when they upgrade to >=4.13.all (https://github.com/openshift/ovn-kubernetes/blob/release-4.13/go-controller/pkg/ovn/policy.go#L351 ). ACLs created on those releases don't have names, so the predicate can hit the nil condition. NOTE: Clusters that had at least one DB rebuild after 4.7 are safe and won't hit this panic, because from 4.8 onward the ACLs have a name. This is probably why few clusters have hit the bug.
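
      To illustrate the failure mode described above, here is a minimal sketch of a predicate that dereferences an optional ACL name. The ACL type and the unsafeMatch, safeMatch and panics helpers are hypothetical stand-ins written for this note; they are not the ovn-kubernetes code at the linked line, and the sketch assumes the name is modelled as a nilable pointer.

```go
package main

import "fmt"

// ACL is a hypothetical, simplified stand-in for an OVN ACL row.
// Name is assumed to be optional: an ACL created on 4.6/4.7 has no
// name, which is modelled here as a nil pointer.
type ACL struct {
	Name   *string
	Action string
}

// unsafeMatch dereferences Name unconditionally, so it panics with a
// nil pointer dereference when the ACL has no name.
func unsafeMatch(acl *ACL, want string) bool {
	return *acl.Name == want
}

// safeMatch treats an unnamed ACL as a non-match instead of panicking.
func safeMatch(acl *ACL, want string) bool {
	return acl.Name != nil && *acl.Name == want
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	name := "allow-from-ns"
	named := &ACL{Name: &name, Action: "allow"}
	legacy := &ACL{Action: "allow"} // no name, as on a never-rebuilt pre-4.8 DB

	fmt.Println("named ACL matches:", safeMatch(named, name))
	fmt.Println("legacy ACL matches:", safeMatch(legacy, name))
	fmt.Println("unsafe predicate panics on legacy ACL:",
		panics(func() { unsafeMatch(legacy, name) }))
}
```

      A nil check of this kind is one possible way to make such a predicate tolerate unnamed ACLs; whether the actual fix takes this form is not stated in this ticket.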

      Which types of clusters?

      Clusters that were originally installed at OCP <=4.7 with OVNKubernetes will hit this issue when upgrading to >=4.13 if the OVN DB was never rebuilt.

      What is the impact? Is it serious enough to warrant removing update recommendations?

      ovnkube-master pods will be in CrashLoopBackOff, which will affect new networking operations such as pod creation. Existing workloads will not be affected.

      How involved is remediation?

      Needs manual intervention: https://access.redhat.com/solutions/7041369 

      Is this a regression?

      Yes, from 4.6.all or 4.7.all to 4.13.all

            sseethar Surya Seetharaman
            lmohanty@redhat.com Lalatendu Mohanty