OpenShift SDN / SDN-4147

Impact Rollout of ovnk pods is taking more time


    • Type: Spike
    • Resolution: Done
    • Priority: Critical

      We're asking the following questions to evaluate whether or not OCPBUGS-17391 warrants changing update recommendations from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid recommending an update which introduces new risk or reduces cluster functionality in any way. In the absence of a declared update risk (the status quo), there is some risk that the existing fleet updates into the at-risk releases. Depending on the bug and estimated risk, leaving the update risk undeclared may be acceptable.

      Sample answers are provided to give more context and the ImpactStatementRequested label has been added to OCPBUGS-17391. When responding, please move this ticket to Code Review. The expectation is that the assignee answers these questions.

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      Updates out of 4.13.9 or later (where CRI-O has new stopping logic: OCPBUGS-15868, OCPBUGS-17150) into any release that does not yet carry the fix.
      In releases with the fix, the incoming network operator works smoothly with the new CRI-O as the ovnkube-master DaemonSet updates. For example:
      4.13.8 -> 4.13.9 is not vulnerable (outgoing CRI-O predates the stopping pivot).
      4.13.9 -> 4.13.10 is vulnerable (outgoing CRI-O has the pivot, incoming network operator has not been patched).
      4.13.9 -> 4.13.fixed is not vulnerable (incoming network operator has been patched).

      But this regression source is speculative, and has not been confirmed by extensive testing.
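The exposure rule above can be written as a small predicate. This is a minimal sketch, not project tooling: the function names are hypothetical, it only considers 4.13.z versions, and because the ticket does not name the fixed release, `first_fixed` is a caller-supplied parameter (the `4.13.12` in the usage lines is a placeholder for illustration only).

```python
from typing import Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '4.13.9' into (4, 13, 9) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# From the ticket: 4.13.9 is the first release whose CRI-O has the new stopping logic.
FIRST_AFFECTED = parse("4.13.9")


def is_vulnerable(source: str, target: str, first_fixed: Optional[str] = None) -> bool:
    """Return True if updating source -> target may hit the slow ovnkube-master rollout.

    first_fixed is the first release with the patched network operator.
    The ticket does not name it, so it defaults to None (no fix shipped yet).
    """
    if parse(source) < FIRST_AFFECTED:
        # Outgoing CRI-O predates the stopping pivot.
        return False
    if first_fixed is not None and parse(target) >= parse(first_fixed):
        # Incoming network operator has been patched.
        return False
    return True


# Usage; "4.13.12" is a hypothetical placeholder, not the real fixed release.
print(is_vulnerable("4.13.8", "4.13.9"))
print(is_vulnerable("4.13.9", "4.13.10"))
print(is_vulnerable("4.13.9", "4.13.12", first_fixed="4.13.12"))
```

Note that the source version decides whether the new CRI-O stopping logic is in play, while the target version decides whether the patched network operator is coming in, which is why both sides of the update edge matter.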

      Which types of clusters?

      OVN-Kubernetes clusters, because OpenShift SDN doesn't use `ovn-ctl` stop hooks.

      What is the impact? Is it serious enough to warrant removing update recommendations?

      Updating the ovnkube-master DaemonSet can take roughly 3 additional minutes per control-plane node. There is no functional impact beyond the longer
      rollout; the cluster stays completely healthy and workloads are not impacted.
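As a rough worked example of that delay, the sketch below multiplies the per-node figure by the number of control-plane nodes. It assumes the ovnkube-master pods are replaced one node at a time, so the delays add up; the function name and the three-node cluster in the usage line are illustrative assumptions, not measurements from the ticket.

```python
# "~3m per control-plane node" is the estimate given in the ticket.
EXTRA_MINUTES_PER_NODE = 3


def extra_rollout_minutes(control_plane_nodes: int) -> int:
    """Estimate the added ovnkube-master rollout time in minutes.

    Assumes pods are updated sequentially, one control-plane node at a time.
    """
    if control_plane_nodes < 0:
        raise ValueError("control_plane_nodes must be non-negative")
    return EXTRA_MINUTES_PER_NODE * control_plane_nodes


# Usage: a cluster with three control-plane nodes (an assumed, common layout).
print(extra_rollout_minutes(3))  # 9
```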

      How involved is remediation?

      Because of the limited impact, remediation has not been investigated.

      Is this a regression?

      Assumed so, starting in 4.13.9, but we have not confirmed that through testing because of the limited impact.

            jluhrsen Jamo Luhrsen
            trking W. Trevor King