  1. OpenShift SDN
  2. SDN-5048

Impact: Pods stuck terminating prevent OVN rollout


    • Type: Spike
    • Resolution: Done
    • Priority: Critical

      Impact statement for the OCPBUGS-34890 series:

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      Any 4.13 to 4.14 update

      Which types of clusters?

      Clusters using OVN with memory-saturated nodes, that is, nodes without enough unrequested memory to fit the larger memory requests of the 4.14 OVN pods

      What is the impact? Is it serious enough to warrant removing update recommendations?

      Workload creation and deletion are disrupted on affected nodes. This is observed as Pods wedged in the Terminating state on impacted nodes: the scheduler tries to evict these pods to free memory for the new OVN pods (whose memory requests increased between 4.13 and 4.14), but the eviction never finishes because the node has no working CNI.
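
To make the saturation condition concrete, here is a minimal sketch of the arithmetic, assuming a node counts as saturated when its allocatable memory minus the memory requests of its other pods is smaller than the new OVN pod request. The function names and all numbers are hypothetical illustrations, not the real OVN requests in 4.13 or 4.14, and the parser only handles integer quantities with common suffixes.

```python
# Sketch only: helper names are hypothetical and the parser covers a subset
# of Kubernetes quantity syntax (integers with Ki/Mi/Gi/Ti or k/M/G/T).
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '300Mi' to bytes."""
    # Binary suffixes are checked first so 'Mi' is not mistaken for 'M'.
    for suffixes in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in suffixes.items():
            if quantity.endswith(suffix):
                return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def fits_without_eviction(allocatable: str, other_requests: list[str],
                          ovn_request: str) -> bool:
    """Return True if the OVN pod's request fits in the unrequested memory."""
    free = parse_memory(allocatable) - sum(parse_memory(r) for r in other_requests)
    return parse_memory(ovn_request) <= free


if __name__ == "__main__":
    # Hypothetical node: 16Gi allocatable, 15Gi already requested by workloads.
    workloads = ["8Gi", "7Gi"]
    print(fits_without_eviction("16Gi", workloads, "512Mi"))  # smaller request
    print(fits_without_eviction("16Gi", workloads, "2Gi"))    # larger request
```

With these made-up numbers the smaller request fits and the larger one does not, which is the situation in which other pods would need to be evicted before the new OVN pod can start.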

      How involved is remediation?

      Forcefully delete pods on affected nodes to free memory and allow OVN pods to come up
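
The remediation above could be scripted roughly as follows. This is a sketch under assumptions, not a supported procedure: it assumes the pod list comes from `oc get pods --all-namespaces -o json`, treats any pod with `metadata.deletionTimestamp` set as stuck in Terminating, and only builds the `oc delete pod ... --force --grace-period=0` commands rather than running them. The function names and the sample data are hypothetical.

```python
def terminating_pods(pod_list: dict, node_name: str) -> list[tuple[str, str]]:
    """Return (namespace, name) for pods on node_name that are being deleted."""
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        on_node = pod.get("spec", {}).get("nodeName") == node_name
        # A pod shown as Terminating has metadata.deletionTimestamp set.
        if on_node and meta.get("deletionTimestamp"):
            stuck.append((meta["namespace"], meta["name"]))
    return stuck


def force_delete_command(namespace: str, name: str) -> list[str]:
    """Build the argument list for a forced, zero-grace-period pod deletion."""
    return ["oc", "delete", "pod", name, "-n", namespace,
            "--force", "--grace-period=0"]


if __name__ == "__main__":
    # Hypothetical sample standing in for `oc get pods --all-namespaces -o json`.
    sample = {
        "items": [
            {"metadata": {"namespace": "app", "name": "web-1",
                          "deletionTimestamp": "2024-01-01T00:00:00Z"},
             "spec": {"nodeName": "worker-0"}},
            {"metadata": {"namespace": "app", "name": "web-2"},
             "spec": {"nodeName": "worker-0"}},
        ]
    }
    for namespace, name in terminating_pods(sample, "worker-0"):
        print(" ".join(force_delete_command(namespace, name)))
```

Printing the commands first leaves room to review the list before anything is deleted; each argument list can then be passed to `subprocess.run(cmd, check=True)` once it looks right.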

      Is this a regression?

      OVN pod memory requests were increased intentionally in 4.14. I would not call it a regression, but it is an operational matter that admins of OVN clusters with memory-saturated nodes should be aware of.

       

              npinaeva@redhat.com Nadia Pinaeva
              trking W. Trevor King