OpenShift SDN / SDN-4013

Impact: ovnkube node pod crashed after converting to a dual-stack cluster network


    • Type: Spike
    • Resolution: Done
    • Priority: Critical
    • OVN Kubernetes

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      Any -> 4.12.(1,2,3,4)
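The notation above means that an update from any version to 4.12.1, 4.12.2, 4.12.3, or 4.12.4 is exposed. A minimal sketch of that check follows; the function name `is_affected_target` is hypothetical, and the version set is taken only from the answer above.

```python
# Target versions named in this ticket: 4.12.(1,2,3,4).
AFFECTED_TARGETS = {"4.12.1", "4.12.2", "4.12.3", "4.12.4"}


def is_affected_target(version: str) -> bool:
    """Return True if updating *to* this version is exposed (hypothetical helper).

    The source version does not matter: the ticket says "Any -> 4.12.(1,2,3,4)".
    """
    return version.strip() in AFFECTED_TARGETS
```

For example, `is_affected_target("4.12.3")` is true, while a target outside the listed set, such as `"4.12.5"`, is not flagged.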

      Which types of clusters?

      Any OVN cluster that was migrated to dual-stack at some point during its lifetime (but not clusters that were installed as dual-stack).

      A cluster is affected if it has 2 ClusterNetworks and 2 ServiceNetworks configured but the Node CR has only 1 InternalIP in its Status.Addresses field. Before 4.13, the only way to end up with such a cluster was to install it as single-stack and convert it to dual-stack somewhere along the way.
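The detection rule above can be sketched as a small check. This is only an illustration under stated assumptions: the function name `suspect_nodes` is hypothetical, the networks are passed in as plain lists, and each node is assumed to be a dict shaped like one item of `oc get nodes -o json` (with `metadata.name` and `status.addresses`).

```python
from typing import Any, List, Mapping, Sequence


def count_internal_ips(node: Mapping[str, Any]) -> int:
    """Count addresses of type InternalIP in the node's status.addresses."""
    addresses = node.get("status", {}).get("addresses", [])
    return sum(1 for addr in addresses if addr.get("type") == "InternalIP")


def suspect_nodes(
    cluster_networks: Sequence[Any],
    service_networks: Sequence[Any],
    nodes: Sequence[Mapping[str, Any]],
) -> List[str]:
    """Return names of nodes matching the pattern described in this ticket.

    Pattern: 2 ClusterNetworks and 2 ServiceNetworks are configured,
    but the Node reports only 1 InternalIP.
    """
    if len(cluster_networks) != 2 or len(service_networks) != 2:
        return []  # not configured as dual-stack, so the rule does not apply
    return [
        node.get("metadata", {}).get("name", "<unknown>")
        for node in nodes
        if count_internal_ips(node) == 1
    ]
```

A non-empty result suggests the cluster was converted from single-stack to dual-stack rather than installed as dual-stack. The check only mirrors the rule stated above and does not inspect OVN-K8s itself.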

      What is the impact? Is it serious enough to warrant removing update recommendations?

      OVN-K8s will crash-loop. If you start debugging manually and reboot a node in this state, you may not be able to schedule any Pods at all afterwards; they will be stuck in the "ContainerCreating" status.

      How involved is remediation?

      A heavy workaround exists: disable the MCO, manually modify the kubelet systemd unit definition, restart the system, upgrade to a fixed version, revert the kubelet modification, and then re-enable the MCO. We did this once with Verizon when they escalated the issue with upgrading to 4.12.(1,2,3,4), but manually fixing such a cluster takes a very large engineering effort. bnemec@redhat.com can shed more light if needed.

      There is also https://access.redhat.com/solutions/7014904, which describes the problem but does not provide the workaround.

      Is this a regression?

      Yes

            mkowalsk@redhat.com Mat Kowalski
            afri@afri.cz Petr Muller