[CORENET-4908] Impact: Unexpected Behavior During Cluster Upgrade (4.14.23 to 4.15.15) for the ovn-ipsec-host pods


      The bug related to this card has not been fixed yet, because we found a broader exposure in CORENET-5630.

      Impact statement for the OCPBUGS-37205 series:

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      All 4.14 to 4.15 updates, until a fixed 4.15 release is available.

      Which types of clusters?

      Clusters that use OVN networking with IPsec enabled, where more than one MachineConfigPool (MCP) is configured for worker nodes and at least one of those MCPs is paused. The default cluster configuration, with a single MCP for master nodes and a single MCP for worker nodes, is not affected.
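The exposure condition above can be sketched as a small predicate. This is only an illustration under stated assumptions: the pool dictionaries follow the shape of `oc get machineconfigpool -o json` items (`metadata.name`, `spec.paused`), every pool not named `master` is treated as a worker pool, and `ipsec_enabled` is supplied by the caller rather than detected. The function names are hypothetical.

```python
from typing import Any, Dict, List

Pool = Dict[str, Any]


def worker_pools(pools: List[Pool]) -> List[Pool]:
    # Assumption: any pool that is not the "master" pool serves worker nodes.
    return [p for p in pools if p["metadata"]["name"] != "master"]


def is_paused(pool: Pool) -> bool:
    # spec.paused is treated as False when the field is absent.
    return bool(pool.get("spec", {}).get("paused", False))


def is_exposed(ipsec_enabled: bool, pools: List[Pool]) -> bool:
    # Exposed per the impact statement: IPsec enabled, more than one
    # worker MCP, and at least one of those worker MCPs paused.
    workers = worker_pools(pools)
    return (
        ipsec_enabled
        and len(workers) > 1
        and any(is_paused(p) for p in workers)
    )
```

With this sketch, a default cluster (one `master` pool, one unpaused `worker` pool) evaluates as not exposed, while a cluster with an additional paused worker pool and IPsec enabled evaluates as exposed.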

      What is the impact? Is it serious enough to warrant removing update recommendations?

      Worker nodes in the paused MachineConfigPool lose SDN connectivity, which affects workloads.

      How involved is remediation?

      Unpausing the paused MCP resolves the issue.
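As a rough illustration of the remediation, the sketch below lists paused pools from the JSON printed by `oc get machineconfigpool -o json` and builds an `oc patch` command line that sets `spec.paused` to `false`. The helper names are hypothetical, and the command is only constructed, not executed; whether unpausing is appropriate for a given pool is left to the cluster administrator.

```python
import json
from typing import List


def paused_pool_names(mcp_list_json: str) -> List[str]:
    # Expects the JSON list document from `oc get machineconfigpool -o json`.
    items = json.loads(mcp_list_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if item.get("spec", {}).get("paused", False)
    ]


def unpause_command(pool_name: str) -> List[str]:
    # Builds (but does not run) a merge patch that clears spec.paused.
    patch = json.dumps({"spec": {"paused": False}})
    return [
        "oc", "patch", "machineconfigpool", pool_name,
        "--type", "merge", "--patch", patch,
    ]
```

The returned argument list could be passed to `subprocess.run` by a caller that has `oc` on its PATH and is logged in to the cluster.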

      Is this a regression?

      Yes.


            W. Trevor King added a comment:

            graph-data#5588 merged, declaring this issue as OVNIPsecPausedMCPConnectivity for 4.14 to 4.15 updates. I'm closing this ticket with the completed impact statement and risk declaration.

            The OCPBUGS-37205 series remains open, tracking an actual product fix. Watch that ticket and its backports to see when the fixes go out.


            Petr Muller added a comment:

            Started to work on edge blocks in https://github.com/openshift/cincinnati-graph-data/pull/5588; the PromQL likely needs testing and review.

            Yuval Kashtan added a comment:

            1. No.
            2. I'm not sure. I need to test this, but I think no.
            3. No. I tested this and it works, though the upgrade process is blocked until all the nodes are updated.
            4. Yes.

            > are both OVN and SDN networking clusters affected?
            No, because IPsec is for OVN only.

            Petr Muller added a comment:

            Also, are both OVN and SDN networking clusters affected?

            Petr Muller added a comment:

            I'm confused about the "have more than one worker MCP" answer to "which types of clusters", and that pausing is not mentioned there. So to clarify:

            Will the cluster run into the issue:
            1. If a cluster has a single worker MCP and it is not paused before the update started? (I understand no)
            2. If a cluster has a single worker MCP and it is paused?
            3. If a cluster has multiple worker MCPs and none are paused?
            4. If a cluster has multiple worker MCPs and at least one is paused? (I understand yes)

            Yuval Kashtan added a comment:

            I believe OCPBUGS-36688 is a duplicate of the older OCPBUGS-32525.
            It affects the upgrade of any cluster with IPsec enabled that has multiple worker MCPs and is upgraded with paused MCPs.

            > Which 4.y.z to 4.y'.z' updates increase vulnerability?
            Any <=4.14 to any 4.15 (until we have a fix).
            4.16 includes a fix for this.

            > Which types of clusters?

            • IPsec enabled
            • have more than one worker MCP

            > What is the impact? Is it serious enough to warrant removing update recommendations?
            The paused MCPs will lose SDN connectivity, impacting workloads.
            The issue resolves itself only after unpausing the MCP, and there is no known workaround.

            > Is this a regression?
            Yes.
