OpenShift SDN / SDN-4501

Impact assessment for OCPBUGS-25312: Upgrade from 4.14->4.15 failed for vSphere


    • Type: Spike
    • Resolution: Done
    • Priority: Critical

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      We identified this problem when upgrading a cluster with IPsec enabled from 4.14 to 4.15, but it may also exist on a fresh cluster when IPsec is enabled on 4.14 or even on earlier versions.

      Which types of clusters?

      So far this problem has been found only on vSphere clusters in which a bond interface is used as the primary interface and has IPsec hardware offload enabled.
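
      To check whether a node matches this description, the ESP-related offload features reported by `ethtool -k` can be inspected. The helper below is a minimal sketch: the function name `esp_offload_features` is hypothetical, and it assumes the usual `feature: state` output format of `ethtool -k`.

```shell
#!/usr/bin/env bash
# Read `ethtool -k <iface>` output on stdin and print the ESP-related
# offload features as "feature state" pairs, one per line.
# (Hypothetical helper; assumes the standard "name: on|off [fixed]" layout.)
esp_offload_features() {
  awk -F': *' '/^[ \t]*(esp-|tx-esp-)/ {
    gsub(/^[ \t]+/, "", $1)   # strip leading indentation from the name
    split($2, state, " ")     # drop a trailing "[fixed]" marker, if any
    print $1, state[1]
  }'
}

# Usage on a node (bond0 is the interface name used in this report):
#   ethtool -k bond0 | esp_offload_features
```

      If `esp-tx-csum-hw-offload` is reported as `on` for the bond or its slave interfaces, the node is a candidate for the problem described here.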

      What is the impact? Is it serious enough to warrant removing update recommendations?

      East-west traffic for the cluster is entirely broken, and the cluster becomes unusable until IPsec is disabled in the network operator config.
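
      As a rough sketch of what disabling IPsec in the network operator config could look like, the wrapper below patches the cluster network operator resource with `oc patch`. The function name `disable_ipsec` is hypothetical, and the patch bodies are assumptions: on 4.14 IPsec is assumed to be disabled by removing `ipsecConfig`, and on 4.15 by setting its `mode` to `Disabled`. Check the documentation for the release in use before running it.

```shell
#!/usr/bin/env bash
# Disable IPsec via the cluster network operator config.
# Hypothetical wrapper; the patch bodies are assumptions for 4.14 and 4.15.
disable_ipsec() {
  local minor="$1" type patch
  case "$minor" in
    4.14)
      # Assumed: IPsec is turned off by removing the ipsecConfig stanza.
      type=json
      patch='[{"op":"remove","path":"/spec/defaultNetwork/ovnKubernetesConfig/ipsecConfig"}]'
      ;;
    4.15)
      # Assumed: ipsecConfig carries an explicit mode field.
      type=merge
      patch='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{"mode":"Disabled"}}}}}'
      ;;
    *)
      echo "unsupported release: $minor" >&2
      return 1
      ;;
  esac
  oc patch networks.operator.openshift.io cluster --type="$type" -p "$patch"
}

# Usage: disable_ipsec 4.15
```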

      How involved is remediation?

      If the user still wants to keep IPsec enabled for the cluster, they can restore the cluster by disabling esp-tx-csum-hw-offload on the bond and its slave interfaces.
      For example, this can be done by rolling out the following machine configs.

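      # Interface names below (bond0, ens192, ens224) come from the affected cluster; adjust them to match your nodes.
      for role in master worker; do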
      cat >> "${SHARED_DIR}/manifest_${role}-esp-csum-disable.yml" <<-EOF
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: $role
        name: 80-$role-esp-csum-disable
      spec:
        config:
          ignition:
            version: 3.2.0
          systemd:
            units:
            - name: disable.esp.csum.service
              enabled: true
              contents: |
               [Unit]
               Description=Disable ESP csum hw offload
               After=ovs-configuration.service
               Before=kubelet-dependencies.target node-valid-hostname.service
               [Service]
               Type=oneshot
               ExecStart=/usr/bin/bash -x -c "ethtool -K bond0 esp-tx-csum-hw-offload off && ethtool -K ens192 esp-tx-csum-hw-offload off && ethtool -K ens224 esp-tx-csum-hw-offload off"
               StandardOutput=journal+console
               StandardError=journal+console
               [Install]
               WantedBy=network-online.target
      EOF
      done 

      This procedure requires a cluster reboot, so restoring the cluster takes a few minutes.
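
      The machine config above hardcodes the interface names. On nodes where the names differ, the same `ethtool` command chain could be generated from a list of interfaces. The following is a minimal sketch; the function name `esp_csum_disable_cmd` is hypothetical, and reading the slave list from `/sys/class/net/bond0/bonding/slaves` assumes the standard Linux bonding sysfs layout.

```shell
#!/usr/bin/env bash
# Build the "ethtool -K ... off" command chain for the given interfaces.
# Hypothetical helper; it only prints the command, it does not run it.
esp_csum_disable_cmd() {
  local cmd="" ifname
  for ifname in "$@"; do
    # Join the per-interface commands with "&&", as in the unit's ExecStart.
    cmd="${cmd:+$cmd && }ethtool -K $ifname esp-tx-csum-hw-offload off"
  done
  printf '%s\n' "$cmd"
}

# Usage on a node, passing the bond followed by its slave interfaces:
#   esp_csum_disable_cmd bond0 $(cat /sys/class/net/bond0/bonding/slaves)
```

      With `bond0 ens192 ens224` as arguments, this prints the same command string that the `ExecStart` line passes to `bash -c`.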

      Is this a regression?

      I don't see a statement anywhere saying that OCP supports IPsec hardware offload, so this may not be a regression.

              pepalani@redhat.com Periyasamy Palanisamy
              lmohanty@redhat.com Lalatendu Mohanty