Type: Spike
Resolution: Done
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- UpgradeBlocker

Blocked:
False
Blocked Reason:
None
Ready:
False
[QE] How to address?:
---
Intelligence Requested:
Market:

Cost of Delay:
0
WSJF:
0

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Which 4.y.z to 4.y'.z' updates increase vulnerability?

We identified this problem when upgrading a cluster with IPsec enabled from 4.14 to 4.15, but it may also affect a fresh cluster on which IPsec is enabled on 4.14, or even on earlier versions.

Which types of clusters?

So far, this problem has been found only on vSphere clusters whose primary interface is a bond interface with IPsec hardware offload enabled.
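To check whether a node matches this profile, the bond and its slave interfaces can be inspected with `ethtool -k`, which lists `esp-tx-csum-hw-offload` among the interface features. The following is a minimal bash sketch, assuming it runs on the node itself (for example from an `oc debug node/<name>` shell after `chroot /host`), that the bonding driver's sysfs interface is available, and that the bond is named `bond0` by default. The function names are hypothetical and not part of any OpenShift tooling.

```shell
#!/usr/bin/env bash
# Minimal sketch (assumption: run on the node; bonding driver in use).
# Helper names are hypothetical, not part of any OpenShift tooling.

# Read `ethtool -k <iface>` output on stdin and print the state of the
# esp-tx-csum-hw-offload feature: "on", "off", or "absent" if not listed.
# A trailing "[fixed]" marker is ignored.
esp_csum_state() {
  awk -F': ' '
    { sub(/^[ \t]+/, "", $1) }
    $1 == "esp-tx-csum-hw-offload" { split($2, v, " "); print v[1]; found = 1 }
    END { if (!found) print "absent" }
  '
}

# Report the offload state for a bond and each of its slaves (read from sysfs).
# Returns non-zero if any interface still has the offload enabled.
report_bond_offload() {
  local bond=${1:-bond0} iface state rc=0
  # The sysfs "slaves" file is space-separated, so word splitting is intended here.
  for iface in "$bond" $(cat "/sys/class/net/$bond/bonding/slaves" 2>/dev/null); do
    state=$(ethtool -k "$iface" | esp_csum_state)
    echo "$iface: esp-tx-csum-hw-offload $state"
    [ "$state" = "on" ] && rc=1
  done
  return "$rc"
}
```

Because `report_bond_offload bond0` returns non-zero while any interface still reports the offload as `on`, the same check can also be used to confirm that the remediation has taken effect.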

What is the impact? Is it serious enough to warrant removing update recommendations?

East-west traffic in the cluster is completely broken, and the cluster is unusable until IPsec is disabled in the network operator configuration.
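For reference, disabling IPsec through the network operator configuration can be sketched as below. This reflects our reading of the OpenShift documentation and should be verified against the docs for the exact cluster version: on 4.14 and earlier, IPsec is enabled by the presence of `ipsecConfig` in the `networks.operator.openshift.io` `cluster` object and is disabled by removing it, while 4.15 introduced an `ipsecConfig.mode` field that can be set to `Disabled`. The function name and the `OC` override (used only so the commands can be stubbed for a dry run) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: disable OVN-Kubernetes IPsec via the cluster network operator config.
# Usage: disable_ipsec 4.14.12   (or 4.15.x)
# OC may be set to a stub for dry runs; it defaults to "oc".
disable_ipsec() {
  local version=$1 oc=${OC:-oc}
  case "$version" in
    4.15*)
      # 4.15: set the IPsec mode explicitly to Disabled.
      "$oc" patch networks.operator.openshift.io cluster --type=merge \
        -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{"mode":"Disabled"}}}}}'
      ;;
    4.14*|4.1[0-3]*)
      # 4.14 and earlier: IPsec is on when ipsecConfig is present, so remove it.
      "$oc" patch networks.operator.openshift.io cluster --type=json \
        -p '[{"op":"remove","path":"/spec/defaultNetwork/ovnKubernetesConfig/ipsecConfig"}]'
      ;;
    *)
      echo "disable_ipsec: unsupported version '$version'" >&2
      return 2
      ;;
  esac
}
```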

How involved is remediation?

If the user still wants IPsec enabled on the cluster, they can restore the cluster by disabling esp-tx-csum-hw-offload on the bond interface and its slave interfaces.
For example, this can be done by rolling out the following MachineConfigs.

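# SHARED_DIR is assumed to be the directory from which manifests are collected
# (as in OpenShift CI); adjust it for your environment.
# Write (not append) so that re-running the loop does not duplicate the manifest.
for role in master worker; do
cat > "${SHARED_DIR}/manifest_${role}-esp-csum-disable.yml" <<-EOF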
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: $role
  name: 80-$role-esp-csum-disable
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - name: disable.esp.csum.service
        enabled: true
        contents: |
         [Unit]
         Description=Disable ESP csum hw offload
         After=ovs-configuration.service
         Before=kubelet-dependencies.target node-valid-hostname.service
         [Service]
         Type=oneshot
         ExecStart=/usr/bin/bash -x -c "ethtool -K bond0 esp-tx-csum-hw-offload off && ethtool -K ens192 esp-tx-csum-hw-offload off && ethtool -K ens224 esp-tx-csum-hw-offload off"
         StandardOutput=journal+console
         StandardError=journal+console
         [Install]
         WantedBy=network-online.target
EOF
done

This procedure requires a reboot of the cluster nodes, so restoring the cluster takes a few minutes.
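The `ExecStart` line in the MachineConfig above hardcodes `bond0`, `ens192`, and `ens224`, which are the interface names on the affected vSphere nodes. On clusters where the bond or its slaves are named differently, the command has to be adjusted. A minimal sketch of a hypothetical helper that builds the same `ethtool` chain from an arbitrary list of interface names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ethtool command chain that disables
# esp-tx-csum-hw-offload on every interface given as an argument, in order.
# Example: esp_csum_off_cmd bond0 ens192 ens224
esp_csum_off_cmd() {
  local iface cmd=""
  if [ "$#" -eq 0 ]; then
    echo "esp_csum_off_cmd: need at least one interface" >&2
    return 1
  fi
  for iface in "$@"; do
    # Join the per-interface commands with && so a failure stops the chain,
    # as in the original ExecStart line.
    cmd+="${cmd:+ && }ethtool -K $iface esp-tx-csum-hw-offload off"
  done
  printf '%s\n' "$cmd"
}
```

For `bond0 ens192 ens224` this prints exactly the command used in the original unit. Because the heredoc delimiter `EOF` is unquoted, the helper's output could be substituted directly inside it, e.g. `ExecStart=/usr/bin/bash -x -c "$(esp_csum_off_cmd bond0 ens192 ens224)"`.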

Is this a regression?

We have not found any statement that OCP supports IPsec hardware offload, so this may not be a regression.

blocks

OCPBUGS-25312 [OVN][IPSEC EW]Upgrade from 4.14->4.15 failed for Vsphere

Closed

is duplicated by

SDN-4482 Impact statement request for OCPBUGS-22185 [OVN IPsec]One master node cannot access the pod on one worker node

Closed

Assignee: Periyasamy Palanisamy

Reporter: Lalatendu Mohanty

Votes: 0

Watchers: 5

Created: 2024/02/19 10:18 PM

Updated: 2024/03/06 1:51 PM

Resolved: 2024/03/06 1:51 PM
