-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.14.z
-
None
-
None
-
3
-
False
-
-
Description of problem:
An NPSS customer on OCP 4.14 is making persistent sysctl changes to turn off rp_filter on specific network interfaces on the worker nodes. These changes are managed through a tuned profile (available as an attachment in the support case). However, the profile ends up in the Degraded state after the nodes are rebooted, so the changes do not persist.

After applying the tuned profile and rebooting the worker node, we see the following error in the tuned logs (/var/log/pods/openshift-cluster-node-tuning-operator_tuned-XXXXXX/tuned/XX.log):

2024-11-05 03:38:03,984 ERROR tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'net.ipv4.conf.bond0/1233.rp_filter', the parameter does not exist

The tuned profile is degraded:

$ oc get profile.tuned.openshift.io -n openshift-cluster-node-tuning-operator xxxx03.xxxx.tcxxxxnz.net
NAME                       TUNED                        APPLIED   DEGRADED   AGE
xxxx03.xxxx.tcxxxxnz.net   openshift-node-5g-user-aat   True      True       25m

A few minutes after the node reboot, when we reapply the tuned profile, the new tuned parameters are applied and the profile is no longer degraded:

$ oc delete profile.tuned.openshift.io -n openshift-cluster-node-tuning-operator xxxx03.xxxx.tcxxxxnz.net
$ oc get profile.tuned.openshift.io -n openshift-cluster-node-tuning-operator xxxx03.xxxx.tcxxxx.net
NAME                   TUNED                        APPLIED   DEGRADED   AGE
xxxx03.xxxx.xxxx.net   openshift-node-5g-user-aat   True      False      11s

So it looks like there is a race condition during node reboot between OVN recreating bond0 and the tuned pod applying the profile. It arises because we are creating VLAN 1233 on the same bond that is managed by OVN.
The same tuned profile works fine when reapplied and works fine in our lab environment, so we do not suspect a tuned configuration issue here.

We tried to add delays using a MachineConfig systemd unit, but found that bond0.1233 from the NNCP is activated only after all systemd services have completed, and the tuned profile is applied within a few seconds of that. By the log timestamps below, the tuned errors (03:38:03) are logged about 11 seconds before NetworkManager creates bond0.1233 (03:38:14).

------------- journalctl showing when bond0.1233 is activated -------------
Nov 05 03:38:14 xxxx03.xxxx.xxxx.net NetworkManager[2954]: <info> [1730777894.7943] manager: (bond0.1233): new VLAN device (/org/freedesktop/NetworkManager/Devices/78)
Nov 05 03:38:14 mxxxx03.xxxx.xxxx.net NetworkManager[2954]: <info> [1730777894.7946] audit: op="connection-update" uuid="30127951-2ff1-4ad3-b8b7-35bcfbfa1d9f" name="bond0.1233" args="connection.timestamp,vlan.parent" pid=12644 uid=0 result="success"
Nov 05 03:38:14 xxxx03.xxxx.xxxx.net NetworkManager[2954]: <info> [1730777894.8504] device (bond0.1233): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Nov 05 03:38:14 xxxx03.xxxx.xxxx.net NetworkManager[2954]: <info> [1730777894.8510] device (bond0.1233): state change: unavailable -> disconnected (reason 'user-requested', sys-iface-state: 'managed')
Nov 05 03:38:14 xxxx03.xxxx.xxxx.net NetworkManager[2954]: <info> [1730777894.8516] device (bond0.1233): Activation: starting connection 'bond0.1233' (30127951-2ff1-4ad3-b8b7-35bcfbfa1d9f)

------------- tuned logs showing when tuned is applied -------------
2024-11-05 03:38:03,984 ERROR tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'net.ipv4.conf.bond0/1233.rp_filter', the parameter does not exist
2024-11-05 03:38:03,985 ERROR tuned.plugins.plugin_sysctl: sysctl option net.ipv4.conf.bond0/1233.rp_filter will not be set, failed to read the original value.

We require assistance in reviewing the issue and suggestions on how to implement these rp_filter tuned parameters in this configuration.
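For triage, a small sketch like the following could be run on an affected node to measure how long after boot the rp_filter entry for the VLAN interface appears. It relies on the sysctl convention that a key written with '.' separators uses '/' for a literal dot in the interface name, so net.ipv4.conf.bond0/1233.rp_filter corresponds to /proc/sys/net/ipv4/conf/bond0.1233/rp_filter. The function names, the PROC_SYS_ROOT override and the 60-second default are hypothetical and are not part of TuneD or the Node Tuning Operator; this is a diagnostic aid under those assumptions, not a confirmed fix.

```shell
#!/bin/sh
# Hypothetical helpers for diagnosis only; not part of TuneD or the
# Node Tuning Operator.

# Convert a dotted sysctl key to its /proc/sys path by swapping '.' and '/'.
# PROC_SYS_ROOT is an assumed override, used only to make testing possible.
sysctl_key_to_path() {
    printf '%s/%s\n' "${PROC_SYS_ROOT:-/proc/sys}" \
        "$(printf '%s' "$1" | tr './' '/.')"
}

# Poll once per second until the sysctl file exists.
# $1 = sysctl key, $2 = timeout in seconds (assumed default: 60).
# Returns 0 when the file exists, 1 if the timeout expires first.
wait_for_sysctl() {
    key=$1
    timeout=${2:-60}
    path=$(sysctl_key_to_path "$key")
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, `wait_for_sysctl 'net.ipv4.conf.bond0/1233.rp_filter' 120 && sysctl -n 'net.ipv4.conf.bond0/1233.rp_filter'` would print the current value once the interface exists, or give up after two minutes.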
Version-Release number of selected component (if applicable):
How reproducible:
The issue is reproducible in the customer environment, which is bare metal with Intel NICs. It was not reproducible in our lab environment, which is virtual.
Steps to Reproduce:
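The issue is not reproducible in our lab environment. In the customer environment it occurs after applying the tuned profile and rebooting a worker node.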
Actual results:
Expected results:
The tuned/sysctl parameters are applied as defined in the tuned profile and persist across node reboots.
Additional info:
We have customer support case 03960421 in SFDC for this issue. The logs are available in this support case and also on the supportshell server.

must-gather logs: https://access.redhat.com/support/cases/#/case/03960421/discussion?attachmentId=a09Hn00000YZAaMIAX
sosreport from node: https://access.redhat.com/support/cases/#/case/03960421/discussion?attachmentId=a09Hn00000YZAYuIAP
tuned profile: https://access.redhat.com/support/cases/#/case/03960421/discussion?attachmentId=a096R00003NupcbQAB