  1. OpenShift Bugs
  2. OCPBUGS-44520

Failing to apply persistent sysctl changes to a vlan interface through the tuned profile


    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Normal Normal
    • None
    • 4.14.z
    • Node Tuning Operator
    • None
    • None
    • 3
    • False

      Description of problem:

      An NPSS customer on OCP 4.14 is making persistent sysctl changes to turn off rp_filter on specific network interfaces on the worker nodes.
      
      These changes are managed through a tuned profile (available as an attachment in the support case). However, the profile ends up in the Degraded state after the nodes are rebooted, so the changes are not persistent.
      
      After applying the tuned profile and rebooting the worker node, we see the following error in the tuned logs (/var/log/pods/openshift-cluster-node-tuning-operator_tuned-XXXXXX/tuned/XX.log):
      2024-11-05 03:38:03,984 ERROR    tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'net.ipv4.conf.bond0/1233.rp_filter', the parameter does not exist
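
The key in this error uses `/` where the interface name contains a dot. Under the usual sysctl(8) naming convention, `.` and `/` are swapped when a key is mapped to its file under /proc/sys, so this key refers to the interface bond0.1233. A minimal sketch of that mapping follows; the helper name is hypothetical and is not part of TuneD or the Node Tuning Operator.

```python
from pathlib import PurePosixPath


def sysctl_key_to_proc_path(key: str) -> str:
    """Map a dotted sysctl key to its /proc/sys path.

    Assumes the dotted sysctl(8) form, where '.' separates path
    components and '/' stands in for a literal dot (for example in
    a VLAN interface name such as bond0.1233).
    """
    # Swap the two characters in a single pass: '.' -> '/' and '/' -> '.'.
    swapped = key.translate(str.maketrans("./", "/."))
    return str(PurePosixPath("/proc/sys") / swapped)


print(sysctl_key_to_proc_path("net.ipv4.conf.bond0/1233.rp_filter"))
# /proc/sys/net/ipv4/conf/bond0.1233/rp_filter
```

If that file does not exist when the profile is applied, the read fails with "the parameter does not exist", which is consistent with the VLAN interface not having been created yet.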
           
      The tuned profile is degraded:
      $ oc get profile.tuned.openshift.io -n openshift-cluster-node-tuning-operator xxxx03.xxxx.tcxxxxnz.net
      NAME                       TUNED                        APPLIED   DEGRADED   AGE
      xxxx03.xxxx.tcxxxxnz.net   openshift-node-5g-user-aat   True      True       25m
      
      
      A few minutes after the node reboot, when we reapply the tuned profile, the new tuned parameters are applied and the profile is no longer degraded:
      $ oc delete profile.tuned.openshift.io -n openshift-cluster-node-tuning-operator xxxx03.xxxx.tcxxxxnz.net
      
      $ oc get profile.tuned.openshift.io -n openshift-cluster-node-tuning-operator xxxx03.xxxx.tcxxxx.net
      NAME                   TUNED                        APPLIED   DEGRADED   AGE
      xxxx03.xxxx.xxxx.net   openshift-node-5g-user-aat   True      False      11s
      
      So this looks like a race condition during node reboot between OVN recreating bond0 and the tuned pod applying the profile, triggered because VLAN 1233 is created on the same bond that OVN manages.
      
      The same tuned profile works when it is re-applied, and it also works in our lab environment, so we do not suspect a problem with the tuned configuration itself.
      
      
      We tried to add delays through a MachineConfig systemd unit, but found that bond0.1233 (from the NNCP) is activated only after all systemd services have completed, and the tuned profile is applied within a few seconds of that.
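
The ordering that a delay is meant to achieve can be sketched as a bounded poll: wait until the sysctl file for the VLAN interface exists, and only then apply the setting. This is an illustration of the ordering requirement only. The function name and defaults are hypothetical, it is not an existing TuneD or Node Tuning Operator feature, and where such a wait could be hooked in is left open.

```python
import os
import time


def wait_for_path(path: str, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll until `path` exists or `timeout_s` elapses.

    Returns True if the path appeared in time, False otherwise.
    Uses a monotonic clock so wall-clock adjustments during boot
    do not shorten or lengthen the wait.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

On an affected node, a call such as `wait_for_path("/proc/sys/net/ipv4/conf/bond0.1233/rp_filter")` would return True only once the kernel has created the per-interface sysctl directory, which happens when the VLAN device itself is created.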
      
      -------------
      journalctl output showing when bond0.1233 is activated
      -------------
      Nov 05 03:38:14 xxxx03.xxxx.xxxx.net NetworkManager[2954]: <info>  [1730777894.7943] manager: (bond0.1233): new VLAN device (/org/freedesktop/NetworkManager/Devices/78)
      Nov 05 03:38:14 mxxxx03.xxxx.xxxx.net NetworkManager[2954]: <info>  [1730777894.7946] audit: op="connection-update" uuid="30127951-2ff1-4ad3-b8b7-35bcfbfa1d9f" name="bond0.1233" args="connection.timestamp,vlan.parent" pid=12644 uid=0 result="success"
      Nov 05 03:38:14 xxxx03.xxxx.xxxx.net NetworkManager[2954]: <info>  [1730777894.8504] device (bond0.1233): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
      Nov 05 03:38:14 xxxx03.xxxx.xxxx.net NetworkManager[2954]: <info>  [1730777894.8510] device (bond0.1233): state change: unavailable -> disconnected (reason 'user-requested', sys-iface-state: 'managed')
      Nov 05 03:38:14 xxxx03.xxxx.xxxx.net NetworkManager[2954]: <info>  [1730777894.8516] device (bond0.1233): Activation: starting connection 'bond0.1233' (30127951-2ff1-4ad3-b8b7-35bcfbfa1d9f)
      
      -------------
      tuned logs showing when tuned is applied
      -------------
      2024-11-05 03:38:03,984 ERROR    tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'net.ipv4.conf.bond0/1233.rp_filter', the parameter does not exist
      2024-11-05 03:38:03,985 ERROR    tuned.plugins.plugin_sysctl: sysctl option net.ipv4.conf.bond0/1233.rp_filter will not be set, failed to read the original value.
      2024-11-05 03:38:03,984 ERROR    tuned.plugins.plugin_sysctl: Failed to read sysctl parameter 'net.ipv4.conf.bond0/1233.rp_filter', the parameter does not exist
      2024-11-05 03:38:03,985 ERROR    tuned.plugins.plugin_sysctl: sysctl option net.ipv4.conf.bond0/1233.rp_filter will not be set, failed to read the original value.
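
The two excerpts can be put on one timeline. The sketch below takes the timestamp of the first tuned error and the epoch value NetworkManager printed for the "new VLAN device" line, then computes the gap. The NetworkManager epoch 1730777894 corresponds to 03:38:14 UTC, matching the journal prefix. That the tuned log is also in UTC is an assumption, and the variable names are illustrative.

```python
from datetime import datetime, timezone

# Timestamp of the first tuned error line, copied from the log excerpt.
TUNED_ERROR_TS = "2024-11-05 03:38:03,984"
# Epoch NetworkManager printed for "(bond0.1233): new VLAN device".
NM_VLAN_CREATED_EPOCH = 1730777894.7943


def tuned_ts_to_epoch(ts: str) -> float:
    """Parse a tuned log timestamp, ASSUMING the log clock is UTC."""
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


gap_s = NM_VLAN_CREATED_EPOCH - tuned_ts_to_epoch(TUNED_ERROR_TS)
print(f"tuned read attempt preceded bond0.1233 creation by {gap_s:.1f} s")
# tuned read attempt preceded bond0.1233 creation by 10.8 s
```

Under that assumption, tuned tried to read the parameter roughly 10.8 seconds before NetworkManager reported the VLAN device, which fits the race described above.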
      
      We need assistance reviewing this issue, and suggestions on how to apply these rp_filter tuned parameters in this configuration.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      The issue is reproducible in the customer environment, which runs on bare metal with Intel NICs.
      
      The issue was not reproducible in our lab environment, which is virtual.
      
      

      Steps to Reproduce:

         The issue is not reproducible in our lab environment.

      Actual results:

          

      Expected results:

      The tuned sysctl parameters are applied as specified in the tuned profile.

      Additional info:

      We have customer support case 03960421 in SFDC for this issue. The logs are available in that case and on the supportshell server.
      
      must-gather logs https://access.redhat.com/support/cases/#/case/03960421/discussion?attachmentId=a09Hn00000YZAaMIAX
      
      
      sosreport from node
      https://access.redhat.com/support/cases/#/case/03960421/discussion?attachmentId=a09Hn00000YZAYuIAP
      
      
      tuned profile
      https://access.redhat.com/support/cases/#/case/03960421/discussion?attachmentId=a096R00003NupcbQAB
      
      

       

              jmencak Jiri Mencak
              rhn-support-mkampli Manjunatha Kampli
              Liquan Cui Liquan Cui