OCPBUGS-59745 (OpenShift Bugs)

Performance Profile changes are not reflected on the nodes after an MCP update on ARM Grace Hopper


    • Quality / Stability / Reliability
    • Important
    • aarch64

      Description of problem:

      Modifying a performance profile triggers an MCP (Machine Config Pool) update. Once that update completes, the nodes in the pool should reflect the new performance profile settings. In this case the modified fields did not take effect on the nodes: kernelPageSize was changed from 64k to 4k and the hugepages size from 512M to 64k, but after the MCP finished updating, the nodes still reported the old values.

      Version-Release number of selected component (if applicable):

      4.20

      How reproducible:

      Sometimes

      Steps to Reproduce:

          1. Apply a performance profile
          2. Change a field in the performance profile (for example kernelPageSize or the hugepages size) and wait for the resulting MCP update to finish
          3. Check on the nodes whether the change took effect
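
The check in step 3 can be scripted. The following is a minimal sketch, not a verified test procedure: it assumes a logged-in `oc` session, `jq` on the PATH, and that the first PerformanceProfile in the cluster is the one being edited. The function names are hypothetical; the commands are the ones used in the logs under "Additional info".

```shell
# Sketch only. Assumptions: `oc` is logged in, `jq` is installed, and
# .items[0] is the performance profile under test.

# Block until the given pool (e.g. "worker") reports Updated.
wait_for_pool() {
  oc wait "mcp/$1" --for=condition=Updated --timeout="${2:-30m}"
}

# Print the kernelPageSize requested by the performance profile.
profile_kernel_page_size() {
  oc get performanceprofile -o json | jq -r '.items[0].spec.kernelPageSize'
}

# Print the size of the first hugepages entry in the performance profile.
profile_hugepage_size() {
  oc get performanceprofile -o json | jq -r '.items[0].spec.hugepages.pages[0].size'
}

# Print the base page size, in bytes, that the node is running with.
node_page_size() {
  oc debug "node/$1" -- chroot /host getconf PAGESIZE 2>/dev/null
}

# Print the default hugepage size, in kB, reported by the node.
node_hugepage_kb() {
  oc debug "node/$1" -- chroot /host awk '/^Hugepagesize:/ {print $2}' /proc/meminfo 2>/dev/null
}
```

A run would call `wait_for_pool worker`, then compare `profile_kernel_page_size` and `profile_hugepage_size` with `node_page_size <node>` and `node_hugepage_kb <node>` for each worker. The pool may still report Updated for a short time before the new rendered config exists, so a real check would probably wait for the pool to start updating first.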
          

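      Actual results:

          After the MCP update completes, the nodes still report the previous values: a 64K kernel page size (getconf PAGESIZE returns 65536) and a 512M hugepage size (Hugepagesize: 524288 kB).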

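      Expected results:

          After the MCP update completes, the nodes report the values set in the performance profile: a 4k kernel page size and a 64k hugepage size.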

      Additional info:
      The logs below show that the kernelPageSize and hugepages size values defined in the performance profile (4k and 64k respectively) are not applied to the nodes. Both workers retain their previous values (64K page size, 512M hugepage size) even though the worker MCP reports UPDATED=True.

      
      [kni@registry 14_arm]$ oc get no,mcp; echo; oc get pods -n node-inspector-ns; echo; oc get performanceprofile -o json | jq ".items[0].spec.kernelPageSize"; echo; oc get performanceprofile -o json | jq ".items[0].spec.hugepages.pages[0].size"
      NAME                                                     STATUS   ROLES                  AGE     VERSION
      node/master-0.kni-qe-93.telcoqe.eng.rdu2.dc.redhat.com   Ready    control-plane,master   4d19h   v1.33.2
      node/master-1.kni-qe-93.telcoqe.eng.rdu2.dc.redhat.com   Ready    control-plane,master   4d19h   v1.33.2
      node/master-2.kni-qe-93.telcoqe.eng.rdu2.dc.redhat.com   Ready    control-plane,master   4d19h   v1.33.2
      node/worker-0.kni-qe-93.telcoqe.eng.rdu2.dc.redhat.com   Ready    worker                 4d19h   v1.33.2
      node/worker-1.kni-qe-93.telcoqe.eng.rdu2.dc.redhat.com   Ready    worker                 4d19h   v1.33.2
      
      NAME                                                         CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      machineconfigpool.machineconfiguration.openshift.io/master   rendered-master-00e123d0ca70c700276a6474fe2e2a6e   True      False      False      3              3                   3                     0                      4d19h
      machineconfigpool.machineconfiguration.openshift.io/worker   rendered-worker-3f1f8d886973f4a5ee702fb6b9ee2698   True      False      False      2              2                   2                     0                      4d19h
      
      NAME                   READY   STATUS    RESTARTS   AGE
      node-inspector-bljkq   1/1     Running   0          87m
      node-inspector-ktsbv   1/1     Running   4          87m
      node-inspector-ns9cz   1/1     Running   0          87m
      node-inspector-svbc4   1/1     Running   4          87m
      node-inspector-t64wf   1/1     Running   0          87m
      
      "4k"
      
      "64k"
      [kni@registry 14_arm]$ oc debug node/worker-0.kni-qe-93.telcoqe.eng.rdu2.dc.redhat.com
      Starting pod/worker-0kni-qe-93telcoqeengrdu2dcredhatcom-debug-897wz ...
      To use host binaries, run `chroot /host`. Instead, if you need to access host namespaces, run `nsenter -a -t 1`.
      Pod IP: 10.6.159.12
      If you don't see a command prompt, try pressing enter.
      sh-5.1# getconf PAGESIZE
      65536
      sh-5.1# grep Hugepagesize: /proc/meminfo
      Hugepagesize:     524288 kB
      sh-5.1#
      exit
      
      Removing debug pod ...
      [kni@registry 14_arm]$ oc debug node/worker-1.kni-qe-93.telcoqe.eng.rdu2.dc.redhat.com
      Starting pod/worker-1kni-qe-93telcoqeengrdu2dcredhatcom-debug-nxnrs ...
      To use host binaries, run `chroot /host`. Instead, if you need to access host namespaces, run `nsenter -a -t 1`.
      Pod IP: 10.6.159.13
      If you don't see a command prompt, try pressing enter.
      sh-5.1# chroot /host
      sh-5.1# getconf PAGESIZE
      65536
      sh-5.1# grep Hugepagesize: /proc/meminfo
      Hugepagesize:     524288 kB
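
The profile and the node express sizes in different units (`4k`/`64k` in the profile, bytes from `getconf`, kB in `/proc/meminfo`), so a direct string comparison does not work. The sketch below normalises everything to bytes, using the values from the log above. `size_to_bytes` is a hypothetical helper, and it assumes the suffixes are binary units (1k = 1024 bytes).

```shell
# Convert a size such as 4k, 64k or 512M to bytes (binary units assumed).
size_to_bytes() {
  case "$1" in
    *[kK]) echo $(( ${1%?} * 1024 )) ;;
    *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;
    *[gG]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
    *)     echo "$1" ;;
  esac
}

# Values taken from the log: what the profile requests vs. what the node reports.
want_page=$(size_to_bytes 4k)        # spec.kernelPageSize
have_page=65536                      # getconf PAGESIZE
want_huge=$(size_to_bytes 64k)       # spec.hugepages.pages[0].size
have_huge=$(size_to_bytes 524288k)   # Hugepagesize: 524288 kB

[ "$want_page" -eq "$have_page" ] || echo "page size mismatch: want $want_page, have $have_page"
[ "$want_huge" -eq "$have_huge" ] || echo "hugepage size mismatch: want $want_huge, have $have_huge"
```

With the logged values both comparisons fail: the node still runs with 65536-byte pages instead of 4096, and a 536870912-byte (512M) hugepage size instead of 65536.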

              msivak@redhat.com Martin Sivak
              rh-ee-rshemtov Roy Shemtov
              Roy Shemtov Roy Shemtov