OpenShift Bugs / OCPBUGS-22409

PerformanceProfile with isolated cpu fails to be applied


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • None
    • 4.15.0
    • Node Tuning Operator
    • None
    • Important
    • No
    • Proposed
    • False
    • 20/11: The main issue (node not ready) should be solved by the fix for the blocking OCP issue. The profile degradation will probably have to wait for the RHEL blocking bug attached here.

      Description of problem:

      When deploying CNF workers that are targeted by a PerformanceProfile, the profile fails to be applied and the worker node never becomes Ready.

      Version-Release number of selected component (if applicable):

      current master (4.15)

      How reproducible:

      apiVersion: v1
      items:
      - apiVersion: performance.openshift.io/v2
        kind: PerformanceProfile
        metadata:
          creationTimestamp: "2023-10-25T18:48:39Z"
          finalizers:
          - foreground-deletion
          generation: 1
          name: cnf-performanceprofile
          resourceVersion: "32192"
          uid: 93d463ac-6412-4a9a-ac81-5a4dba4c9730
        spec:
          additionalKernelArgs:
          - nmi_watchdog=0
          - audit=0
          - mce=off
          - processor.max_cstate=1
          - idle=poll
          - intel_idle.max_cstate=0
          - amd_iommu=on
          cpu:
            isolated: 2-7
            reserved: 0-1
          globallyDisableIrqLoadBalancing: true
          hugepages:
            defaultHugepagesSize: 1G
            pages:
            - count: 4
              node: 0
              size: 1G
          nodeSelector:
            node-role.kubernetes.io/worker: ""
          realTimeKernel:
            enabled: false
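
As a quick sanity check on the `cpu` section of the profile above, the following minimal sketch expands the `reserved` and `isolated` CPU lists and confirms that they do not overlap. The helper name `parse_cpu_list` is hypothetical and is not part of the Node Tuning Operator; the sketch only assumes the usual Linux CPU list syntax (`0-1`, `2-7`, `0,4-5`).

```python
def parse_cpu_list(spec):
    """Expand a Linux CPU list such as "0-1,4" into a set of CPU ids.

    Hypothetical helper for illustration; not taken from the operator.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from spec.cpu in the PerformanceProfile above.
reserved = parse_cpu_list("0-1")
isolated = parse_cpu_list("2-7")

overlap = reserved & isolated
print("overlapping CPUs:", sorted(overlap))
print("CPUs covered by the profile:", sorted(reserved | isolated))
```

For this profile the two sets are disjoint and together cover CPUs 0 through 7, which is the same range that appears in the `cpuset.cpus` write reported further down.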
      

      Steps to Reproduce:

      1. Deploy a cluster with 3 masters and zero workers.
      2. Create a PerformanceProfile as above.
      3. Scale the workers to 1.
      

      Actual results:

      The worker node never becomes Ready. The cluster-node-tuning-operator fails to apply the PerformanceProfile with this error:
      
      I1025 18:56:02.899138       1 controller.go:820] created MachineConfig 50-nto-worker with kernel parameters: [skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-7 tuned.non_isolcpus=00000003 systemd.cpu_affinity=0,1 intel_iommu=on iommu=pt isolcpus=managed_irq,2-7 nohz_full=2-7 tsc=reliable nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 default_hugepagesz=1G nmi_watchdog=0 audit=0 mce=off processor.max_cstate=1 idle=poll intel_idle.max_cstate=0 amd_iommu=on intel_pstate=disable]
      I1025 18:56:02.900768       1 status.go:306] 1/4 Profiles failed to be applied
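
To make the CPU-related kernel parameters in the log line above easier to read, here is a minimal sketch that reproduces them from the profile's `reserved: 0-1` and `isolated: 2-7` values. It is not the operator's code. The function names are hypothetical, and the fixed 8-digit hexadecimal width of the mask is an assumption taken from the `00000003` value seen in the log.

```python
def expand_cpu_list(cpu_list):
    """Expand a CPU list such as "0-1" into a sorted list of CPU ids."""
    cpus = set()
    for part in cpu_list.split(","):
        first, _, last = part.strip().partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)


def cpu_mask(cpus, hex_digits=8):
    """Return a zero-padded hex bitmask with one bit set per CPU id.

    The 8-digit width is an assumption based on the logged value.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "0{}x".format(hex_digits))


# Values copied from spec.cpu in the PerformanceProfile.
reserved, isolated = "0-1", "2-7"
reserved_cpus = expand_cpu_list(reserved)

derived_args = [
    "rcu_nocbs=" + isolated,
    "tuned.non_isolcpus=" + cpu_mask(reserved_cpus),
    "systemd.cpu_affinity=" + ",".join(str(c) for c in reserved_cpus),
    "isolcpus=managed_irq," + isolated,
    "nohz_full=" + isolated,
]
print(" ".join(derived_args))
```

The printed parameters match the corresponding entries in the logged MachineConfig, which suggests that the kernel arguments themselves were derived from the profile as intended.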
      
      
      
      CRI-O fails with this error:
      
      level=error msg="Container creation error: time=\"2023-10-25T19:10:06Z\" level=error msg=\"runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write \\\"0-7\\\": write /sys/fs/cgroup/cpuset/system.slice/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podef062c82f679a1ef42be9dd35e115e2f.slice/crio-c8300b86016fe34637684039adf9eef3d4f04a411c5d13ba4c464c91589551ee.scope/cpuset.cpus: permission denied\"\n" id=44a49756-5a89-4c16-acea-56961a74f1ab name=/runtime.v1.RuntimeService/CreateContainer
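
One possible way to investigate the failed `cpuset.cpus` write is sketched below. On cgroup v1, a write to `cpuset.cpus` can be rejected with "permission denied" when the requested CPUs are not a subset of a parent cgroup's `cpuset.cpus`. Whether that is what happened here is not established by this report. The sketch walks up from a container scope directory and lists every ancestor whose `cpuset.cpus` does not contain all requested CPUs. The function names are hypothetical, and the script would have to run on the affected node.

```python
import os


def parse_cpu_list(spec):
    """Expand a CPU list such as "0-7" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def find_restricting_ancestors(cgroup_dir, requested, root):
    """Return (path, allowed_cpus) for ancestors that lack a requested CPU.

    Walks from the parent of cgroup_dir up to root (inclusive) and reads
    each cpuset.cpus file it finds. Hypothetical diagnostic helper.
    """
    wanted = parse_cpu_list(requested)
    root = os.path.abspath(root)
    current = os.path.dirname(os.path.abspath(cgroup_dir))
    restricting = []
    while True:
        cpus_file = os.path.join(current, "cpuset.cpus")
        if os.path.isfile(cpus_file):
            with open(cpus_file) as handle:
                allowed = parse_cpu_list(handle.read())
            if not wanted <= allowed:
                restricting.append((current, sorted(allowed)))
        parent = os.path.dirname(current)
        if current == root or parent == current:
            break
        current = parent
    return restricting
```

On the node, it could be called with the scope path from the error message, the requested value `"0-7"`, and `/sys/fs/cgroup/cpuset` as the root. An empty result would mean that no ancestor restricts the requested CPUs and the cause lies elsewhere.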

      Expected results:

      The PerformanceProfile should be applied, as it was previously, and the worker node should become Ready.

      Additional info:

      Example of CI job: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_sriov-network-operator/844/pull-ci-openshift-sriov-network-operator-master-e2e-openstack-nfv/1717172583696175104

            yquinn@redhat.com Yanir Quinn
            emacchi@redhat.com Emilien Macchi
            Mallapadi Niranjan