OpenShift Bugs / OCPBUGS-13163

[4.13] cgroupv1 support for cpu balancing is broken for non-SNO nodes


Details

    • Critical
    • No
    • False
      The cpu-balance: disable annotation used for low latency tuning (https://docs.openshift.com/container-platform/4.12/scalability_and_performance/cnf-low-latency-tuning.html#node-tuning-operator-disabling-cpu-load-balancing-for-dpdk_cnf-master) does not work on systems that do not have Workload partitioning configured. More specifically, it fails on clusters where the Infrastructure resource does not set CPUPinning to the AllNodes value (https://github.com/openshift/openshift-docs/blob/cd4a1c3108977ebc7452f4d9dfd4e8881b8fa9ae/scalability_and_performance/enabling-workload-partitioning.adoc#L26).

      This degrades the achievable latency on such clusters and might prevent low latency workloads from operating properly.
    • Known Issue
      7/18: All the pieces were merged to both 4.14 and 4.13.z. The status of the other bugs is verified.
      7/4: finalizing 4.14 submission via OCPBUGS-13980, 4.13.z backport to follow

    Description

      This is a clone of issue OCPBUGS-13148. The following is the description of the original issue:

      Description of problem:

      Deploying a standard masters+workers cluster with 4.13.0-rc.6 does not configure the cgroup structure as specified in OCPNODE-1539.

      Version-Release number of selected component (if applicable):

      OCP 4.13.0-rc.6

      How reproducible:

      Always

      Steps to Reproduce:

      1. Deploy the cluster
      2. Check for presence of /sys/fs/cgroup/cpuset/system*
      3. Check the status of cpu balancing of the root cpuset cgroup (should be disabled)
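
      Steps 2 and 3 can be scripted on a node. The following is a minimal sketch, not an official tool: the function name check_cpuset_layout and the returned dictionary keys are illustrative assumptions. It relies on the cgroup v1 cpuset file cpuset.sched_load_balance, which holds 1 when load balancing is enabled and 0 when it is disabled.

```python
from pathlib import Path


def check_cpuset_layout(cpuset_root="/sys/fs/cgroup/cpuset"):
    """Report system* cpusets and the root load-balancing flag (cgroup v1).

    The function name and result keys are hypothetical; only the
    cpuset.sched_load_balance file name comes from the kernel interface.
    """
    root = Path(cpuset_root)

    # Step 2: look for system* cpuset directories directly under the root.
    system_cpusets = sorted(p.name for p in root.glob("system*") if p.is_dir())

    # Step 3: read the root cpuset load-balancing flag ("1" = enabled).
    flag_file = root / "cpuset.sched_load_balance"
    root_load_balancing = None  # None means the file could not be found
    if flag_file.is_file():
        root_load_balancing = flag_file.read_text().strip() == "1"

    return {
        "system_cpusets": system_cpusets,
        "root_load_balancing": root_load_balancing,
    }


if __name__ == "__main__":
    print(check_cpuset_layout())
```

      On a node affected by this bug, the sketch would be expected to report an empty system_cpusets list and root_load_balancing set to True.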
      

      Actual results:

      No system cpuset exists and all services are still present in the root cgroup with cpu balancing enabled.

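      Expected results:

      A system cpuset exists under /sys/fs/cgroup/cpuset/ and cpu balancing is disabled on the root cpuset cgroup (see steps 2 and 3).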

      Additional info:

      The code has a bug we missed. The cgroup configuration is nested under the Workload partitioning check, so it is only applied when Workload partitioning is enabled. See https://github.com/haircommander/cluster-node-tuning-operator/blob/123e26df30c66fd5c9836726bd3e4791dfd82309/pkg/performanceprofile/controller/performanceprofile/components/machineconfig/machineconfig.go#L251
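
      The effect of the nesting can be illustrated with a simplified model. This is a sketch only and does not reproduce the operator's Go code: the function planned_units and the unit names "cpuset-configure" and "workload-partitioning-config" are hypothetical placeholders chosen for illustration.

```python
def planned_units(workload_partitioning_enabled, fixed=False):
    """Model which config pieces get generated (all names are hypothetical).

    fixed=False models the nesting described in this report;
    fixed=True models the intended, independent behavior.
    """
    units = []
    if fixed:
        # Intended: the cpuset setup does not depend on Workload partitioning.
        units.append("cpuset-configure")
        if workload_partitioning_enabled:
            units.append("workload-partitioning-config")
    else:
        if workload_partitioning_enabled:
            units.append("workload-partitioning-config")
            # Bug: only reached when Workload partitioning is enabled.
            units.append("cpuset-configure")
    return units


if __name__ == "__main__":
    # A standard masters+workers cluster without Workload partitioning.
    print(planned_units(False))
    print(planned_units(False, fixed=True))
```

      In this model, a cluster without Workload partitioning gets no cpuset setup at all under the nested form, which matches the actual results reported above.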


            People

              msivak@redhat.com Martin Sivak
              openshift-crt-jira-prow OpenShift Prow Bot
              Mallapadi Niranjan Mallapadi Niranjan