OpenShift Bugs / OCPBUGS-32472

cpuset changes after kubelet service restarts

    • -
    • Moderate
    • No
    • False
      Cause: Kubelet restart without a node restart.

      Impact:

      The kubelet service is configured to delete some state files on restart. This is needed for the node reboot scenario, in which all workloads are wiped out by the reboot.

      However, when the kubelet service is restarted manually, the workloads are still present on the node, and the deleted state files force kubelet to recompute CPU assignments.

      Fix: Kubelet state files are only deleted on node reboot via a separate systemd service.

      * Previously, manually restarting the `kubelet` service on a node deleted some state files as if the node had rebooted, which caused kubelet to reset the CPU Manager state. After the reset, the CPU Manager computed new CPU assignments for the running workloads, so the new `cpuset` configuration might differ from the initial one. With this update, kubelet state files are deleted only on node reboot, and the `cpuset` configuration is correctly restored after a kubelet restart. (link:https://issues.redhat.com/browse/OCPBUGS-32472[*OCPBUGS-32472*])
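
      The cause and fix described above can be illustrated with a toy model. This is only a sketch of the decision logic, not the actual kubelet or systemd implementation; the function `state_seen_on_startup` and its parameters are hypothetical names introduced for illustration.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the contents of /var/lib/kubelet/cpu_manager_state.
State = Dict[str, str]


def state_seen_on_startup(
    state: State, node_rebooted: bool, delete_on_every_restart: bool
) -> Optional[State]:
    """Return the CPU Manager state that kubelet finds when it starts.

    None models a deleted state file, in which case the CPU Manager has to
    compute CPU assignments from scratch.
    """
    if node_rebooted or delete_on_every_restart:
        return None
    return state


saved = {"cnt1": "7-10,47-50"}

# Before the fix: a manual kubelet restart also wipes the state.
before_fix = state_seen_on_startup(
    saved, node_rebooted=False, delete_on_every_restart=True
)
# After the fix: a manual restart keeps the state; only a reboot wipes it.
after_fix = state_seen_on_startup(
    saved, node_rebooted=False, delete_on_every_restart=False
)
after_reboot = state_seen_on_startup(
    saved, node_rebooted=True, delete_on_every_restart=False
)

print(before_fix, after_fix, after_reboot)
```

      In this model, the bug corresponds to `delete_on_every_restart=True`: the running workloads keep their old cpusets while kubelet starts with no record of them.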
    • Bug Fix
    • Done
      2024-06-19: MCO tests failing mysteriously, investigating with the CI team
      2024-04-17: Waiting for process labels
      2024-04-10: All dependencies were merged, the final patch is waiting for CI

      This is a clone of issue OCPBUGS-28545. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-24366. The following is the description of the original issue:

      Description of problem:

      The cpuset assigned to each container changes after the kubelet service is restarted.
      
      
      

      Version-Release number of selected component (if applicable):

          4.12.27

      How reproducible:

          100% in the customer's cluster, but 0% in my test lab

      Steps to Reproduce:

      1. Record cpu_manager_state before restarting the kubelet service
      
      # cat /var/lib/kubelet/cpu_manager_state |jq .
      {
        "policyName": "static",
        "defaultCpuSet": "0-2,18,20-24,29-42,55,58,60-64,69-79",
        "entries": {
          "203009c3-760c-40d5-8a0d-39d28ec69bd7": {
            "cnt1": "7-10,47-50"
          },
          "22863b0b-ec2b-4bf6-8212-8c2cedccb74e": {
            "cnt2": "11-14,51-54"
          },
          "328f7651-326c-4fae-8655-458d8fa56db0": {
            "cnt3": "15"
          },
          "3ee17d9a-df62-4d13-83f8-dc0acde8483a": {
            "cnt4": "25-28,65-68"
          },
          "5e727985-c22e-49f5-9b14-2f811b59179e": {
            "cnt5": "16-17,56-57"
          },
          "9e081c8f-83c5-4b1f-92af-fee259d3040c": {
            "cnt6": "19,59"
          },
          "acb03746-ab64-477d-8099-352b1a54204f": {
            "cnt7": "3-6,43-46"
          }
        },
        "checksum": 550225907
      }
      
      2. Enable -v=4 in kubelet and restart the kubelet service
      
      # cat /etc/systemd/system/kubelet.service.d/20-logging.conf
      [Service]
      Environment="KUBELET_LOG_LEVEL=4"
      # sudo systemctl daemon-reload
      # systemctl restart kubelet
      # cat /var/lib/kubelet/cpu_manager_state
      
      3. Check cpu_manager_state after kubelet restarts
      
      $ cat /var/lib/kubelet/cpu_manager_state |jq .
      {
        "policyName": "static",
        "defaultCpuSet": "0-2,19-22,27-42,56,59-62,67-79",
        "entries": {
          "203009c3-760c-40d5-8a0d-39d28ec69bd7": {
            "cnt1": "3-6,43-46"
          },
          "22863b0b-ec2b-4bf6-8212-8c2cedccb74e": {
            "cnt2": "7-10,47-50"
          },
          "328f7651-326c-4fae-8655-458d8fa56db0": {
            "cnt3": "16"
          },
          "3ee17d9a-df62-4d13-83f8-dc0acde8483a": {
            "cnt4": "23-26,63-66"
          },
          "5e727985-c22e-49f5-9b14-2f811b59179e": {
            "cnt5": "17-18,57-58"
          },
          "9e081c8f-83c5-4b1f-92af-fee259d3040c": {
            "cnt6": "15,55"
          },
          "acb03746-ab64-477d-8099-352b1a54204f": {
            "cnt7": "11-14,51-54"
          }
        },
        "checksum": 3825494095
      }
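
      As a sanity check on the two state files in the steps above, the following sketch parses the cpuset strings and verifies that the exclusive assignments and `defaultCpuSet` do not overlap. The helper names `parse_cpuset` and `check_state` are assumptions made for this example, and the container names are used as keys for brevity (the real file keys entries by pod UID).

```python
from typing import Dict, Set


def parse_cpuset(spec: str) -> Set[int]:
    """Expand a cpuset string such as "7-10,47-50" into a set of CPU IDs."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_state(default_cpuset: str, assigned: Dict[str, str]) -> int:
    """Raise ValueError if any cpusets overlap; return the total CPU count."""
    seen = parse_cpuset(default_cpuset)
    for container, spec in assigned.items():
        cpus = parse_cpuset(spec)
        overlap = seen & cpus
        if overlap:
            raise ValueError(f"{container} overlaps on CPUs {sorted(overlap)}")
        seen |= cpus
    return len(seen)


# Values copied from the cpu_manager_state output before the restart.
before_total = check_state(
    "0-2,18,20-24,29-42,55,58,60-64,69-79",
    {"cnt1": "7-10,47-50", "cnt2": "11-14,51-54", "cnt3": "15",
     "cnt4": "25-28,65-68", "cnt5": "16-17,56-57", "cnt6": "19,59",
     "cnt7": "3-6,43-46"},
)
# Values copied from the cpu_manager_state output after the restart.
after_total = check_state(
    "0-2,19-22,27-42,56,59-62,67-79",
    {"cnt1": "3-6,43-46", "cnt2": "7-10,47-50", "cnt3": "16",
     "cnt4": "23-26,63-66", "cnt5": "17-18,57-58", "cnt6": "15,55",
     "cnt7": "11-14,51-54"},
)
print(before_total, after_total)  # prints "80 80"
```

      Both files describe a valid, non-overlapping partition of the same 80 CPUs, which suggests the state was recomputed from scratch rather than corrupted.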
          

      Actual results:

      The cpuset changes for each VNF container/Pod
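
      The change can be confirmed mechanically by comparing the `entries` sections of the two state files. The sketch below uses the values from this report; `parse_cpuset` and `changed_containers` are hypothetical helper names, not part of kubelet.

```python
from typing import Dict, Optional, Set, Tuple

Entries = Dict[str, Dict[str, str]]  # pod UID -> container name -> cpuset


def parse_cpuset(spec: str) -> Set[int]:
    """Expand a cpuset string such as "7-10,47-50" into a set of CPU IDs."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def changed_containers(
    before: Entries, after: Entries
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each container whose cpuset changed (or vanished) to (old, new)."""
    changes: Dict[str, Tuple[str, Optional[str]]] = {}
    for pod_uid, containers in before.items():
        for name, old in containers.items():
            new = after.get(pod_uid, {}).get(name)
            if new is None or parse_cpuset(new) != parse_cpuset(old):
                changes[name] = (old, new)
    return changes


before = {
    "203009c3-760c-40d5-8a0d-39d28ec69bd7": {"cnt1": "7-10,47-50"},
    "22863b0b-ec2b-4bf6-8212-8c2cedccb74e": {"cnt2": "11-14,51-54"},
    "328f7651-326c-4fae-8655-458d8fa56db0": {"cnt3": "15"},
    "3ee17d9a-df62-4d13-83f8-dc0acde8483a": {"cnt4": "25-28,65-68"},
    "5e727985-c22e-49f5-9b14-2f811b59179e": {"cnt5": "16-17,56-57"},
    "9e081c8f-83c5-4b1f-92af-fee259d3040c": {"cnt6": "19,59"},
    "acb03746-ab64-477d-8099-352b1a54204f": {"cnt7": "3-6,43-46"},
}
after = {
    "203009c3-760c-40d5-8a0d-39d28ec69bd7": {"cnt1": "3-6,43-46"},
    "22863b0b-ec2b-4bf6-8212-8c2cedccb74e": {"cnt2": "7-10,47-50"},
    "328f7651-326c-4fae-8655-458d8fa56db0": {"cnt3": "16"},
    "3ee17d9a-df62-4d13-83f8-dc0acde8483a": {"cnt4": "23-26,63-66"},
    "5e727985-c22e-49f5-9b14-2f811b59179e": {"cnt5": "17-18,57-58"},
    "9e081c8f-83c5-4b1f-92af-fee259d3040c": {"cnt6": "15,55"},
    "acb03746-ab64-477d-8099-352b1a54204f": {"cnt7": "11-14,51-54"},
}

changes = changed_containers(before, after)
for name, (old, new) in sorted(changes.items()):
    print(f"{name}: {old} -> {new}")
```

      With the data from this report, all seven containers are reported as changed, even though the pod UIDs and container names are identical in both files.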

      Expected results:

          

      Additional info:

          

              Martin Sivak (msivak@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Mallapadi Niranjan
              Alexandra Molnar