OCPBUGS-59641: Regression in 4.20 nightly kubelet CPU


    • Quality / Stability / Reliability
    • False
    • None
    • Important
    • Yes
    • All
    • Dev
    • None
    • Approved
    • OCP Node Sprint 274 (green), OCP Node Sprint 275 (green)
    • 2
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Description of problem:

          A changepoint was detected in kubelet CPU usage for 4.20, indicating an increase of roughly 30%. Pod ready latency on the cluster was also affected, increasing by 50%.

      Version-Release number of selected component (if applicable):

          kubelet 1.33 seems to be causing this issue.

      How reproducible:

          Reproducible by retriggering this Prow job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946068601798660096

      Steps to Reproduce:

      We have already reproduced the issue on our end and confirmed that the kubelet is responsible.
      
      To reproduce the issue, retrigger this job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946068601798660096 
      
      To confirm that the kubelet is the cause, follow these steps:
      1. Make a code change that overrides the RHCOS version with one that does not include kubelet 1.33, so that the kubelet falls back to the previous version, 1.32.6. PR: https://github.com/openshift/release/commit/200b635867eed8971fe9df97ac5d5cd7b6b7f688 
      2. Trigger a rehearsal by commenting on the PR in GitHub: /pj-rehearse periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes 
      3. Observe that both CPU usage and pod ready latency return to their previous levels and that no changepoint is detected.
      
      ### Results for CPU usage after the patch
      | 17 | 87ae0d27-f58f-4ec9-bc37-515c9effaea8 | 2025-07-17T18:46:42Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67110/rehearse-67110-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1945888384954142720 |       25.6033 | False            |              0      |
      | 22 | b3e23e9b-769d-43b9-aa77-b597c90ea2ca | 2025-07-19T06:37:45Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946416759300952064 |       20.7294 | False            |              0      |
      | 30 | 18348b1c-1f0d-48e6-a72d-f4d1bc6abc4e | 2025-07-21T14:35:58Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1947274119384928256 |       22.2043 | False            |              0      |
      | 31 | 85264ff0-353f-4715-b668-9e3bb8e89f7f | 2025-07-21T19:28:47Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1947347398476959744 |       22.8142 | False    
      
      ### Results for pod ready latency after the patch
      | 17 | 87ae0d27-f58f-4ec9-bc37-515c9effaea8 | 2025-07-17T18:46:42Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67110/rehearse-67110-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1945888384954142720 |                  2000 | False            |              0      |
      | 22 | b3e23e9b-769d-43b9-aa77-b597c90ea2ca | 2025-07-19T06:37:45Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946416759300952064 |                  2000 | False            |              0      |
      | 30 | 18348b1c-1f0d-48e6-a72d-f4d1bc6abc4e | 2025-07-21T14:35:58Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1947274119384928256 |                  2000 | False            |              0      |
      | 31 | 85264ff0-353f-4715-b668-9e3bb8e89f7f | 2025-07-21T19:28:47Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1947347398476959744 |                  2000 | False 
      
      4. Conclusion: kubelet 1.32.6 is fine, but some change in kubelet 1.33 leads to an increase in CPU usage and pod ready latency.
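
As a rough cross-check of the comparison above, the values quoted in this ticket can be compared directly. This is only a sketch: the helper name `relative_increase` is hypothetical, it compares plain means over the few rows quoted here, and it is not how the changepoint tool computes its percentage, so the CPU figure it prints is not expected to match the reported ~30%.

```python
from statistics import mean

# Values copied from the tables in this ticket.
patched_cpu = [25.6033, 20.7294, 22.2043, 22.8142]  # rehearsal runs with kubelet 1.32.6
regressed_cpu = [34.879, 30.2378]                   # nightly runs 28 and 29 with kubelet 1.33

patched_latency = [2000, 2000, 2000, 2000]          # rehearsal runs with kubelet 1.32.6
regressed_latency = [3000, 3000]                    # nightly runs 28 and 29 with kubelet 1.33


def relative_increase(baseline: list[float], candidate: list[float]) -> float:
    """Return the percentage by which mean(candidate) exceeds mean(baseline)."""
    base = mean(baseline)
    return (mean(candidate) - base) / base * 100.0


cpu_increase = relative_increase(patched_cpu, regressed_cpu)
latency_increase = relative_increase(patched_latency, regressed_latency)

print(f"kubelet CPU: {cpu_increase:.1f}% higher with kubelet 1.33")
print(f"pod ready latency: {latency_increase:.1f}% higher with kubelet 1.33")
```

The sample is small (four patched runs, two regressed runs), so this only shows that the direction and rough size of the difference agree with the changepoint results.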

      Actual results:

      With kubelet 1.33 a changepoint was detected.
      
      ### For CPU usage
      
      control-plane-6nodes/1945472195727724544 |       26.0451 | False            |              0      |
      | 28 | 35dbe753-c8c6-4ecf-9070-83e45fede818 | 2025-07-17T07:01:03Z | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-XXXXXX-XXXXXXXXX-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1945711780562997248 |       34.879  | True             |             29.1073 | -- changepoint
      | 29 | 9c77713e-f649-482b-8889-025d2fb3337e | 2025-07-18T06:46:10Z | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-XXXXXX-XXXXXXXXX-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946068601798660096 |       30.2378 | False            |              0      |
      +----+--------------------------------------+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------------+---------------------+
      
      ### Pod ready latency
      
      | 27 | 597be9c6-4d1b-4724-9b87-8ee4968bbbc0 | 2025-07-16T15:24:36Z | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-XXXXXX-XXXXXXXXX-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1945472195727724544 |                  2000 | False            |                   0 |
      | 28 | 35dbe753-c8c6-4ecf-9070-83e45fede818 | 2025-07-17T07:01:03Z | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-XXXXXX-XXXXXXXXX-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1945711780562997248 |                  3000 | True             |                  50 | -- changepoint
      | 29 | 9c77713e-f649-482b-8889-025d2fb3337e | 2025-07-18T06:46:10Z | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-XXXXXX-XXXXXXXXX-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946068601798660096 |                  3000 | False            |                   0 |
      +----+--------------------------------------+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+------------------+---------------------+
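
The pipe-delimited rows above can be read programmatically to pick out the flagged run. The sketch below makes assumptions: the column meanings (index, UUID, timestamp, job URL, metric value, changepoint flag, percentage change) are inferred from the rows in this ticket, not taken from the tool's documentation, and `<job-url>` is a placeholder for the full Prow links.

```python
from typing import NamedTuple


class Row(NamedTuple):
    index: int
    uuid: str
    timestamp: str
    value: float
    is_changepoint: bool
    percent_change: float


def parse_row(line: str) -> Row:
    """Parse one table row; the column order is assumed from this ticket."""
    # Splitting on "|" yields an empty cell before the leading pipe; drop it.
    cells = [cell.strip() for cell in line.strip().split("|")][1:]
    # cells: index, uuid, timestamp, url, value, flag, percent, optional note
    return Row(
        index=int(cells[0]),
        uuid=cells[1],
        timestamp=cells[2],
        value=float(cells[4]),
        is_changepoint=(cells[5] == "True"),
        percent_change=float(cells[6]),
    )


# Pod ready latency rows from this ticket, with the URLs shortened to a placeholder.
lines = [
    "| 27 | 597be9c6-4d1b-4724-9b87-8ee4968bbbc0 | 2025-07-16T15:24:36Z | <job-url> | 2000 | False | 0 |",
    "| 28 | 35dbe753-c8c6-4ecf-9070-83e45fede818 | 2025-07-17T07:01:03Z | <job-url> | 3000 | True | 50 | -- changepoint",
    "| 29 | 9c77713e-f649-482b-8889-025d2fb3337e | 2025-07-18T06:46:10Z | <job-url> | 3000 | False | 0 |",
]

rows = [parse_row(line) for line in lines]
changepoints = [row for row in rows if row.is_changepoint]

for row in changepoints:
    print(f"run {row.index} at {row.timestamp}: value {row.value}, change {row.percent_change}%")
```

For the latency table the reported 50% matches a simple check against the preceding run: (3000 - 2000) / 2000 = 0.5.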

      Expected results:

      No changepoint should be detected.

      Additional info:

      Attaching compressed pprof files with more details.

       

              pehunt@redhat.com Peter Hunt
              rh-ee-vchalla Vishnu Challa
              None
              Abu Kashem
              Min Li Min Li
              None