Description of problem:
A changepoint was detected in kubelet CPU usage on 4.20, indicating a roughly 30% increase. Pod ready latency on the cluster was also affected, increasing by 50%.
Version-Release number of selected component (if applicable):
kubelet 1.33 appears to be causing this issue.
How reproducible:
It can be reproduced by retriggering this Prow job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946068601798660096
Steps to Reproduce:
We have already reproduced the issue on our end and confirmed that the kubelet is responsible. To reproduce it, retrigger this job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946068601798660096

To confirm that the kubelet is the cause, follow the steps below:

1. Make a code change to override the RHCOS version with one that does not include kubelet 1.33, which falls back to the previous version, 1.32.6. PR: https://github.com/openshift/release/commit/200b635867eed8971fe9df97ac5d5cd7b6b7f688
2. Trigger a rehearsal using this comment on GitHub: /pj-rehearse periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes
3. Both CPU usage and pod ready latency returned to their previous numbers and no changepoint was detected.

### Results for CPU usage after the patch
| 17 | 87ae0d27-f58f-4ec9-bc37-515c9effaea8 | 2025-07-17T18:46:42Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67110/rehearse-67110-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1945888384954142720 | 25.6033 | False | 0 |
| 22 | b3e23e9b-769d-43b9-aa77-b597c90ea2ca | 2025-07-19T06:37:45Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946416759300952064 | 20.7294 | False | 0 |
| 30 | 18348b1c-1f0d-48e6-a72d-f4d1bc6abc4e | 2025-07-21T14:35:58Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1947274119384928256 | 22.2043 | False | 0 |
| 31 | 85264ff0-353f-4715-b668-9e3bb8e89f7f | 2025-07-21T19:28:47Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1947347398476959744 | 22.8142 | False

### Results for pod ready latency after the patch
| 17 | 87ae0d27-f58f-4ec9-bc37-515c9effaea8 | 2025-07-17T18:46:42Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67110/rehearse-67110-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1945888384954142720 | 2000 | False | 0 |
| 22 | b3e23e9b-769d-43b9-aa77-b597c90ea2ca | 2025-07-19T06:37:45Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946416759300952064 | 2000 | False | 0 |
| 30 | 18348b1c-1f0d-48e6-a72d-f4d1bc6abc4e | 2025-07-21T14:35:58Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1947274119384928256 | 2000 | False | 0 |
| 31 | 85264ff0-353f-4715-b668-9e3bb8e89f7f | 2025-07-21T19:28:47Z | https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/67237/rehearse-67237-periodic-ci-openshift-eng-ocp-qe-perfscale-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1947347398476959744 | 2000 | False

4. So kubelet 1.32.6 was fine, but some change in kubelet 1.33 is leading to the increase in CPU usage and pod ready latency.
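To compare the rehearsal runs at a glance, the CPU usage rows above can be parsed and averaged. This is a minimal sketch, not part of the original tooling: it assumes the cell immediately after the build URL is the metric value, and it uses a short stand-in `URL` (hypothetical) in place of the long Prow links. The helper names `parse_row` and `metric_value` are illustrative only.

```python
from statistics import mean

# Stand-in for the long Prow build URLs in the rows above (hypothetical, for brevity).
URL = "https://example.com/build"

# CPU usage rows from the kubelet 1.32.6 rehearsals. The last row is
# truncated in the report and is kept truncated here.
ROWS = [
    f"| 17 | 87ae0d27-f58f-4ec9-bc37-515c9effaea8 | 2025-07-17T18:46:42Z | {URL} | 25.6033 | False | 0 |",
    f"| 22 | b3e23e9b-769d-43b9-aa77-b597c90ea2ca | 2025-07-19T06:37:45Z | {URL} | 20.7294 | False | 0 |",
    f"| 30 | 18348b1c-1f0d-48e6-a72d-f4d1bc6abc4e | 2025-07-21T14:35:58Z | {URL} | 22.2043 | False | 0 |",
    f"| 31 | 85264ff0-353f-4715-b668-9e3bb8e89f7f | 2025-07-21T19:28:47Z | {URL} | 22.8142 | False",
]


def parse_row(line):
    """Split one pipe-delimited result row into stripped cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def metric_value(line):
    """Return the cell after the build URL, assumed to be the metric value."""
    cells = parse_row(line)
    for i, cell in enumerate(cells[:-1]):
        if cell.startswith("http"):
            return float(cells[i + 1])
    raise ValueError("row has no build URL followed by a value")


cpu_after_patch = [metric_value(row) for row in ROWS]
print(f"mean CPU usage after patch: {mean(cpu_after_patch):.4f}")
```

Locating the value relative to the URL, rather than by fixed column index, lets the truncated last row be parsed the same way as the complete ones.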
Actual results:
With kubelet 1.33 a changepoint was detected.

### For CPU usage
control-plane-6nodes/1945472195727724544 | 26.0451 | False | 0 |
| 28 | 35dbe753-c8c6-4ecf-9070-83e45fede818 | 2025-07-17T07:01:03Z | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-XXXXXX-XXXXXXXXX-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1945711780562997248 | 34.879 | True | 29.1073 | -- changepoint
| 29 | 9c77713e-f649-482b-8889-025d2fb3337e | 2025-07-18T06:46:10Z | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-XXXXXX-XXXXXXXXX-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946068601798660096 | 30.2378 | False | 0 |
+----+--------------------------------------+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------------+---------------------+

### Pod ready latency
| 27 | 597be9c6-4d1b-4724-9b87-8ee4968bbbc0 | 2025-07-16T15:24:36Z | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-XXXXXX-XXXXXXXXX-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1945472195727724544 | 2000 | False | 0 |
| 28 | 35dbe753-c8c6-4ecf-9070-83e45fede818 | 2025-07-17T07:01:03Z | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-XXXXXX-XXXXXXXXX-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1945711780562997248 | 3000 | True | 50 | -- changepoint
| 29 | 9c77713e-f649-482b-8889-025d2fb3337e | 2025-07-18T06:46:10Z | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-eng-XXXXXX-XXXXXXXXX-ci-main-aws-4.20-nightly-x86-payload-control-plane-6nodes/1946068601798660096 | 3000 | False | 0 |
+----+--------------------------------------+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+------------------+---------------------+
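As a sanity check on the reported increases, the relative change between the run before the changepoint and the changepoint run can be recomputed by hand. The sketch below assumes a plain run-to-run percent change; `percent_change` is an illustrative helper, not the detector's own function. For pod ready latency this matches the reported 50 exactly. For CPU usage it gives about 33.9, not the reported 29.1073, so the detector presumably uses a different baseline (for example an average over several earlier runs); that is an assumption, since the report does not say how the figure is derived.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Pod ready latency: run before the changepoint vs. the changepoint run.
pod_latency_change = percent_change(2000, 3000)

# CPU usage: same two runs. This is a run-to-run figure only and is not
# expected to equal the percentage shown in the changepoint row.
cpu_change = percent_change(26.0451, 34.879)

print(f"pod ready latency: {pod_latency_change:.1f}%")
print(f"CPU usage (run to run): {cpu_change:.1f}%")
```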
Expected results:
No changepoint should be detected.
Additional info:
Attaching compressed pprof files with more details.
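One way to inspect the attached profiles is to diff a kubelet 1.33 CPU profile against a 1.32.6 one with `go tool pprof`. The sketch below only builds the command line; the file names are hypothetical placeholders, and it assumes a Go toolchain is installed and that one profile per kubelet version is available, which the report does not state.

```python
import shlex


def pprof_diff_command(base_profile, new_profile, top_n=20):
    """Build a `go tool pprof` command listing the top functions by delta."""
    return [
        "go", "tool", "pprof",
        "-top",                          # text report, largest entries first
        f"-nodecount={top_n}",           # limit the number of entries shown
        f"-diff_base={base_profile}",    # subtract this profile from the new one
        new_profile,
    ]


# Hypothetical file names; substitute the extracted attachments.
cmd = pprof_diff_command("kubelet-1.32.6.pprof", "kubelet-1.33.pprof")
print(shlex.join(cmd))
```

The command can then be run from a shell or passed to `subprocess.run(cmd, check=True)`.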