Type: Bug
Resolution: Can't Do
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.16.z
Component/s: kube-apiserver
Labels:
- rits-work

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Important
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:


The kube-apiserver operator has been stuck in the Progressing state for more than 48 hours, reporting the following condition:

  - lastTransitionTime: "2025-04-26T08:32:37Z"
    message: 'EncryptionMigrationControllerProgressing: migrating resources to a new
      write key: [core/configmaps core/secrets]'
    reason: EncryptionMigrationController_Migrating
    status: "True"
    type: Progressing

There is no indication that the migration is still making progress.
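
To put a number on how long the condition above has been in place, the following is a minimal sketch that computes the hours elapsed since `lastTransitionTime`. The helper name `hours_in_progressing` and the observation time `observed_at` are assumptions made for illustration; they are not taken from the cluster or from any operator tooling. A long duration by itself does not prove that the migration is stuck, it only shows how long the condition has been reported.

```python
from datetime import datetime, timezone

# Condition fields copied from the operator status shown above.
condition = {
    "lastTransitionTime": "2025-04-26T08:32:37Z",
    "reason": "EncryptionMigrationController_Migrating",
    "status": "True",
    "type": "Progressing",
}


def hours_in_progressing(cond, now):
    """Hypothetical helper: hours since lastTransitionTime when
    Progressing=True, otherwise None."""
    if cond.get("type") != "Progressing" or cond.get("status") != "True":
        return None
    # The trailing "Z" marks UTC, so attach the UTC timezone explicitly.
    started = datetime.strptime(
        cond["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (now - started).total_seconds() / 3600


# Assumed observation time, chosen only for illustration.
observed_at = datetime(2025, 4, 28, 9, 0, 0, tzinfo=timezone.utc)
hours = hours_in_progressing(condition, observed_at)
print(f"Progressing for {hours:.1f}h")
```

Passing the current time (`datetime.now(timezone.utc)`) instead of the assumed `observed_at` gives the live duration.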

Actions taken:
- we tried to refresh the kube-apiserver-operator pod by scaling it down and back up - did not help
- we tried to roll out a new revision of the kube-apiserver in case some old state had not been cleared - did not help

From the kube-apiserver operator pod, we see throttling on its requests/responses.

The cluster looks healthy from the etcd side, but on the kube-apiserver side some requests appear to take longer than they should.

Similar problems were reported earlier:
- https://access.redhat.com/solutions/6515171 - however, there are no errors or failing webhooks here, although some additional apiservices return 503 errors to requests sent from the kube-apiserver
- https://access.redhat.com/solutions/7062880 - time synchronization issue - waiting for confirmation whether the problem disappeared after a chrony restart

However, both issues were reported against older versions, hence this new bug.

Version-Release number of selected component (if applicable):

OCP 4.16.24

How reproducible:

n/a - not reproducible locally; the problem persists on the customer cluster

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

Assignee: Jan Chaloupka

Reporter: Vladislav Walek

Need Info From: None

Contributors: None

QA Contact: Ke Wang

Doc Contact: None

Votes: 0

Watchers: 3

Created: 2025/04/28 12:27 AM

Updated: 2025/09/24 8:29 PM

Resolved: 2025/09/09 12:28 PM
