Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.17.0, 4.17.z, 4.16.z, 4.18.z
Component/s: apiserver-auth
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Important
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

Customer Impact:

Customer Escalated, Customer Facing, Customer Reported

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

    Automatic certificate rotation can lead to cluster inaccessibility when a master node is temporarily unavailable during the rotation process. If one or more master nodes are in a "Not Ready" state and recovery is delayed, certificate rotation fails to complete across all masters.
Because the rotation never completes, the certificates eventually expire, authentication fails, and the entire cluster becomes inaccessible.

Version-Release number of selected component (if applicable):

How reproducible:

    Always, when a master node is "Not Ready" during the certificate rotation process.

Steps to Reproduce:

    1. Induce a failure on one master node so that it goes into a "Not Ready" state (for example, by stopping essential services or isolating it from the network).

    2. Allow the automatic certificate rotation process to initiate while this master node is down.

    3. Observe that the certificate rotation does not complete because one master is unavailable.

Actual results:

    The automatic certificate rotation fails when not all master nodes are in a "Ready" state. Consequently, certificates are not renewed, leading to their expiration and rendering the cluster inaccessible due to authentication failures.

Expected results:

    OpenShift clusters are designed to tolerate the loss of one master node (2 out of 3 masters being "Ready" is sufficient for cluster operations), so the automatic certificate rotation process should be equally resilient.
Specifically, it is expected that:
- If a master node is in a "Not Ready" state during the rotation, the rotation process temporarily skips that node.
- When the "Not Ready" node returns to a "Ready" state (for example, after recovery and approval of its pending certificate signing requests (CSRs)), the rotation is applied to that node to bring its certificates up to date.
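The expected behavior above can be sketched as a small reconciliation loop. This is a minimal, hypothetical model in Go, not code from any OpenShift operator: the `Node` and `Rotator` types, the integer certificate "generation", and the majority-Ready gate (derived from the 2-of-3 tolerance stated above) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a HYPOTHETICAL, simplified view of a control-plane node for this
// sketch; it is not an OpenShift or Kubernetes API type.
type Node struct {
	Name        string
	Ready       bool
	CertVersion int // certificate generation the node currently holds
}

// Rotator (hypothetical) tracks the generation being rolled out and the
// nodes that were skipped because they were Not Ready.
type Rotator struct {
	Target  int
	Pending map[string]bool
}

// Rotate refuses to start unless a majority of nodes is Ready (an assumed
// gate mirroring the 2-of-3 tolerance), then updates every Ready node and
// records Not Ready nodes as pending instead of failing the whole rotation.
func (r *Rotator) Rotate(nodes []*Node) error {
	ready := 0
	for _, n := range nodes {
		if n.Ready {
			ready++
		}
	}
	if ready <= len(nodes)/2 {
		return fmt.Errorf("only %d of %d control-plane nodes Ready; rotation deferred", ready, len(nodes))
	}
	r.Target++
	for _, n := range nodes {
		if n.Ready {
			n.CertVersion = r.Target
			delete(r.Pending, n.Name)
		} else {
			r.Pending[n.Name] = true
		}
	}
	return nil
}

// Reconcile brings any pending node that has returned to Ready up to date.
func (r *Rotator) Reconcile(nodes []*Node) {
	for _, n := range nodes {
		if r.Pending[n.Name] && n.Ready {
			n.CertVersion = r.Target
			delete(r.Pending, n.Name)
		}
	}
}

// PendingNames returns the skipped nodes in sorted order for stable output.
func (r *Rotator) PendingNames() []string {
	names := make([]string, 0, len(r.Pending))
	for name := range r.Pending {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	nodes := []*Node{
		{Name: "master-0", Ready: true},
		{Name: "master-1", Ready: false}, // simulated Not Ready master
		{Name: "master-2", Ready: true},
	}
	r := &Rotator{Pending: map[string]bool{}}

	if err := r.Rotate(nodes); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("pending after rotation:", r.PendingNames())

	nodes[1].Ready = true // master-1 recovers
	r.Reconcile(nodes)
	fmt.Println("pending after recovery:", r.PendingNames())
	for _, n := range nodes {
		fmt.Printf("%s cert generation %d\n", n.Name, n.CertVersion)
	}
}
```

Running `main` prints `pending after rotation: [master-1]`, then `pending after recovery: []`, followed by generation 1 for all three masters. Recording a skipped node as pending, rather than treating it as a fatal error, is what lets the rollout finish on the healthy masters while still catching up the recovered one later.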

Additional info:

    This behavior would ensure that the cluster remains accessible and highly available even during transient master node disruptions while maintaining up-to-date certificates across all healthy masters.

links to

Users cannot login to Openshift 4 with error "Login failed (401 Unauthorized)" while a control plane node is failing

Assignee: Unassigned

Reporter: Sameer Sardar

Need Info From: None

Contributors: None

QA Contact: None

Doc Contact: None

Votes: 0

Watchers: 5

Created: 2025/06/30 8:59 AM

Updated: 2025/09/24 10:22 PM