Type: Epic
Resolution: Done
Priority: Critical
Fix Version/s: openshift-4.16
Affects Version/s: openshift-4.16
Labels:
- no--qe

Epic Name:
Automatic recovery from expired server and peer certs
Epic Status:
To Do
Activity Type:
Product / Portfolio Work
Parent Link:
OCPSTRAT-1103 [etcd] recovery from expired etcd server and peer certs
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
Size:
None

Target Version:
None
Release Blocker:
None

Epic Goal*

Provide a way to automatically recover a cluster whose etcd server and peer certs have expired.

Why is this important? (mandatory)

Currently, the EtcdCertSigner controller, which is part of the cluster-etcd-operator (CEO), renews the aforementioned certificates roughly every 3 years. However, if the cluster stays offline until after the certificates have expired, the controller cannot renew them when the cluster restarts, because the operator will not be running at all.
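The timing problem described above can be illustrated with a minimal sketch. It is not taken from the CEO code base: the function names are hypothetical, and the "roughly every 3 years" figure is modelled as a fixed 3-year validity purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "roughly every 3 years" is modelled as a fixed 3-year validity.
ASSUMED_VALIDITY = timedelta(days=3 * 365)


def is_expired(not_after, now):
    """Return True if a certificate with the given notAfter is invalid at `now`."""
    return now > not_after


def expired_at_restart(issued_at, shutdown_at, offline_for,
                       validity=ASSUMED_VALIDITY):
    """Return True if the certificate has expired by the time the cluster restarts.

    While the cluster is offline nothing renews the certificate, so only the
    original issue time and the restart time matter.
    """
    not_after = issued_at + validity
    restart_at = shutdown_at + offline_for
    return is_expired(not_after, restart_at)


# Hypothetical dates, for illustration only.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
shutdown = datetime(2024, 6, 1, tzinfo=timezone.utc)

short_outage = expired_at_restart(issued, shutdown, timedelta(days=30))
long_outage = expired_at_restart(issued, shutdown, timedelta(days=4 * 365))
```

With these hypothetical dates, a 30-day outage leaves the certificate valid at restart, while a 4-year outage brings the cluster back with a certificate that has already expired and no running operator to replace it.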

We have scenarios where the customer, partner, or service delivery needs to recover a cluster that is offline, suspended, or shut down and, as part of that process, requires a supported way to force certificate and key rotation or replacement.

See the following doc for more use cases of when such clusters need to be recovered:
https://docs.google.com/document/d/198C4xwi5td_V-yS6w-VtwJtudHONq0tbEmjknfccyR0/edit

The following issues are required to enable emergency certificate rotation:
https://issues.redhat.com/browse/API-1613
https://issues.redhat.com/browse/API-1603

Scenarios (mandatory)

A cluster has etcd serving, peer and serving-metrics certificates that are expired. There should be a way to either trigger certificate rotation or have a process that automatically does the rotation.
This does not cover the expiration of etcd-signer certificates at this time.
That will be covered under https://issues.redhat.com/browse/ETCD-445
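As a rough illustration of the scope described in this scenario, the sketch below selects which expired certificates would be candidates for rotation. The certificate kinds come from the text above; the function name, the dictionary layout, and the dates are assumptions made for this example and do not reflect how the operator actually stores or inspects certificates.

```python
from datetime import datetime, timezone

# Certificate kinds covered by this epic, per the scenario text.
# etcd-signer certificates are deliberately excluded (tracked in ETCD-445).
IN_SCOPE = ("serving", "peer", "serving-metrics")


def certs_needing_rotation(not_after_by_kind, now):
    """Return the in-scope certificate kinds whose notAfter is before `now`."""
    return sorted(
        kind
        for kind, not_after in not_after_by_kind.items()
        if kind in IN_SCOPE and now > not_after
    )


# Hypothetical expiry dates, for illustration only.
now = datetime(2028, 1, 1, tzinfo=timezone.utc)
example = {
    "serving": datetime(2027, 1, 1, tzinfo=timezone.utc),
    "peer": datetime(2027, 1, 1, tzinfo=timezone.utc),
    "serving-metrics": datetime(2029, 1, 1, tzinfo=timezone.utc),
    "etcd-signer": datetime(2026, 1, 1, tzinfo=timezone.utc),
}

to_rotate = certs_needing_rotation(example, now)
```

Here `to_rotate` contains only `peer` and `serving`: the metrics certificate is still valid, and the expired signer is ignored because it is out of scope for this epic.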

Dependencies (internal and external) (mandatory)

While the etcd team will implement the automatic recovery for the etcd certificates, other control-plane operators will be handling their own certificate recovery.

Contributing Teams (and contacts) (mandatory)

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

Development - etcd team
Documentation - etcd docs team
QE - etcd qe
PX -
Others -

Acceptance Criteria (optional)

When an OpenShift cluster with expired etcd server and peer certs is restarted, it is able to regenerate those certs.

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

CI Testing - An e2e test that puts a cluster into the expired-certs failure mode and forces it to recover.
Documentation - Docs that explain the cert recovery procedure.
QE - Test scenarios are written and executed successfully.
Technical Enablement - Slides are complete (if requested by PLM)
Engineering Stories Merged
All associated work items with the Epic are closed
Epic status should be “Release Pending”

is related to

OCPSTRAT-1103 [etcd] recovery from expired etcd server and peer certs

Closed

Assignee: Haseeb Tariq

Reporter: Haseeb Tariq

Need Info From: None

Contributors: None

QA Contact: Sandeep Kundu

Doc Contact: None

Votes: 0

Watchers: 5

Created: 2024/01/03 4:31 AM

Updated: 2025/06/27 9:51 AM

Resolved: 2024/04/22 9:55 AM
