[OCPBUGS-49749] Readiness probes must not rely on etcd - Red Hat Issue Tracker

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.18.0, 4.19.0
Component/s: kube-apiserver
Labels:
None

Regression:
None
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None

Release Note Text:

The readiness probes for the API server (/readyz endpoint) have been modified to exclude etcd checks. This prevents client connections from being closed if etcd is temporarily unavailable. The assumption is that etcd will become ready again before a client connection times out, allowing client connections to persist through brief etcd unavailability and minimizing temporary API server outages.

Release Note Type:
Enhancement
Release Note Status:
In Progress
Target Version:

4.18.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

This is a clone of issue OCPBUGS-48177. The following is the description of the original issue:
—
Description of problem:

Requests allow up to 30s for etcd to respond, but readiness probes allow only 9s. When etcd latency is between 10s and 30s, standard requests still succeed, but because of the readiness probe configuration we lose every apiserver endpoint at the same time. This requires correction in the pod definitions and the load balancers. Changing the ongoing readiness check to `readyz?exclude=etcd` should correct the issue.
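
The timing gap described above can be sketched roughly as follows. This is only an illustrative model, not code from any of the affected operators: the 30s and 9s values come from the description, while `readyz_path`, `outcome`, and the constant names are hypothetical, and the model assumes readiness depends on nothing but etcd latency.

```python
from urllib.parse import urlencode

# Values taken from the description; the names are assumptions for this sketch.
REQUEST_TIMEOUT_S = 30.0  # requests allow up to 30s for etcd to respond
PROBE_TIMEOUT_S = 9.0     # readiness probes allow only 9s for etcd to respond


def readyz_path(excluded_checks=()):
    """Build a /readyz path, adding one exclude=<name> pair per excluded check."""
    if not excluded_checks:
        return "/readyz"
    query = urlencode([("exclude", name) for name in excluded_checks])
    return "/readyz?" + query


def outcome(etcd_latency_s, probe_checks_etcd):
    """Simplified model of one apiserver endpoint under a given etcd latency.

    request_ok: a standard request finishes within the request timeout.
    ready:      the readiness probe passes; etcd latency only matters
                when the probe still includes the etcd check.
    """
    request_ok = etcd_latency_s <= REQUEST_TIMEOUT_S
    ready = (not probe_checks_etcd) or etcd_latency_s <= PROBE_TIMEOUT_S
    return {"request_ok": request_ok, "ready": ready}


if __name__ == "__main__":
    print(readyz_path(["etcd"]))
    for latency in (5, 15, 45):
        print(latency, outcome(latency, True), outcome(latency, False))
```

In this model, `outcome(15, True)` describes the problem case (the request would succeed, yet the endpoint is reported not ready), while `outcome(15, False)` keeps the endpoint ready for the same latency.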

Off the top of my head, this will include:

kube-apiserver operator
authentication operator
openshift-apiserver operator
MCO apiserver-watch
metal LB
https://github.com/multi-arch/ocp-remote-ci/pull/39
where LBs are defined for aws, azure, and gcp

This is a low cost, low risk, high benefit change.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

clones

OCPBUGS-48177 Readiness probes must not rely on etcd

Verified

is blocked by

OCPBUGS-48177 Readiness probes must not rely on etcd

Verified

links to

openshift/cluster-authentication-operator#754: OCPBUGS-49749: Exclude etcd readiness checks from /readyz to ignore temporary etcd hiccups

openshift/cluster-openshift-apiserver-operator#613: [release-4.18] OCPBUGS-49749: Exclude etcd readiness checks from /readyz to ignore temporary etcd hiccups

openshift/kubernetes#2193: [release-4.18] OCPBUGS-49749: UPSTREAM: <carry>: disable etcd readiness checks by default

Assignee: Jan Chaloupka

Reporter: OpenShift Prow Bot

QA Contact: Ke Wang

Votes: 0

Watchers: 3

Created: 2025/02/03 9:07 AM

Updated: 2025/02/26 9:59 AM
