Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.18.0, 4.19.0
Component/s: kube-apiserver
Labels:
- no-qe
- rits-work

Regression:
None
Release Blocker:
Approved
Blocked:
False
Blocked Reason:
None
Release Note Text:
The readiness probes for the API server (/readyz endpoint) have been modified to exclude etcd checks. This prevents client connections from being closed if etcd is temporarily unavailable. The assumption is that etcd will become ready again before a client connection times out, allowing client connections to persist through brief etcd unavailability and minimizing temporary API server outages.
Release Note Type:
Enhancement
Release Note Status:
In Progress
Target Version:

4.19.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

Requests allow up to 30s for etcd to respond, but readiness probes allow only 9s. When etcd latency is between 10s and 30s, standard requests still succeed, yet every apiserver instance fails its readiness probe at the same time, so we lose every apiserver endpoint at once. This needs correcting in both the pod definitions and the load balancers. Changing the ongoing readiness check to `readyz?exclude=etcd` should correct the issue.

Off the top of my head, affected components include:

- kube-apiserver operator
- authentication operator
- openshift-apiserver operator
- MCO apiserver-watch
- metal LB
- https://github.com/multi-arch/ocp-remote-ci/pull/39, where LBs are defined for AWS, Azure, and GCP

This is a low cost, low risk, high benefit change.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

blocks

OCPBUGS-49749 Readiness probes must not rely on etcd

POST

is cloned by

OCPBUGS-49749 Readiness probes must not rely on etcd

POST

links to

openshift/cluster-authentication-operator#753: OCPBUGS-48177: Exclude etcd readiness checks from /readyz to ignore temporary etcd hiccups

openshift/cluster-openshift-apiserver-operator#612: OCPBUGS-48177: Exclude etcd readiness checks from /readyz to ignore temporary etcd hiccups

openshift/kubernetes#2174: OCPBUGS-48177: UPSTREAM: <carry>: disable etcd readiness checks by default

RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update


Assignee: Jan Chaloupka

Reporter: David Eads

QA Contact: Ke Wang

Votes: 0

Watchers: 9

Created: 2025/01/08 3:58 PM

Updated: 2025/02/17 6:13 PM
