OCPBUGS-43428

Haproxy timeouts not aligned with k8s health checks


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Affects Versions: 4.14, 4.15, 4.16, 4.17, 4.18

      As part of TRT investigations into k8s API disruptions, we have discovered that there are times when haproxy considers the underlying apiserver to be DOWN, yet from the k8s perspective the apiserver is healthy and functional.

      From the customer's perspective, any call to the cluster API endpoint fails during this time; it simply looks like an outage.

      A thorough investigation leads us to the following difference between how haproxy decides whether the apiserver is alive and how k8s does, i.e.

      inter 1s fall 2 rise 3
      

      and

      readinessProbe:
        httpGet:
          scheme: HTTPS
          port: 6443
          path: readyz
        initialDelaySeconds: 0
        periodSeconds: 5
        timeoutSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      
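      For context, the inter 1s fall 2 rise 3 parameters sit on the server lines of the haproxy backend that fronts the apiservers (the masters backend seen in the log below). The following is only a sketch of that stanza, assuming an HTTP check against the same readyz endpoint; the check option, address and remaining flags are illustrative and not copied from the actual haproxy.cfg:

      # sketch only: backend and server names match the log below,
      # the health-check endpoint and other options are assumptions
      backend masters
        option httpchk GET /readyz HTTP/1.0
        server master-2 192.0.2.12:6443 check check-ssl verify none inter 1s fall 2 rise 3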

      We can see that the top check, which belongs to haproxy, is much stricter. As a result, haproxy sees the following

      2024-10-08T12:37:32.779247039Z [WARNING]  (29) : Server masters/master-2 is DOWN, reason: Layer7 wrong status, code: 500, info: "Internal Server Error", check duration: 5ms. 0 active and 0 backup servers left. 154 sessions active, 0 requeued, 0 remaining in queue.
      

      much faster than k8s would consider anything to be wrong: with inter 1s fall 2, haproxy marks a server DOWN after two failed checks run one second apart (roughly two seconds), whereas the kubelet marks the apiserver pod not ready only after three failed probes five seconds apart (roughly fifteen seconds).

      In order to remediate this issue, it has been agreed that the haproxy checks should be softened and aligned with the k8s readiness probe.
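
      To make that mapping concrete, below is a minimal sketch of what an aligned server check could look like, mapping periodSeconds to inter, failureThreshold to fall and successThreshold to rise; the exact values and the rest of the server line are assumptions, not the agreed final change:

      # sketch only: check timing mirrors the readiness probe above,
      # the address and remaining options are assumptions
      server master-2 192.0.2.12:6443 check check-ssl verify none inter 5s fall 3 rise 1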

            Assignee: Mat Kowalski (mkowalsk@redhat.com)
            Reporter: Mat Kowalski (mkowalsk@redhat.com)
            QA Contact: Ross Brattain
            Votes: 0
            Watchers: 5