Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.17
Component/s: Networking / On-Prem Load Balancer
Labels:
None

Regression:
None
Blocked:
False
Blocked Reason:
None
Release Note Text:
N/A
Release Note Type:
Release Note Not Required
Release Note Status:
Done
Target Version:
4.18.0
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

During discussion of https://issues.redhat.com/browse/OCPBUGS-37862, we noticed that the haproxy-monitor sometimes reports "API is not reachable through HAProxy". When it does, it removes the firewall rule that directs traffic to HAProxy. This is undesirable because keepalived will then likely fail over the VIP, and existing connections to HAProxy may be broken.

There are a few possible causes. One is that the monitor needs only two health-check failures to trigger this removal. This action should rarely be needed during normal cluster operation, so that threshold is probably too aggressive. This is especially true because we only check every 6 seconds, which means fast error detection was never the goal. The mechanism exists mainly for bootstrapping, and as a last-ditch effort to keep the API functional if something has gone badly wrong in the cluster. Taking a few more seconds to detect a real outage is better than reacting to outages that are not actually outages.

As a first fix, we will increase what amounts to the "fall" value for the monitor check, meaning the number of failed checks required before the monitor acts. If that does not eliminate the problem, we will need to look more closely at HAProxy's behavior during node reboots.
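
Below is a minimal sketch of a "fall" counter in Go. It assumes HAProxy-style semantics: failures must be consecutive, and any successful check resets the count. The `fallCounter` type and the `firstTrigger` helper are hypothetical illustrations, not the actual baremetal-runtimecfg implementation. The check results are made-up sample data (a two-check blip followed by a sustained outage), not observations from this bug.

```go
package main

import "fmt"

// fallCounter tracks consecutive health-check failures. Hypothetical type
// for illustration; not the haproxy-monitor's real code.
type fallCounter struct {
	fall     int // consecutive failures required before acting
	failures int // length of the current run of failures
}

// observe records one check result and reports whether the API should now be
// treated as unreachable (that is, the failure run has reached fall).
func (c *fallCounter) observe(healthy bool) bool {
	if healthy {
		c.failures = 0 // any success resets the run
		return false
	}
	c.failures++
	return c.failures >= c.fall
}

// firstTrigger replays results through a fresh counter and returns the index
// of the first check at which the API would be declared unreachable, or -1.
func firstTrigger(fall int, results []bool) int {
	c := &fallCounter{fall: fall}
	for i, ok := range results {
		if c.observe(ok) {
			return i
		}
	}
	return -1
}

func main() {
	// true = healthy check. A two-check blip (indexes 1-2), one success,
	// then a sustained outage (indexes 4-6). Illustrative data only.
	results := []bool{true, false, false, true, false, false, false}
	for _, fall := range []int{2, 3} {
		fmt.Printf("fall=%d: first declared unreachable at check %d\n",
			fall, firstTrigger(fall, results))
	}
}
```

With `fall=2`, the short blip at checks 1-2 already triggers removal of the firewall rule. With `fall=3`, the blip is ignored and only the sustained outage is acted on, at check 6. This is the trade the issue describes: slightly slower detection in exchange for fewer false positives.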

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

Expected results:

Additional info:

links to

openshift/baremetal-runtimecfg#332: OCPBUGS-38877: Increase "fall" value for haproxy-monitor check

RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update

Assignee: Benjamin Nemec

Reporter: Benjamin Nemec

QA Contact: Zhanqi Zhao

Votes: 0

Watchers: 4

Created: 2024/08/22 10:04 PM

Updated: 2025/02/25 4:41 AM

Resolved: 2025/02/25 4:41 AM