-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.17
-
None
-
None
-
False
-
Description of problem:
In discussion of https://issues.redhat.com/browse/OCPBUGS-37862 it was noticed that sometimes the haproxy-monitor is reporting "API is not reachable through HAProxy" which means it is removing the firewall rule to direct traffic to HAProxy. This is not ideal since it means keepalived will likely fail over the VIP and it may be breaking existing connections to HAProxy.
There are a few possible reasons for this. One is that we only require two failures of the healthcheck in the monitor to trigger this removal. For something we don't expect to need to happen often during normal operation of a cluster, this is probably a bit too harsh, especially since we only check every 6 seconds so it's not like we're looking for quick error detection. This is more a bootstrapping thing and a last ditch effort to keep the API functional if something has gone terribly wrong in the cluster. If it takes a few more seconds to detect an outage that's better than detecting outages that aren't actually outages.
The first thing we're going to try to fix this is to increase what amounts to the "fall" value for the monitor check. If that doesn't eliminate the problem we will have to look deeper at the HAProxy behavior during node reboots.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
- links to
-
RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update