Bug
Resolution: Done
Major
Quality / Stability / Reliability
False
8
uShift Sprint 246, uShift Sprint 247, uShift Sprint 248
Description of problem:
Greenboot restarts fail in standard test suites.

Some tests restart microshift frequently. Each restart takes down the apiserver (among other components) but not the pods. The topolvm-controller pod uses leader election with hardcoded parameters. These are too short (15s) to withstand a microshift restart, so one container in the pod goes down 15s after the apiserver goes offline. When the controller restarts, it tries to reach the apiserver, which sometimes takes too long, and the container enters a crash loop. The restart backoff then kicks in, doubling with every restart and capping at 5min. Greenboot only waits 5min, so when a backoff interval is badly aligned with the greenboot wait window, greenboot signals unhealthy. The controller would eventually recover by itself once everything is stable and the container is restarted after the backoff.
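The timing interaction described above can be illustrated with a rough sketch. It takes the 5min backoff cap and the 5min greenboot wait from this report. The 10s initial delay is an assumption based on the kubelet's documented default, and the model assumes the container crashes immediately after every start, which is a simplification and not measured CI behaviour.

```python
from itertools import accumulate

GREENBOOT_WAIT_S = 300  # greenboot wait from this report (5min)
BACKOFF_CAP_S = 300     # backoff cap from this report (5min)
INITIAL_BACKOFF_S = 10  # assumption: kubelet's default initial delay


def backoff_delays(restarts, initial=INITIAL_BACKOFF_S, cap=BACKOFF_CAP_S):
    """Delay before each restart: doubles every time, capped at `cap`."""
    return [min(initial * 2 ** i, cap) for i in range(restarts)]


delays = backoff_delays(8)
# Simplification: the container crashes right after each start,
# so elapsed time is just the running sum of the delays.
elapsed = list(accumulate(delays))

for n, (delay, total) in enumerate(zip(delays, elapsed), start=1):
    # Flag waits that are at least as long as the greenboot window.
    note = "  <- as long as the greenboot wait" if delay >= GREENBOOT_WAIT_S else ""
    print(f"restart {n}: wait {delay:>3}s, cumulative {total:>4}s{note}")
```

Under these assumptions the sixth restart is the first to wait the full 5min, so a greenboot check that starts just after such a crash could time out before the container is started again.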
Version-Release number of selected component (if applicable):
main
How reproducible:
Occasionally in CI
Steps to Reproduce:
1. CI
Additional info:
- blocks: USHIFT-2254 Greenboot restarts fail in Network smoke tests (Closed)
- is cloned by: USHIFT-2254 Greenboot restarts fail in Network smoke tests (Closed)
- links to