Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: openshift-4.16
Affects Version/s: None
Component/s: None
Labels:
- microshift-ci-urgent

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
8
Severity:
None

Target Version:

openshift-4.16
Release Blocker:
None
Sprint:
uShift Sprint 246, uShift Sprint 247, uShift Sprint 248

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Description of problem:

Greenboot restarts fail in standard test suites
Some tests restart MicroShift frequently, which takes down the apiserver (among other components) but not the pods. The topolvm-controller pod uses leader election with hardcoded parameters. These are too short to withstand a MicroShift restart (15s), so one container in the pod goes down 15s after the apiserver goes offline.
When the controller restarts, it tries to reach the apiserver. This sometimes takes too long, and the container enters a crash loop. The restart backoff then kicks in, doubling with every restart and capping at 5min.
Greenboot only waits for 5min. When the backoff is badly offset from the greenboot wait, the container is not restarted in time and greenboot signals unhealthy.
The controller eventually recovers by itself once everything is stable and the container is restarted after the backoff.
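
The timing problem described above can be sketched numerically. This is a minimal illustration only, not MicroShift or kubelet code. The doubling, the 5min cap and the 5min greenboot wait come from the description; the 10s initial delay, the 15s a container runs before failing, and all function names are assumptions chosen for the example.

```python
def backoff_delays(initial_s=10, cap_s=300, restarts=8):
    """Delays before each restart: doubling, capped at cap_s.

    initial_s=10 is an assumption; the 300s cap is from the description.
    """
    delays = []
    delay = initial_s
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap_s)
    return delays


def restart_times(delays, crash_after_s=15):
    """Cumulative time (seconds) at which each restart attempt begins.

    crash_after_s is a hypothetical time the container runs before failing.
    """
    t = 0
    times = []
    for d in delays:
        t += crash_after_s + d
        times.append(t)
    return times


def attempts_within(window_start_s, window_len_s, times):
    """Restart attempts that begin inside [start, start + length)."""
    window_end_s = window_start_s + window_len_s
    return [t for t in times if window_start_s <= t < window_end_s]


delays = backoff_delays()     # [10, 20, 40, 80, 160, 300, 300, 300]
times = restart_times(delays)  # [25, 60, 115, 210, 385, 700, 1015, 1330]

# A 5min (300s) greenboot wait starting at t=390s sees no restart attempt.
missed = attempts_within(390, 300, times)
```

Under these assumptions, once the backoff reaches its cap, consecutive restart attempts are more than 5min apart, so a greenboot wait that begins just after one attempt can expire before the next one starts.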

Version-Release number of selected component (if applicable):

main

How reproducible:

Occasionally in CI

Steps to Reproduce:

1. Run the standard test suite in CI (the failure is intermittent)

Additional info:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-microshift-main-ocp-metal-nightly-arm/1737457876722520064/artifacts/ocp-metal-nightly-arm/openshift-microshift-e2e-metal-tests/artifacts/scenario-info/el93-src@standard-suite/log.html

blocks

USHIFT-2254 Greenboot restarts fail in Network smoke tests

Closed

is cloned by

USHIFT-2254 Greenboot restarts fail in Network smoke tests

Closed

links to

openshift/microshift#2867: USHIFT-2105: Recreate core components on microshift restarts

openshift/microshift#2878: USHIFT-2105: Temporarily disable flaky tests

openshift/microshift#2891: USHIFT-2105: cert rotation test wait for greenboot

Assignee: Pablo Acevedo Montserrat

Reporter: Gregory Giguashvili

Need Info From: None

Contributors: None

QA Contact: None

Votes: 0

Watchers: 1

Created: 2023/12/20 3:43 PM

Updated: 2025/07/02 1:22 PM

Resolved: 2024/02/02 2:03 PM
