Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: 4.19.z
Affects Version/s: 4.19.z, 4.20
Component/s: MicroShift
Labels:
- microshift-no-backport

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
1
Severity:
Moderate
Regression:
None

Target Backport Versions:
None
Target Version:

4.19.z
Release Blocker:
None
Sprint:
uShift Sprint 274
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
In Progress
Release Note Type:
Bug Fix
Release Note Text:

*Cause*: The MicroShift host is shut down for more than 10 minutes.
*Consequence*: Upon start, the healthcheck could erroneously fail because of faulty Deployment progression logic.
*Fix*: The faulty logic was removed.
*Result*: The bug no longer occurs.


Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

This is a clone of issue ~~OCPBUGS-59175~~. The following is the description of the original issue:
—
Description of problem:

The problem was spotted in the footprint-and-performance nightly jobs.
Sometimes the healthcheck fails with "exceeded its progress deadline" for some Deployments immediately after the check starts.

---------
01:40:08.768240    2931 workloads.go:95] Waiting 10m0s for deployment/service-ca in openshift-service-ca
01:40:08.781351    2931 workloads.go:130] Failed waiting for deployment/service-ca in openshift-service-ca: deployment "service-ca" exceeded its progress deadline
01:40:08.771696    2931 workloads.go:95] Waiting 10m0s for deployment/csi-snapshot-controller in kube-system
01:40:08.781421    2931 workloads.go:130] Failed waiting for deployment/csi-snapshot-controller in kube-system: deployment "csi-snapshot-controller" exceeded its progress deadline
01:40:08.768227    2931 workloads.go:95] Waiting 10m0s for deployment/router-default in openshift-ingress
01:40:08.781574    2931 workloads.go:130] Failed waiting for deployment/router-default in openshift-ingress: deployment "router-default" exceeded its progress deadline
01:50:09.914921   11490 workloads.go:95] Waiting 10m0s for deployment/kserve-controller-manager in redhat-ods-applications
01:50:09.922380   11490 workloads.go:130] Failed waiting for deployment/kserve-controller-manager in redhat-ods-applications: deployment "kserve-controller-manager" exceeded its progress deadline
---------


That error occurs (and short-circuits the healthcheck) when the Deployment's "Progressing" condition is False with reason "ProgressDeadlineExceeded".
These Deployments use the default value of "progressDeadlineSeconds", which is 600.
A reboot of the bare metal node in AWS takes around 15 minutes, which is longer than that deadline.
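
As an illustration of the condition described above, here is a minimal sketch of the check. The `Condition` type and the `progressDeadlineExceeded` function are hypothetical stand-ins written for this example; they are not the types or code used in MicroShift's `workloads.go`.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a Deployment
// status condition; it is not the real Kubernetes API type.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// progressDeadlineExceeded reports whether the conditions contain
// Progressing=False with reason ProgressDeadlineExceeded.
func progressDeadlineExceeded(conds []Condition) bool {
	for _, c := range conds {
		if c.Type == "Progressing" && c.Status == "False" &&
			c.Reason == "ProgressDeadlineExceeded" {
			return true
		}
	}
	return false
}

func main() {
	// Conditions as they might look right after a long shutdown.
	afterReboot := []Condition{
		{Type: "Available", Status: "False", Reason: "MinimumReplicasUnavailable"},
		{Type: "Progressing", Status: "False", Reason: "ProgressDeadlineExceeded"},
	}
	fmt.Println(progressDeadlineExceeded(afterReboot)) // true
}
```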

The Deployment gets that condition (Progressing: False, reason ProgressDeadlineExceeded) because, due to the reboot, the time between now and the last update of the Progressing condition is greater than the deadline (600s).

The solution is to remove the short-circuit exit with that error: the healthcheck ignores that condition and gives the Deployment the whole time the healthcheck was allotted to wait for it.

That condition is transient or accidental: shortly afterwards the Deployment is progressing again, and by the time the SOS report is collected (a couple of minutes later), MicroShift is healthy.

Version-Release number of selected component (if applicable):

MicroShift 4.19 and 4.20

How reproducible:

Low

Steps to Reproduce:

1. Start fresh MicroShift
2. Wait long enough for the Deployments to be created, but probably not long enough for MicroShift to become ready.
3. Shut down the machine for more than 10 minutes.
4. Start the machine
5. Watch the greenboot-healthcheck

Actual results:

greenboot-healthcheck fails because one of MicroShift's healthchecks fails with the error `deployment "service-ca" exceeded its progress deadline` almost immediately after the healthcheck starts.

Expected results:

The healthcheck runs for as long as needed, within the specified timeout, to assert that the platform is ready.

Additional info:

clones

OCPBUGS-59175 microshift healthcheck erroneously detects deployment progress timeout in certain conditions

Closed

is blocked by

OCPBUGS-59175 microshift healthcheck erroneously detects deployment progress timeout in certain conditions

Closed

links to

openshift/microshift#5194: [release-4.19] OCPBUGS-59301: Healthcheck: remove the Deployment's progress timed out check

RHBA-2025:12745 Red Hat build of MicroShift 4.19.7 bug fix and enhancement update

Assignee:: Patryk Matuszak

Reporter:: Patryk Matuszak

QA Contact:: Rama Kasturi Narra

Need Info From:: None

Votes:: 0

Watchers:: 5

Created:: 2025/07/14 3:51 PM

Updated:: 2025/08/11 6:39 PM

Resolved:: 2025/08/11 6:39 PM
