Type: Task
Resolution: Done
Priority: Undefined
Fix Version/s: rhwa-25.8
Affects Version/s: None
Component/s: Node Healthcheck
Labels: None
Blocked: False
Blocked Reason: None
Ready: False
Target Version: rhwa-25.8

During recent Node HealthCheck (NHC) end-to-end (e2e) test runs on OpenShift Container Platform (OCP) 4.20, tests consistently fail with `rpc error: code = Unavailable desc = error reading from server: read: connection reset by peer` and `ContainerFailed` errors.

The failures appear to be related to API server instability that occurs after the NHC tests have completed, during the subsequent steps that prepare the Machine Health Check (MHC) tests. The errors surface in the `./hack/test-e2e.sh` script.

This behavior appears to be new, or at least more frequent, in OCP 4.18 and later.

*Proposed Solutions / Ideas:*
1. *Add a retry mechanism:* Implement retries in the `./hack/test-e2e.sh` script for the affected steps.
2. *Refactor into code:* Move the problematic test preparation steps into the Go test code itself and use Gomega's `Eventually` assertion (as used with Ginkgo) to poll until the steps succeed.

This issue needs to be tracked to ensure the stability of our e2e testing and to investigate potential underlying API server behavior.

links to

medik8s/node-healthcheck-operator#373: Fix flaky e2e

medik8s/node-healthcheck-operator#375: [release-0.9] Backport test fixes

Assignee: Marc Sluiter

Reporter: Michael Shitrit

Votes: 0

Watchers: 3

Created: 2025/07/08 11:49 AM

Updated: 2025/09/18 7:07 AM

Resolved: 2025/07/15 9:02 AM
