-
Task
-
Resolution: Unresolved
-
Undefined
-
None
-
None
Background
During QE test runs, we observed intermittent failures in tests using `pod.execute` / `kubectl exec` due to transient kubelet connectivity issues:
- Handshake status 500 Internal Server Error
- Error dialing backend: use of closed network connection (kubelet 10250)
These failures are {}not related to product bugs{} but are caused by cluster-side instability.
Objective
Investigate the feasibility of adding a retry/wrapper mechanism in the test framework for such transient failures to improve test reliability.
References
- Example implementation: https://github.com/RedHatQE/openshift-python-wrapper/pull/2581
- Failed test run: https://reportportal-cnv.apps.dno.ocp-hub.prod.psi.redhat.com/ui/#cnv/launches/all/146659/?item0Params=filter.in.status%3DFAILED%26page.page%3D1
Acceptance Criteria
- A design or proposal for handling transient `pod.execute` failures
- Recommendation on whether to implement retries globally or selectively
- is caused by
-
CNV-72482 [POST-UPGRADE][4.20.1][STAGE][TIER-2][test-pytest-cnv-4.20-post-upgrade-marker-storage][cnv-4.20.1]
-
- Closed
-