OpenShift Virtualization / CNV-73297

[Storage] [T2] [4.20] Investigate adding retry mechanism for transient pod exec failures

    • Quality / Stability / Reliability

      Background
      During QE test runs, we observed intermittent failures in tests using `pod.execute` / `kubectl exec` due to transient kubelet connectivity issues:

      • Handshake status 500 Internal Server Error
      • Error dialing backend: use of closed network connection (kubelet 10250)

      These failures are not related to product bugs but are caused by cluster-side instability.

      Objective
      Investigate the feasibility of adding a retry/wrapper mechanism in the test framework for such transient failures to improve test reliability.
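
      As a starting point for the investigation, a retry wrapper could look like the minimal sketch below. It is an illustration under stated assumptions, not the framework's actual implementation: the names `TRANSIENT_MARKERS`, `is_transient`, and `execute_with_retry` are hypothetical, the `execute` argument stands in for whatever callable the test framework uses (for example a bound `pod.execute`), and the error is assumed to surface as an exception whose message contains one of the two strings quoted in the Background.

```python
import time
from typing import Any, Callable

# Substrings taken from the two error messages quoted in this issue.
# Assumption: the underlying client raises an exception containing them.
TRANSIENT_MARKERS = (
    "Handshake status 500 Internal Server Error",
    "use of closed network connection",
)


def is_transient(exc: Exception) -> bool:
    """Return True if the exception message matches a known transient error."""
    message = str(exc)
    return any(marker in message for marker in TRANSIENT_MARKERS)


def execute_with_retry(
    execute: Callable[[Any], Any],
    command: Any,
    attempts: int = 3,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call execute(command), retrying only on known transient errors.

    Non-transient errors and the final failed attempt are re-raised
    unchanged, so real product bugs are not hidden by the wrapper.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return execute(command)
        except Exception as exc:
            if not is_transient(exc) or attempt == attempts:
                raise
            sleep(delay)  # fixed pause before the next attempt
```

      A test would opt in explicitly, for example `execute_with_retry(pod.execute, ["ls", "/"])`. Because a retried command may have partially run before the connection dropped, a wrapper like this is only safe for idempotent commands, which is one input to the global versus selective decision.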

      References

      Acceptance Criteria

      • A design or proposal for handling transient `pod.execute` failures
      • Recommendation on whether to implement retries globally or selectively

              Ahmad Hafi (rh-ee-ahafe)
              Natalie Gavrielov