  1. OpenShift Bugs
  2. OCPBUGS-43565

etcd platform pod exit test failing on etcd-scaling jobs

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • 4.18.0
    • Etcd
    • Important

      [sig-architecture] platform pods in ns/openshift-etcd should not exit an excessive amount of times
      

      This new test fails on etcd-scaling jobs, where the container exits are presently expected.

      Example failure: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-ovn-etcd-scaling/1846579352007872512

      An exception needs to be added. However, we currently have no mechanism for adding an exception within a specific job; all we have to go on is the job name, which is an imperfect way to disable these tests.

      We don't want to disable the whole monitortest as that would shut down checks for pod exits on all the other control plane pods in the etcd-scaling job.

      https://redhat-internal.slack.com/archives/C027U68LP/p1729174182218029 has details on why the test is presently expected to fail and some thoughts around how this could be solved:

      via deads:
      1. finding a way to avoid restarting containers is best
      2. finding a way to make the exit more graceful is next best
      3. skipping only the exact pod pattern on the exact test is next best (not on all jobs, only the scaling ones)

      skipping the monitor test is not viable.

      An option for 3 could be an env var that is set only in the etcd-scaling job configuration and that the test looks for: for example, a comma-separated list of namespaces to skip the check on.

              dwest@redhat.com Dean West
              rhn-engineering-dgoodwin Devan Goodwin