OpenShift Bugs / OCPBUGS-42378

CSI jobs reporting significant in-cluster disruption in 4.18


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects Version: 4.18.0
    • Component: Storage
    • Severity: Moderate

      TRT disruption monitoring detected a severe increase in the disruption P95 on Azure, which turned out to originate entirely from one job: periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-csi
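
      As a rough illustration of how a P95 shift can be traced back to a single job, the sketch below computes a nearest-rank P95 over per-run disruption seconds and counts, per job, the runs at or above that threshold. The input shape (a list of `(job_name, disruption_seconds)` pairs) and the function names are hypothetical; this is not how the TRT dashboard is actually implemented.

```python
import math
from collections import Counter


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # Nearest-rank method: smallest value with at least 95% of samples <= it.
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def jobs_above_p95(runs):
    """Return the overall P95 and, per job, how many runs meet or exceed it.

    `runs` is a list of (job_name, disruption_seconds) pairs (hypothetical shape).
    """
    threshold = p95([seconds for _, seconds in runs])
    counts = Counter(job for job, seconds in runs if seconds >= threshold)
    return threshold, counts.most_common()
```

      If one job accounts for nearly all of the runs above the threshold, that job is the likely source of the regression.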

      The graph indicates the problem started on Aug 24th.

      The disruption appears to be linked to a very long-running test:

      External Storage [Driver: disk.csi.azure.com] [Testpattern: Dynamic PV (filesystem volmode)] OpenShift CSI extended - SCSI LUN Overflow should use many PVs on a single node [Serial][Timeout:60m]

      This test can run for up to 45 minutes in some cases, and it sometimes causes one host to lose internal networking.

      Sample job runs are listed below. More can be found by opening the dashboard link at the start of the description, scrolling down to the job runs, and looking for those with high disruption numbers.

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-csi/1838361911465349120

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-csi/1832547132029014016

      Expand the intervals chart to see the disruption on any run.

      Outages range from 100 to 400 seconds, which is quite severe. A node goes NotReady, and it appears to be the node that all of the failing disruption backends are hitting.
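
      One way to check whether the failing backends are all hitting the NotReady node is to compare time windows: a backend's disruption is attributable to the node outage if it overlaps the node's NotReady window. A minimal sketch, assuming the intervals have already been extracted from a job run as `(start, end)` pairs in seconds; the `Interval` type, the function names, and the dictionary layout are hypothetical and do not reflect the origin intervals format.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A closed time window in seconds since the start of the run (hypothetical shape)."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two windows share at least one instant."""
    return a.start <= b.end and b.start <= a.end


def backends_during_outage(disruptions, not_ready):
    """Return, sorted, the backend names whose disruption overlaps the NotReady window.

    `disruptions` maps a backend name to a list of Interval objects.
    """
    return sorted(
        backend
        for backend, intervals in disruptions.items()
        if any(overlaps(i, not_ready) for i in intervals)
    )
```

      If every failing backend appears in the result and `not_ready.duration` matches the observed outage length, that supports the node going NotReady as the common cause rather than independent backend failures.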

      Is this expected for this test? It seems like it might be indicating a real problem.

              Jan Safranek
              Devan Goodwin
              Wei Duan
