OpenShift Bugs / OCPBUGS-41643

In-cluster disruption regression detected with Azure CSI test job

      TRT's disruption dashboard reveals an in-cluster backend disruption regression, specifically in periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-csi.

      This dashboard gives you an overview of the disruption:

      https://grafana-loki.ci.openshift.org/d/ISnBj4LVk/disruption?var-platform=azure&var-percentile=P95&var-backend=pod-to-host-reused-connections&var-backend=pod-to-host-new-connections&var-backend=host-to-host-new-connections&var-backend=host-to-host-reused-connections&var-releases=4.18&var-upgrade_type=none&var-networks=ovn&var-topologies=ha&var-architectures=amd64&var-lookback=7&var-master_nodes_updated=N&var-min_disruption_regression=-10&var-min_disruption_job_list=100&var-min_relevance=0&orgId=1
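The filters encoded in the dashboard link above can be read back out of its query string. This is a minimal sketch using only the Python standard library; the helper name `dashboard_filters` is hypothetical, and it only splits the URL as written. What each `var-` variable means on the dashboard is inferred from its name, not confirmed here.

```python
from urllib.parse import parse_qs, urlsplit

# The dashboard URL exactly as given in this issue.
DASHBOARD_URL = (
    "https://grafana-loki.ci.openshift.org/d/ISnBj4LVk/disruption"
    "?var-platform=azure&var-percentile=P95"
    "&var-backend=pod-to-host-reused-connections"
    "&var-backend=pod-to-host-new-connections"
    "&var-backend=host-to-host-new-connections"
    "&var-backend=host-to-host-reused-connections"
    "&var-releases=4.18&var-upgrade_type=none&var-networks=ovn"
    "&var-topologies=ha&var-architectures=amd64&var-lookback=7"
    "&var-master_nodes_updated=N&var-min_disruption_regression=-10"
    "&var-min_disruption_job_list=100&var-min_relevance=0&orgId=1"
)

PREFIX = "var-"


def dashboard_filters(url):
    """Return the 'var-*' query parameters of a URL as {name: [values]}.

    Hypothetical helper for illustration: repeated parameters (such as
    'var-backend') are collected into one list, and parameters without
    the 'var-' prefix (such as 'orgId') are skipped.
    """
    query = parse_qs(urlsplit(url).query)
    return {
        key[len(PREFIX):]: values
        for key, values in query.items()
        if key.startswith(PREFIX)
    }


filters = dashboard_filters(DASHBOARD_URL)

if __name__ == "__main__":
    for name, values in filters.items():
        print(f"{name}: {', '.join(values)}")
```

Read this way, the link selects the azure platform, release 4.18, the P95 percentile, and four backends: pod-to-host and host-to-host, each with new and reused connections.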

      Here is an example that clearly shows a column of red bars in the spyglass chart:

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-csi/1833160946827137024

      We initially suspected that this was related to the following test:

      External Storage [Driver: csi.vsphere.vmware.com] [Testpattern: Dynamic PV (filesystem volmode)] OpenShift CSI extended - SCSI LUN Overflow should use many PVs on a single node [Serial][Timeout:30m]

      This test was added recently. However, the job run history shows one failure that happened right before that test was included:

      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-azure-csi/1827223741633925120

      Still, that failure is very close in time to when the test started running, so the regression may be related to other changes made in those couple of days.

      The disruption delta is over 40s per endpoint. We need to investigate whether this reveals a product issue.

       

      Version-Release number of selected component (if applicable):

              rhn-engineering-jsafrane Jan Safranek
              kenzhang@redhat.com Ken Zhang
              Wei Duan Wei Duan