OCPBUGS-1079

Test improvement: pick a catastrophic run and chase down the cause


      Catastrophic job runs, where a large number of tests fail at once, are common. There are likely many root causes, but let's try to find one. This is harder than the usual "this one test failed, figure out why" investigation.

      Clusters of failures are more common on certain platforms, so it may be fruitful to start with the worst.

      NURPs that average > 5 openshift-tests or openshift-tests-upgrade failures:

                            variants                       |          avg           
      -----------------------------------------------------+------------------------
       {azure,amd64,ovn,upgrade,upgrade-micro,single-node} |   124.5294117647058824
       {azure,amd64,ovn,upgrade,upgrade-minor,single-node} |    92.9090909090909091
       {openstack,amd64,ovn,ha}                            |    49.2105263157894737
       {azure,amd64,sdn,ha,fips}                           |    25.6666666666666667
       {metal-ipi,amd64,ovn,ha}                            |    24.6000000000000000
       {openstack,amd64,ovn,ha,fips}                       |    23.5000000000000000
       {azure,amd64,ovn,ha,hypershift}                     |    22.6666666666666667
       {s390x,sdn,ha}                                      |    22.5454545454545455
       {gcp,amd64,ovn,ha}                                  |    21.5714285714285714
       {ppc64le,sdn,ha}                                    |    17.9545454545454545
       {metal-ipi,amd64,sdn,ha}                            |    17.6000000000000000
       {openstack,amd64,ovn,ha,serial}                     |    15.3333333333333333
       {azure,amd64,ovn,ha}                                |    15.1627906976744186
       {promote}                                           |    15.0000000000000000
       {aws,amd64,ovn,ha}                                  |    14.2558139534883721
       {metal-ipi,amd64,ovn,upgrade,upgrade-minor,ha}      |    13.9375000000000000
       {gcp,amd64,ovn,upgrade,upgrade-minor,ha,realtime}   |    11.2000000000000000
       {azure,amd64,sdn,upgrade,upgrade-minor,ha}          |     9.6842105263157895
       {never-stable}                                      |     9.0740740740740741
       {aws,amd64,ovn,single-node}                         |     8.8666666666666667
       {metal-ipi,amd64,sdn,upgrade,upgrade-micro,ha}      |     7.9090909090909091
       {azure,amd64,sdn,upgrade,upgrade-micro,ha}          |     6.4000000000000000
       {aws,amd64,sdn,ha}                                  |     5.7800000000000000
       {vsphere-ipi,amd64,ovn,ha}                          |     5.6458333333333333
       {openstack,amd64,ovn,upgrade,upgrade-minor,ha}      |     5.6250000000000000
       {metal-ipi,amd64,ovn,upgrade,upgrade-micro,ha}      |     5.5882352941176471
       {aws,amd64,sdn,upgrade,upgrade-micro,ha}            |     5.5789473684210526
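
The query behind the table above is not included in this issue. As a rough sketch of the same kind of aggregation, the following groups job runs by their variant set, averages the failure counts, and keeps the groups that average more than 5. The `runs` rows and the function name are made up for illustration; they are not real job data and not part of sippy.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (variants of the job, failed test count in one run).
# These are illustrative values only, not taken from the table above.
runs = [
    (("azure", "amd64", "ovn", "single-node"), 130),
    (("azure", "amd64", "ovn", "single-node"), 110),
    (("aws", "amd64", "sdn", "ha"), 2),
    (("aws", "amd64", "sdn", "ha"), 6),
    (("gcp", "amd64", "ovn", "ha"), 20),
]


def average_failures_by_variants(runs, threshold=5.0):
    """Return (variants, average) pairs above threshold, worst first."""
    by_variants = defaultdict(list)
    for variants, failures in runs:
        by_variants[tuple(variants)].append(failures)

    averages = {v: mean(counts) for v, counts in by_variants.items()}
    worst = [(v, avg) for v, avg in averages.items() if avg > threshold]
    return sorted(worst, key=lambda item: item[1], reverse=True)


for variants, avg in average_failures_by_variants(runs):
    print("{" + ",".join(variants) + "}", avg)
```

Sorting worst-first mirrors the ordering of the table, which makes it easy to pick the top variant set as the starting point.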
      

      Here's a sippy link for 4.12 job runs with > 50 failures: https://sippy.dptools.openshift.org/sippy-ng/jobs/4.12/runs?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522test_failures%2522%252C%2522operatorValue%2522%253A%2522%253E%2522%252C%2522value%2522%253A%252250%2522%257D%252C%257B%2522columnField%2522%253A%2522overall_result%2522%252C%2522operatorValue%2522%253A%2522equals%2522%252C%2522value%2522%253A%2522F%2522%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D&sort=desc&sortField=timestamp
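
The `filters` parameter in the link above is JSON that has been percent-encoded twice, which makes it hard to read or adjust by hand. A minimal sketch for decoding it, using only the Python standard library, is shown here; the function name `decode_sippy_filters` is an illustrative choice, not a sippy API.

```python
import json
from urllib.parse import parse_qs, unquote, urlsplit

# The link from this issue, split across lines for readability.
SIPPY_URL = (
    "https://sippy.dptools.openshift.org/sippy-ng/jobs/4.12/runs"
    "?filters=%257B%2522items%2522%253A%255B"
    "%257B%2522columnField%2522%253A%2522test_failures%2522%252C"
    "%2522operatorValue%2522%253A%2522%253E%2522%252C"
    "%2522value%2522%253A%252250%2522%257D%252C"
    "%257B%2522columnField%2522%253A%2522overall_result%2522%252C"
    "%2522operatorValue%2522%253A%2522equals%2522%252C"
    "%2522value%2522%253A%2522F%2522%257D%255D%252C"
    "%2522linkOperator%2522%253A%2522and%2522%257D"
    "&sort=desc&sortField=timestamp"
)


def decode_sippy_filters(url: str) -> dict:
    """Extract the double-encoded `filters` query parameter as a dict."""
    query = parse_qs(urlsplit(url).query)
    once_decoded = query["filters"][0]  # parse_qs removes the first encoding layer
    return json.loads(unquote(once_decoded))  # unquote removes the second layer


filters = decode_sippy_filters(SIPPY_URL)
for item in filters["items"]:
    print(item["columnField"], item["operatorValue"], item["value"])
```

Decoded, the link filters on `test_failures > 50` and `overall_result equals F`, joined with `and`, which matches the description of job runs with more than 50 failures.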

        1. screenshot-1.png
          47 kB
          Stephen Benjamin
        2. screenshot-2.png
          587 kB
          Stephen Benjamin

            stbenjam Stephen Benjamin