  1. OpenShift Bugs
  2. OCPBUGS-1079

Test improvement: pick a catastrophic run and chase down the cause


    Description

      Catastrophic job runs, where a high number of tests fail, are common. There are likely many root causes, but let's try to find one. This is a hard task because it's not a case of "this one test failed, figure out why."

      Clusters of failures are more common on certain platforms, so it may be fruitful to start with the worst.

      NURPs that average > 5 openshift-tests or openshift-tests-upgrade failures:

                            variants                       |          avg           
      -----------------------------------------------------+------------------------
       {azure,amd64,ovn,upgrade,upgrade-micro,single-node} |   124.5294117647058824
       {azure,amd64,ovn,upgrade,upgrade-minor,single-node} |    92.9090909090909091
       {openstack,amd64,ovn,ha}                            |    49.2105263157894737
       {azure,amd64,sdn,ha,fips}                           |    25.6666666666666667
       {metal-ipi,amd64,ovn,ha}                            |    24.6000000000000000
       {openstack,amd64,ovn,ha,fips}                       |    23.5000000000000000
       {azure,amd64,ovn,ha,hypershift}                     |    22.6666666666666667
       {s390x,sdn,ha}                                      |    22.5454545454545455
       {gcp,amd64,ovn,ha}                                  |    21.5714285714285714
       {ppc64le,sdn,ha}                                    |    17.9545454545454545
       {metal-ipi,amd64,sdn,ha}                            |    17.6000000000000000
       {openstack,amd64,ovn,ha,serial}                     |    15.3333333333333333
       {azure,amd64,ovn,ha}                                |    15.1627906976744186
       {promote}                                           |    15.0000000000000000
       {aws,amd64,ovn,ha}                                  |    14.2558139534883721
       {metal-ipi,amd64,ovn,upgrade,upgrade-minor,ha}      |    13.9375000000000000
       {gcp,amd64,ovn,upgrade,upgrade-minor,ha,realtime}   |    11.2000000000000000
       {azure,amd64,sdn,upgrade,upgrade-minor,ha}          |     9.6842105263157895
       {never-stable}                                      |     9.0740740740740741
       {aws,amd64,ovn,single-node}                         |     8.8666666666666667
       {metal-ipi,amd64,sdn,upgrade,upgrade-micro,ha}      |     7.9090909090909091
       {azure,amd64,sdn,upgrade,upgrade-micro,ha}          |     6.4000000000000000
       {aws,amd64,sdn,ha}                                  |     5.7800000000000000
       {vsphere-ipi,amd64,ovn,ha}                          |     5.6458333333333333
       {openstack,amd64,ovn,upgrade,upgrade-minor,ha}      |     5.6250000000000000
       {metal-ipi,amd64,ovn,upgrade,upgrade-micro,ha}      |     5.5882352941176471
       {aws,amd64,sdn,upgrade,upgrade-micro,ha}            |     5.5789473684210526
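
A rough sketch of how a table like the one above could be reproduced from raw job run data. This is not the query that produced the table; the function name, the `(variants, failures)` record shape, and the sample numbers are all hypothetical and only illustrate the grouping and the "> 5" cutoff.

```python
from collections import defaultdict
from statistics import mean


def average_failures_by_variants(runs, threshold=5.0):
    """Group job runs by their variant set and average the failure counts.

    `runs` is an iterable of (variants, failures) pairs, where `variants`
    is a sequence of strings and `failures` is the number of failed tests
    in that run. This record shape is an assumption for illustration.
    Returns (variants, average) pairs above `threshold`, worst first.
    """
    grouped = defaultdict(list)
    for variants, failures in runs:
        grouped[tuple(variants)].append(failures)

    averages = [(variants, mean(counts)) for variants, counts in grouped.items()]
    worst_first = sorted(averages, key=lambda pair: pair[1], reverse=True)
    return [(variants, avg) for variants, avg in worst_first if avg > threshold]


# Made-up sample data, not taken from any real job run.
sample_runs = [
    (("azure", "amd64", "ovn", "single-node"), 120),
    (("azure", "amd64", "ovn", "single-node"), 130),
    (("aws", "amd64", "sdn", "ha"), 2),
    (("aws", "amd64", "sdn", "ha"), 4),
]

for variants, avg in average_failures_by_variants(sample_runs):
    print("{" + ",".join(variants) + "}", avg)
```

Sorting worst first matches the ordering of the table, which makes it easy to pick the top one or two variant sets as a starting point.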
      

      Here's a sippy link for 4.12 job runs with > 50 failures: https://sippy.dptools.openshift.org/sippy-ng/jobs/4.12/runs?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522test_failures%2522%252C%2522operatorValue%2522%253A%2522%253E%2522%252C%2522value%2522%253A%252250%2522%257D%252C%257B%2522columnField%2522%253A%2522overall_result%2522%252C%2522operatorValue%2522%253A%2522equals%2522%252C%2522value%2522%253A%2522F%2522%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D&sort=desc&sortField=timestamp
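
The `filters` parameter in that link is percent-encoded twice, which makes it hard to read. A small sketch for decoding it, using only the Python standard library, in case someone wants to see or adjust the filter (for example the failure threshold). The function name is hypothetical; the URL is the one from the ticket.

```python
import json
from urllib.parse import parse_qs, unquote, urlsplit

SIPPY_URL = (
    "https://sippy.dptools.openshift.org/sippy-ng/jobs/4.12/runs?filters="
    "%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522test_failures%2522"
    "%252C%2522operatorValue%2522%253A%2522%253E%2522%252C%2522value%2522%253A%252250%2522"
    "%257D%252C%257B%2522columnField%2522%253A%2522overall_result%2522%252C"
    "%2522operatorValue%2522%253A%2522equals%2522%252C%2522value%2522%253A%2522F%2522"
    "%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D"
    "&sort=desc&sortField=timestamp"
)


def decode_sippy_filters(url):
    """Return the JSON filter object embedded in a link like the one above."""
    query = parse_qs(urlsplit(url).query)   # parse_qs undoes the first encoding layer
    once_decoded = query["filters"][0]
    return json.loads(unquote(once_decoded))  # unquote undoes the second layer


filters = decode_sippy_filters(SIPPY_URL)
for item in filters["items"]:
    print(item["columnField"], item["operatorValue"], item["value"])
```

Decoded, the link selects runs where `test_failures` is greater than 50 and `overall_result` equals `F`, combined with `and`.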

          People

            stbenjam Stephen Benjamin