OpenShift Bugs / OCPBUGS-31868

monitor test service-type-load-balancer-availability cleanup failing on http2 client connection lost


Details

    • Release Note Not Required
    • In Progress

    Description

      Component Readiness has found a potential regression in [Jira:"Networking / router"] monitor test service-type-load-balancer-availability cleanup.

      Probability of significant regression: 100.00%

      Sample (being evaluated) Release: 4.16
      Start Time: 2024-04-02T00:00:00Z
      End Time: 2024-04-08T23:59:59Z
      Success Rate: 94.67%
      Successes: 213
      Failures: 12
      Flakes: 0

      Base (historical) Release: 4.15
      Start Time: 2024-02-01T00:00:00Z
      End Time: 2024-02-28T23:59:59Z
      Success Rate: 100.00%
      Successes: 751
      Failures: 0
      Flakes: 0
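
      As a rough sanity check on the numbers above, the success rates and the strength of the regression signal can be recomputed from the raw counts. The sketch below assumes a one-sided Fisher exact test on the 2x2 table of failures and successes; that choice is an assumption for illustration and may not match exactly how Component Readiness derives its probability. The function names are hypothetical.

```python
from math import comb


def success_rate(successes, failures):
    """Fraction of runs that passed (flakes are zero in this report)."""
    return successes / (successes + failures)


def fisher_one_sided(sample_fail, sample_pass, base_fail, base_pass):
    """P-value for seeing at least `sample_fail` failures in the sample,
    given the combined failure count (hypergeometric upper tail).
    Assumption: this is an illustrative test, not Sippy's own code."""
    n_sample = sample_fail + sample_pass
    total_fail = sample_fail + base_fail
    total = n_sample + base_fail + base_pass
    denom = comb(total, n_sample)
    upper = min(total_fail, n_sample)
    tail = sum(
        comb(total_fail, k) * comb(total - total_fail, n_sample - k)
        for k in range(sample_fail, upper + 1)
    )
    return tail / denom


# Counts taken from the report: 4.16 sample vs. 4.15 base.
sample_rate = success_rate(213, 12)
base_rate = success_rate(751, 0)
p_value = fisher_one_sided(12, 213, 0, 751)

print(f"sample success rate: {sample_rate:.2%}")
print(f"base success rate:   {base_rate:.2%}")
print(f"one-sided p-value:   {p_value:.3g}")
```

      Under this assumption the p-value is very small, which is consistent with the reported regression probability rounding to 100.00%.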

      View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=Networking%20%2F%20router&confidence=95&environment=sdn%20upgrade-minor%20amd64%20azure%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=sdn&network=sdn&pity=5&platform=azure&platform=azure&sampleEndTime=2024-04-08%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-04-02%2000%3A00%3A00&testId=openshift-tests-upgrade%3A9bc4661b05ba13ed49d4c91f63899776&testName=%5BJira%3A%22Networking%20%2F%20router%22%5D%20monitor%20test%20service-type-load-balancer-availability%20cleanup&upgrade=upgrade-minor&upgrade=upgrade-minor&variant=standard&variant=standard

      The failure message we're after here is:

      {  failed during cleanup
      Get "https://api.ci-op-tgk1b3if-9d969.ci2.azure.devcluster.openshift.com:6443/api/v1/namespaces/e2e-service-lb-test-xqptd": http2: client connection lost}
      

      Looking at the sample runs, the failure is recorded in the monitortest e2e JUnit XML files, and it appears to always happen after the upgrade but before conformance. Unfortunately, that means we may not have reliable intervals for the time when it occurs. It also means there is no excuse for losing the connection to the apiserver.

      Example: this junit xml from this job run

       

      The problem actually dates back to March 3; see the attachment for the full list of affected job runs. It is almost entirely Azure and entirely 4.16 (it never happened earlier, as far back as we can see).

      It occurs in a poll loop that checks whether a namespace still exists after it has been deleted. The failure rate seems to be around 5% on this specific job.
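
      To make the failure mode concrete, here is a minimal sketch of such a poll loop. It is not the actual test code: `get_namespace`, `NotFoundError`, and `wait_for_namespace_gone` are hypothetical names, and treating a dropped connection as retryable is only one possible mitigation. It would not explain why the connection to the apiserver is lost in the first place.

```python
import time


class NotFoundError(Exception):
    """Hypothetical stand-in for a 404 from the API server."""


def wait_for_namespace_gone(get_namespace, timeout=60.0, interval=1.0,
                            sleep=time.sleep):
    """Poll until `get_namespace` raises NotFoundError.

    A ConnectionError (for example "http2: client connection lost") is
    remembered and retried instead of failing the cleanup immediately.
    """
    deadline = time.monotonic() + timeout
    last_err = None
    while time.monotonic() < deadline:
        try:
            get_namespace()  # namespace still exists, keep polling
        except NotFoundError:
            return True  # deletion finished
        except ConnectionError as err:
            last_err = err  # transient: retry on the next iteration
        sleep(interval)
    raise TimeoutError(f"namespace not confirmed deleted; last error: {last_err}")


# Usage with a fake getter: present, then a dropped connection, then gone.
responses = iter([None,
                  ConnectionError("http2: client connection lost"),
                  NotFoundError()])


def fake_get():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


result = wait_for_namespace_gone(fake_get, timeout=5, interval=0,
                                 sleep=lambda _s: None)
print(result)
```

      A loop without the `ConnectionError` branch would surface the first dropped connection as a cleanup failure, which matches the message in the failing runs.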

          People

            jluhrsen Jamo Luhrsen
            rhn-engineering-dgoodwin Devan Goodwin