OpenShift Bugs / OCPBUGS-31868

monitor test service-type-load-balancer-availability cleanup failing on http2 client connection lost

    • Release Note Not Required
    • In Progress

      Component Readiness has found a potential regression in [Jira:"Networking / router"] monitor test service-type-load-balancer-availability cleanup.

      Probability of significant regression: 100.00%

      Sample (being evaluated) Release: 4.16
      Start Time: 2024-04-02T00:00:00Z
      End Time: 2024-04-08T23:59:59Z
      Success Rate: 94.67%
      Successes: 213
      Failures: 12
      Flakes: 0

      Base (historical) Release: 4.15
      Start Time: 2024-02-01T00:00:00Z
      End Time: 2024-02-28T23:59:59Z
      Success Rate: 100.00%
      Successes: 751
      Failures: 0
      Flakes: 0

      View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=Networking%20%2F%20router&confidence=95&environment=sdn%20upgrade-minor%20amd64%20azure%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=sdn&network=sdn&pity=5&platform=azure&platform=azure&sampleEndTime=2024-04-08%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-04-02%2000%3A00%3A00&testId=openshift-tests-upgrade%3A9bc4661b05ba13ed49d4c91f63899776&testName=%5BJira%3A%22Networking%20%2F%20router%22%5D%20monitor%20test%20service-type-load-balancer-availability%20cleanup&upgrade=upgrade-minor&upgrade=upgrade-minor&variant=standard&variant=standard

      The failure message we're after here is:

      {  failed during cleanup
      Get "https://api.ci-op-tgk1b3if-9d969.ci2.azure.devcluster.openshift.com:6443/api/v1/namespaces/e2e-service-lb-test-xqptd": http2: client connection lost}
      

      Looking at the sample runs, the failure appears in the monitortest e2e JUnit XML files, and it always seems to happen after the upgrade but before conformance. Unfortunately, that means we may not have reliable intervals for the time when it occurs. It also means there is no excuse for losing the connection to the apiserver.

      Example: this junit xml from this job run

       

      The problem actually dates back to March 3; see the attachment for the full list of affected job runs. It is almost entirely on Azure and entirely on 4.16 (it never happened before that, as far back as we can see).

      It occurs in a poll loop that checks whether a namespace still exists after it has been deleted. The failure rate appears to be around 5% on this specific job.

              jluhrsen Jamo Luhrsen
              rhn-engineering-dgoodwin Devan Goodwin