  1. OpenShift Bugs
  2. OCPBUGS-30091

TestHostNetworkPort is half serial and half parallel

Details

    • Low
    • No
    • 1
    • Sprint 250
    • 1
    • Rejected
    • False
    • N/A
    • Release Note Not Required
    • In Progress

    Description

      Description of problem

      CI is flaky because the TestHostNetworkPortBinding test fails intermittently:

      === NAME  TestAll/serial/TestHostNetworkPortBinding
          operator_test.go:1034: Expected conditions: map[Admitted:True Available:True DNSManaged:False DeploymentReplicasAllAvailable:True LoadBalancerManaged:False]
               Current conditions: map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:False DeploymentReplicasMinAvailable:True DeploymentRollingOut:True EvaluationConditionsDetected:False LoadBalancerManaged:False LoadBalancerProgressing:False Progressing:True Upgradeable:True]
          operator_test.go:1034: Ingress Controller openshift-ingress-operator/samehost status: {
                "availableReplicas": 0,
                "selector": "ingresscontroller.operator.openshift.io/deployment-ingresscontroller=samehost",
                "domain": "samehost.ci-op-xlwngvym-43abb.origin-ci-int-aws.dev.rhcloud.com",
                "endpointPublishingStrategy": {
                  "type": "HostNetwork",
                  "hostNetwork": {
                    "protocol": "TCP",
                    "httpPort": 9080,
                    "httpsPort": 9443,
                    "statsPort": 9936
                  }
                },
                "conditions": [
                  {
                    "type": "Admitted",
                    "status": "True",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "Valid"
                  },
                  {
                    "type": "DeploymentAvailable",
                    "status": "True",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "DeploymentAvailable",
                    "message": "The deployment has Available status condition set to True"
                  },
                  {
                    "type": "DeploymentReplicasMinAvailable",
                    "status": "True",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "DeploymentMinimumReplicasMet",
                    "message": "Minimum replicas requirement is met"
                  },
                  {
                    "type": "DeploymentReplicasAllAvailable",
                    "status": "False",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "DeploymentReplicasNotAvailable",
                    "message": "0/1 of replicas are available"
                  },
                  {
                    "type": "DeploymentRollingOut",
                    "status": "True",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "DeploymentRollingOut",
                    "message": "Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...\n"
                  },
                  {
                    "type": "LoadBalancerManaged",
                    "status": "False",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "EndpointPublishingStrategyExcludesManagedLoadBalancer",
                    "message": "The configured endpoint publishing strategy does not include a managed load balancer"
                  },
                  {
                    "type": "LoadBalancerProgressing",
                    "status": "False",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "LoadBalancerNotProgressing",
                    "message": "LoadBalancer is not progressing"
                  },
                  {
                    "type": "DNSManaged",
                    "status": "False",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "UnsupportedEndpointPublishingStrategy",
                    "message": "The endpoint publishing strategy doesn't support DNS management."
                  },
                  {
                    "type": "Available",
                    "status": "True",
                    "lastTransitionTime": "2024-02-26T17:25:39Z"
                  },
                  {
                    "type": "Progressing",
                    "status": "True",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "IngressControllerProgressing",
                    "message": "One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...\n)"
                  },
                  {
                    "type": "Degraded",
                    "status": "False",
                    "lastTransitionTime": "2024-02-26T17:25:39Z"
                  },
                  {
                    "type": "Upgradeable",
                    "status": "True",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "Upgradeable",
                    "message": "IngressController is upgradeable."
                  },
                  {
                    "type": "EvaluationConditionsDetected",
                    "status": "False",
                    "lastTransitionTime": "2024-02-26T17:25:39Z",
                    "reason": "NoEvaluationCondition",
                    "message": "No evaluation condition is detected."
                  }
                ],
                "tlsProfile": {
                  "ciphers": [
                    "ECDHE-ECDSA-AES128-GCM-SHA256",
                    "ECDHE-RSA-AES128-GCM-SHA256",
                    "ECDHE-ECDSA-AES256-GCM-SHA384",
                    "ECDHE-RSA-AES256-GCM-SHA384",
                    "ECDHE-ECDSA-CHACHA20-POLY1305",
                    "ECDHE-RSA-CHACHA20-POLY1305",
                    "DHE-RSA-AES128-GCM-SHA256",
                    "DHE-RSA-AES256-GCM-SHA384",
                    "TLS_AES_128_GCM_SHA256",
                    "TLS_AES_256_GCM_SHA384",
                    "TLS_CHACHA20_POLY1305_SHA256"
                  ],
                  "minTLSVersion": "VersionTLS12"
                },
                "observedGeneration": 1
              }
          operator_test.go:1036: failed to observe expected conditions for the second ingresscontroller: timed out waiting for the condition
          operator_test.go:1059: deleted ingresscontroller samehost
          operator_test.go:1059: deleted ingresscontroller hostnetworkportbinding
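
The log above prints two long condition maps, and the one condition that blocks the test is easy to miss. The following is a minimal sketch, not code from the operator's test suite, that compares the two maps copied from the log. The helper name `mismatches` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// mismatches returns, in sorted order, one line for every expected condition
// whose current status is missing or different. Conditions that appear only
// in current are ignored, because the test does not wait on them.
// NOTE: hypothetical helper for illustration, not part of the operator tests.
func mismatches(expected, current map[string]string) []string {
	var out []string
	for name, want := range expected {
		got, ok := current[name]
		switch {
		case !ok:
			out = append(out, fmt.Sprintf("%s: want %s, condition missing", name, want))
		case got != want:
			out = append(out, fmt.Sprintf("%s: want %s, got %s", name, want, got))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Both maps are copied from the "Expected conditions" and
	// "Current conditions" lines of the log excerpt.
	expected := map[string]string{
		"Admitted":                       "True",
		"Available":                      "True",
		"DNSManaged":                     "False",
		"DeploymentReplicasAllAvailable": "True",
		"LoadBalancerManaged":            "False",
	}
	current := map[string]string{
		"Admitted":                       "True",
		"Available":                      "True",
		"DNSManaged":                     "False",
		"Degraded":                       "False",
		"DeploymentAvailable":            "True",
		"DeploymentReplicasAllAvailable": "False",
		"DeploymentReplicasMinAvailable": "True",
		"DeploymentRollingOut":           "True",
		"EvaluationConditionsDetected":   "False",
		"LoadBalancerManaged":            "False",
		"LoadBalancerProgressing":        "False",
		"Progressing":                    "True",
		"Upgradeable":                    "True",
	}
	for _, m := range mismatches(expected, current) {
		fmt.Println(m)
	}
}
```

For the maps in this log, the only mismatch is DeploymentReplicasAllAvailable (expected True, currently False), which agrees with the "0/1 of replicas are available" message in the status dump.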
      

      This particular failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1017/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1762147882179235840. Search.ci shows another failure: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/48873/rehearse-48873-pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi/1762576595890999296. The test has also failed sporadically in the past, further back than search.ci can search.

      TestHostNetworkPortBinding is listed as a serial test in TestAll, yet the test function itself calls t.Parallel(). It is unclear whether this inconsistency causes the failure shown above, but the marking is incorrect either way.

      Version-Release number of selected component (if applicable)

      The test failures have been observed recently on 4.16 as well as on 4.12 (https://github.com/openshift/cluster-ingress-operator/pull/828#issuecomment-1292888086) and 4.11 (https://github.com/openshift/cluster-ingress-operator/pull/914#issuecomment-1526808286). The logic error was introduced in 4.11 (https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a22322b25569059c61e1973f37f0a4b49e9407bc).

      How reproducible

      The logic error is self-evident. The test failure itself is very rare; it has been observed sporadically over the past couple of years. Presently, search.ci shows two failures for the past 14 days, with the following impact:

      rehearse-48873-pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
      
      pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator (all) - 16 runs, 25% failed, 25% of failures match = 6% impact
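
The impact figures above can be checked by hand. The following is a minimal sketch, assuming that "impact" means the share of all runs whose failure matched the search. That reading is consistent with both lines but is an assumption about search.ci, and the function name `impactPercent` is hypothetical. The absolute counts are derived from the percentages (33% of 3 runs is 1 failure; 25% of 16 runs is 4 failures, of which 25% is 1 match).

```go
package main

import "fmt"

// impactPercent returns the percentage of all runs whose failure matched the
// search. ASSUMPTION: impact = matching failures / total runs.
func impactPercent(runs, matching int) float64 {
	if runs == 0 {
		return 0 // avoid division by zero when a job has no runs
	}
	return 100 * float64(matching) / float64(runs)
}

func main() {
	// rehearse job: 3 runs, 1 failed, and that 1 failure matched.
	fmt.Printf("gatewayapi rehearsal: %.0f%% impact\n", impactPercent(3, 1))

	// e2e-aws-operator job: 16 runs, 4 failed, 1 of those 4 matched.
	fmt.Printf("e2e-aws-operator: %.0f%% impact\n", impactPercent(16, 1))
}
```

Under that assumption, the two jobs come out at 33% and 6% (6.25% before rounding), matching the reported figures.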
      

      Steps to Reproduce

      N/A.

      Actual results

      The TestHostNetworkPortBinding test fails sporadically. The test is marked as both serial and parallel.

      Expected results

      The test should be marked as either serial or parallel, not both, and it should pass consistently.

      Additional info

      When TestAll was introduced, TestHostNetworkPortBinding was initially marked parallel in https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a22322b25569059c61e1973f37f0a4b49e9407bc. After some discussion, it was moved to the serial list in https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a449e497e35fafeecbee9ea656e0631393182f70, but the commit to remove t.Parallel() was evidently dropped inadvertently.

          People

            mmasters1@redhat.com Miciah Masters
            cholman@redhat.com Candace Holman
            Hongan Li