-
Bug
-
Resolution: Done-Errata
-
Normal
-
4.13, 4.12, 4.11, 4.14, 4.15, 4.16
-
Low
-
No
-
1
-
Sprint 254
-
1
-
Rejected
-
False
-
-
N/A
-
Release Note Not Required
-
In Progress
This is a clone of issue OCPBUGS-30091. The following is the description of the original issue:
—
Description of problem
CI is flaky because the TestHostNetworkPort test fails:
=== NAME TestAll/serial/TestHostNetworkPortBinding
operator_test.go:1034: Expected conditions: map[Admitted:True Available:True DNSManaged:False DeploymentReplicasAllAvailable:True LoadBalancerManaged:False]
    Current conditions: map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:False DeploymentReplicasMinAvailable:True DeploymentRollingOut:True EvaluationConditionsDetected:False LoadBalancerManaged:False LoadBalancerProgressing:False Progressing:True Upgradeable:True]
operator_test.go:1034: Ingress Controller openshift-ingress-operator/samehost status: {
  "availableReplicas": 0,
  "selector": "ingresscontroller.operator.openshift.io/deployment-ingresscontroller=samehost",
  "domain": "samehost.ci-op-xlwngvym-43abb.origin-ci-int-aws.dev.rhcloud.com",
  "endpointPublishingStrategy": {
    "type": "HostNetwork",
    "hostNetwork": { "protocol": "TCP", "httpPort": 9080, "httpsPort": 9443, "statsPort": 9936 }
  },
  "conditions": [
    { "type": "Admitted", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "Valid" },
    { "type": "DeploymentAvailable", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "DeploymentAvailable", "message": "The deployment has Available status condition set to True" },
    { "type": "DeploymentReplicasMinAvailable", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "DeploymentMinimumReplicasMet", "message": "Minimum replicas requirement is met" },
    { "type": "DeploymentReplicasAllAvailable", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "DeploymentReplicasNotAvailable", "message": "0/1 of replicas are available" },
    { "type": "DeploymentRollingOut", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "DeploymentRollingOut", "message": "Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...\n" },
    { "type": "LoadBalancerManaged", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "EndpointPublishingStrategyExcludesManagedLoadBalancer", "message": "The configured endpoint publishing strategy does not include a managed load balancer" },
    { "type": "LoadBalancerProgressing", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "LoadBalancerNotProgressing", "message": "LoadBalancer is not progressing" },
    { "type": "DNSManaged", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "UnsupportedEndpointPublishingStrategy", "message": "The endpoint publishing strategy doesn't support DNS management." },
    { "type": "Available", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z" },
    { "type": "Progressing", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "IngressControllerProgressing", "message": "One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...\n)" },
    { "type": "Degraded", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z" },
    { "type": "Upgradeable", "status": "True", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "Upgradeable", "message": "IngressController is upgradeable." },
    { "type": "EvaluationConditionsDetected", "status": "False", "lastTransitionTime": "2024-02-26T17:25:39Z", "reason": "NoEvaluationCondition", "message": "No evaluation condition is detected." }
  ],
  "tlsProfile": {
    "ciphers": [ "ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-ECDSA-AES256-GCM-SHA384", "ECDHE-RSA-AES256-GCM-SHA384", "ECDHE-ECDSA-CHACHA20-POLY1305", "ECDHE-RSA-CHACHA20-POLY1305", "DHE-RSA-AES128-GCM-SHA256", "DHE-RSA-AES256-GCM-SHA384", "TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384", "TLS_CHACHA20_POLY1305_SHA256" ],
    "minTLSVersion": "VersionTLS12"
  },
  "observedGeneration": 1
}
operator_test.go:1036: failed to observe expected conditions for the second ingresscontroller: timed out waiting for the condition
operator_test.go:1059: deleted ingresscontroller samehost
operator_test.go:1059: deleted ingresscontroller hostnetworkportbinding
This particular failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1017/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1762147882179235840. Search.ci shows another failure: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/48873/rehearse-48873-pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi/1762576595890999296. The test has failed sporadically in the past, beyond what search.ci is able to search.
TestHostNetworkPort is listed as a serial test in TestAll, but the test itself still calls t.Parallel(). It is not clear whether this mismatch causes the new failure seen in this test, but the test is incorrectly configured either way.
Version-Release number of selected component (if applicable)
The test failures have been observed recently on 4.16 as well as on 4.12 (https://github.com/openshift/cluster-ingress-operator/pull/828#issuecomment-1292888086) and 4.11 (https://github.com/openshift/cluster-ingress-operator/pull/914#issuecomment-1526808286). The logic error was introduced in 4.11 (https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a22322b25569059c61e1973f37f0a4b49e9407bc).
How reproducible
The logic error is self-evident. The test failure is very rare; it has been observed sporadically over the past couple of years. Presently, search.ci shows two failures, with the following impact, for the past 14 days:
rehearse-48873-pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi (all) - 3 runs, 33% failed, 100% of failures match = 33% impact
pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator (all) - 16 runs, 25% failed, 25% of failures match = 6% impact
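The impact figures above appear consistent with multiplying the job failure rate by the share of failures that match the search. As a rough check of that arithmetic (the function name impactPercent and the short job labels are illustrative, and the formula is an assumption inferred from the two lines above, not taken from search.ci itself):

```go
package main

import "fmt"

// impactPercent multiplies the percentage of failed runs by the percentage
// of those failures that match the search. Both inputs and the result are
// percentages. This formula is an assumption, not search.ci's code.
func impactPercent(failedPct, matchPct float64) float64 {
	return failedPct * matchPct / 100
}

func main() {
	// Inputs are the rounded percentages quoted in the two lines above.
	fmt.Printf("e2e-aws-gatewayapi: %.0f%%\n", impactPercent(33, 100))
	fmt.Printf("e2e-aws-operator: %.0f%%\n", impactPercent(25, 25))
}
```

Under that assumption, the second job works out to 6.25%, which rounds to the 6% impact reported.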
Steps to Reproduce
N/A.
Actual results
The TestHostNetworkPort test fails. The test is marked as both serial and parallel.
Expected results
The test should be marked as either serial or parallel, not both, and it should pass consistently.
Additional info
When TestAll was introduced, TestHostNetworkPortBinding was initially marked parallel in https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a22322b25569059c61e1973f37f0a4b49e9407bc. After some discussion, it was moved to the serial list in https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a449e497e35fafeecbee9ea656e0631393182f70, but the change to remove t.Parallel() from the test was evidently dropped by mistake.
- blocks
-
OCPBUGS-34973 [Backport 4.14] TestHostNetworkPort is half serial and half parallel
- Closed
- clones
-
OCPBUGS-30091 TestHostNetworkPort is half serial and half parallel
- Closed
- is blocked by
-
OCPBUGS-30091 TestHostNetworkPort is half serial and half parallel
- Closed
- is cloned by
-
OCPBUGS-34973 [Backport 4.14] TestHostNetworkPort is half serial and half parallel
- Closed
- links to
-
RHBA-2024:3889 OpenShift Container Platform 4.15.z bug fix update