Loading...

Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: 4.15.z
Affects Version/s: 4.13, 4.12, 4.11, 4.14, 4.15, 4.16
Component/s: Networking / router
Labels:

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Story Points:
1
Severity:
Low
Regression:
No

Target Backport Versions:
None
Target Version:

4.15.z
Release Blocker:
Rejected
Sprint:
Sprint 254
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
In Progress
Release Note Type:
Release Note Not Required
Release Note Text:
N/A

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

This is a clone of issue ~~OCPBUGS-30091~~. The following is the description of the original issue:
—

Description of problem

CI is flaky because the TestHostNetworkPort test fails:

=== NAME  TestAll/serial/TestHostNetworkPortBinding
    operator_test.go:1034: Expected conditions: map[Admitted:True Available:True DNSManaged:False DeploymentReplicasAllAvailable:True LoadBalancerManaged:False]
         Current conditions: map[Admitted:True Available:True DNSManaged:False Degraded:False DeploymentAvailable:True DeploymentReplicasAllAvailable:False DeploymentReplicasMinAvailable:True DeploymentRollingOut:True EvaluationConditionsDetected:False LoadBalancerManaged:False LoadBalancerProgressing:False Progressing:True Upgradeable:True]
    operator_test.go:1034: Ingress Controller openshift-ingress-operator/samehost status: {
          "availableReplicas": 0,
          "selector": "ingresscontroller.operator.openshift.io/deployment-ingresscontroller=samehost",
          "domain": "samehost.ci-op-xlwngvym-43abb.origin-ci-int-aws.dev.rhcloud.com",
          "endpointPublishingStrategy": {
            "type": "HostNetwork",
            "hostNetwork": {
              "protocol": "TCP",
              "httpPort": 9080,
              "httpsPort": 9443,
              "statsPort": 9936
            }
          },
          "conditions": [
            {
              "type": "Admitted",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "Valid"
            },
            {
              "type": "DeploymentAvailable",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "DeploymentAvailable",
              "message": "The deployment has Available status condition set to True"
            },
            {
              "type": "DeploymentReplicasMinAvailable",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "DeploymentMinimumReplicasMet",
              "message": "Minimum replicas requirement is met"
            },
            {
              "type": "DeploymentReplicasAllAvailable",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "DeploymentReplicasNotAvailable",
              "message": "0/1 of replicas are available"
            },
            {
              "type": "DeploymentRollingOut",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "DeploymentRollingOut",
              "message": "Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...\n"
            },
            {
              "type": "LoadBalancerManaged",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "EndpointPublishingStrategyExcludesManagedLoadBalancer",
              "message": "The configured endpoint publishing strategy does not include a managed load balancer"
            },
            {
              "type": "LoadBalancerProgressing",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "LoadBalancerNotProgressing",
              "message": "LoadBalancer is not progressing"
            },
            {
              "type": "DNSManaged",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "UnsupportedEndpointPublishingStrategy",
              "message": "The endpoint publishing strategy doesn't support DNS management."
            },
            {
              "type": "Available",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z"
            },
            {
              "type": "Progressing",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "IngressControllerProgressing",
              "message": "One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...\n)"
            },
            {
              "type": "Degraded",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z"
            },
            {
              "type": "Upgradeable",
              "status": "True",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "Upgradeable",
              "message": "IngressController is upgradeable."
            },
            {
              "type": "EvaluationConditionsDetected",
              "status": "False",
              "lastTransitionTime": "2024-02-26T17:25:39Z",
              "reason": "NoEvaluationCondition",
              "message": "No evaluation condition is detected."
            }
          ],
          "tlsProfile": {
            "ciphers": [
              "ECDHE-ECDSA-AES128-GCM-SHA256",
              "ECDHE-RSA-AES128-GCM-SHA256",
              "ECDHE-ECDSA-AES256-GCM-SHA384",
              "ECDHE-RSA-AES256-GCM-SHA384",
              "ECDHE-ECDSA-CHACHA20-POLY1305",
              "ECDHE-RSA-CHACHA20-POLY1305",
              "DHE-RSA-AES128-GCM-SHA256",
              "DHE-RSA-AES256-GCM-SHA384",
              "TLS_AES_128_GCM_SHA256",
              "TLS_AES_256_GCM_SHA384",
              "TLS_CHACHA20_POLY1305_SHA256"
            ],
            "minTLSVersion": "VersionTLS12"
          },
          "observedGeneration": 1
        }
    operator_test.go:1036: failed to observe expected conditions for the second ingresscontroller: timed out waiting for the condition
    operator_test.go:1059: deleted ingresscontroller samehost
    operator_test.go:1059: deleted ingresscontroller hostnetworkportbinding

This particular failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1017/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1762147882179235840. Search.ci shows another failure: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/48873/rehearse-48873-pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi/1762576595890999296. The test has failed sporadically in the past, beyond what search.ci is able to search.

TestHostNetworkPort is marked as a serial test in TestAll and marked with t.Parallel() in the test itself. Not sure if this is what is causing a new failure seen in this test, but something is incorrect.

Version-Release number of selected component (if applicable)

The test failures have been observed recently on 4.16 as well as on 4.12 (https://github.com/openshift/cluster-ingress-operator/pull/828#issuecomment-1292888086) and 4.11 (https://github.com/openshift/cluster-ingress-operator/pull/914#issuecomment-1526808286). The logic error was introduced in 4.11 (https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a22322b25569059c61e1973f37f0a4b49e9407bc).

How reproducible

The logic error is self-evident. The test failure is very rare. The failure has been observed sporadically over the past couple years. Presently, search.ci shows two failures, with the following impact, for the past 14 days:

rehearse-48873-pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-gatewayapi (all) - 3 runs, 33% failed, 100% of failures match = 33% impact

pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator (all) - 16 runs, 25% failed, 25% of failures match = 6% impact

Steps to Reproduce

N/A.

Actual results

The TestHostNetworkPort test fails. The test is marked as both serial and parallel.

Expected results

Test should be marked as either serial or parallel, and it should pass consistently.

Additional info

When TestAll was introduced, TestHostNetworkPortBinding was initially marked parallel in https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a22322b25569059c61e1973f37f0a4b49e9407bc. After some discussion, it was moved to the serial list in https://github.com/openshift/cluster-ingress-operator/pull/756/commits/a449e497e35fafeecbee9ea656e0631393182f70, but the commit to remove t.Parallel() evidently got inadvertently dropped.

blocks

OCPBUGS-34973 [Backport 4.14] TestHostNetworkPort is half serial and half parallel

Closed

clones

OCPBUGS-30091 TestHostNetworkPort is half serial and half parallel

Closed

is blocked by

OCPBUGS-30091 TestHostNetworkPort is half serial and half parallel

Closed

is cloned by

OCPBUGS-34973 [Backport 4.14] TestHostNetworkPort is half serial and half parallel

Closed

links to

openshift/cluster-ingress-operator#1075: [release-4.15] OCPBUGS-34887: TestHostNetworkPortBinding: Delete t.Parallel()

RHBA-2024:3889 OpenShift Container Platform 4.15.z bug fix update

(1 links to)

Details

Description

Description of problem

Version-Release number of selected component (if applicable)

How reproducible

Steps to Reproduce

Actual results

Expected results

Additional info

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates