-
Bug
-
Resolution: Done-Errata
-
Critical
-
4.14.z
-
Critical
-
No
-
False
-
This is a clone of issue OCPBUGS-23300. The following is the description of the original issue:
—
Description of problem:
Actually the issue is same root cause of https://issues.redhat.com/browse/OCPBUGS-9026 but I'd like to open new one since the issue becomes very critical after ROSA using NLB as default since 4.14, HCP(HyperShift) private cluster that without infra nodes is the serious victim because it has worker nodes only and no available workaround for it now. But if we think we could use the old bug to track the issue, then please close this one.
Version-Release number of selected component (if applicable):
4.14.1 HyperShift Private cluster
How reproducible:
100%
Steps to Reproduce:
1. create ROSA HCP(HyperShift) cluster 2. run qe-e2e-test on this cluster, or curl route from one pod inside the cluster 3.
Actual results:
1. co/console status is flapping since route is intermittently accessible $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.14.1 True False 4h56m Error while reconciling 4.14.1: the cluster operator console is not available 2. check node and router pods running on both worker nodes $ oc get node NAME STATUS ROLES AGE VERSION ip-10-0-49-184.ec2.internal Ready worker 5h5m v1.27.6+f67aeb3 ip-10-0-63-210.ec2.internal Ready worker 5h8m v1.27.6+f67aeb3 $ oc -n openshift-ingress get pod -owide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES router-default-86d569bf84-bq66f 1/1 Running 0 5h8m 10.130.0.7 ip-10-0-49-184.ec2.internal <none> <none> router-default-86d569bf84-v54hp 1/1 Running 0 5h8m 10.128.0.9 ip-10-0-63-210.ec2.internal <none> <none> 3. check ingresscontroller LB setting, it uses Internal NLB spec: endpointPublishingStrategy: loadBalancer: dnsManagementPolicy: Managed providerParameters: aws: networkLoadBalancer: {} type: NLB type: AWS scope: Internal type: LoadBalancerService 4. continue to curl the route from a pod inside the cluster $ oc rsh console-operator-86786df488-w6fks Defaulted container "console-operator" out of: console-operator, conversion-webhook-server sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I HTTP/1.1 200 OK sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I Connection timed out
Expected results:
1. co/console should be stable, curl console route should be always OK. 2. qe-e2e-test should not fail
Additional info:
qe-e2e-test on the cluster: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/openshift_release/45369/rehearse-45369-periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-stable-aws-rosa-sts-hypershift-sec-guest-prod-private-link-full-f2/1724307074235502592
- clones
-
OCPBUGS-23300 Internal NLB issue (OCPBUGS-9026) causes random failures on HCP private cluster without infra nodes
- Closed
- is blocked by
-
OCPBUGS-23300 Internal NLB issue (OCPBUGS-9026) causes random failures on HCP private cluster without infra nodes
- Closed
- links to
-
RHBA-2024:0050 OpenShift Container Platform 4.14.z bug fix update