Loading...

XML

Word

Printable

Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: 4.15.0
Affects Version/s: 4.14.z
Component/s: HyperShift
Labels:

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Story Points:
None
Severity:
Critical
Regression:
No

Target Backport Versions:
None
Target Version:

4.15.0
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
Done
Release Note Type:
Bug Fix
Release Note Text:

Hide
* Previously, if the console Operator and Ingress pods were located on the same node, the console Operator would fail and mark the console cluster Operator as unavailable. With this release, if the console Operator and Ingress pods are located on the same node, the console operator no longer fails. (link:https://issues.redhat.com/browse/OCPBUGS-23300[*~~OCPBUGS-23300~~*])

Show
* Previously, if the console Operator and Ingress pods were located on the same node, the console Operator would fail and mark the console cluster Operator as unavailable. With this release, if the console Operator and Ingress pods are located on the same node, the console operator no longer fails. (link: https://issues.redhat.com/browse/OCPBUGS-23300 [* OCPBUGS-23300 *])

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Actually the issue is same root cause of https://issues.redhat.com/browse/OCPBUGS-9026 but I'd like to open new one since the issue becomes very critical after ROSA using NLB as default since 4.14, HCP(HyperShift) private cluster that without infra nodes is the serious victim because it has worker nodes only and no available workaround for it now.

But if we think we could use the old bug to track the issue, then please close this one.

Version-Release number of selected component (if applicable):

4.14.1
HyperShift Private cluster

How reproducible:

100%

Steps to Reproduce:

1. create ROSA HCP(HyperShift) cluster
2. run qe-e2e-test on this cluster, or curl route from one pod inside the cluster
3.

Actual results:

1. co/console status is flapping since route is intermittently accessible 
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.1    True        False         4h56m   Error while reconciling 4.14.1: the cluster operator console is not available


2. check node and router pods running on both worker nodes
$ oc get node
NAME                          STATUS   ROLES    AGE    VERSION
ip-10-0-49-184.ec2.internal   Ready    worker   5h5m   v1.27.6+f67aeb3
ip-10-0-63-210.ec2.internal   Ready    worker   5h8m   v1.27.6+f67aeb3

$ oc -n openshift-ingress get pod -owide
NAME                              READY   STATUS    RESTARTS   AGE    IP           NODE                          NOMINATED NODE   READINESS GATES
router-default-86d569bf84-bq66f   1/1     Running   0          5h8m   10.130.0.7   ip-10-0-49-184.ec2.internal   <none>           <none>
router-default-86d569bf84-v54hp   1/1     Running   0          5h8m   10.128.0.9   ip-10-0-63-210.ec2.internal   <none>           <none>

3. check ingresscontroller LB setting, it uses Internal NLB

spec:
  endpointPublishingStrategy:
    loadBalancer:
      dnsManagementPolicy: Managed
      providerParameters:
        aws:
          networkLoadBalancer: {}
          type: NLB
        type: AWS
      scope: Internal
    type: LoadBalancerService

4. continue to curl the route from a pod inside the cluster
$ oc rsh console-operator-86786df488-w6fks
Defaulted container "console-operator" out of: console-operator, conversion-webhook-server

sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I
HTTP/1.1 200 OK

sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I
Connection timed out

Expected results:

1. co/console should be stable, curl console route should be always OK.
2. qe-e2e-test should not fail

Additional info:

qe-e2e-test on the cluster:

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/openshift_release/45369/rehearse-45369-periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-stable-aws-rosa-sts-hypershift-sec-guest-prod-private-link-full-f2/1724307074235502592

blocks

OCPBUGS-23968 Internal NLB issue (OCPBUGS-9026) causes random failures on HCP private cluster without infra nodes

Closed

is cloned by

OCPBUGS-23968 Internal NLB issue (OCPBUGS-9026) causes random failures on HCP private cluster without infra nodes

Closed

OCPBUGS-23752 Internal NLB issue (OCPBUGS-9026) causes random failures on HCP private cluster without infra nodes

Closed

relates to

RFE-2106 Disable Client IP preservation in AWS NLB for sharded ingress service

Approved

links to

openshift/console-operator#815: OCPBUGS-23300: Disable route controller health check for NLB setup

RHEA-2023:7198 rpm

mentioned on

Merge request - Bump HO to b06d0bd v0.1.16 in stage

(1 links to, 1 mentioned on)

Assignee:: Jakub Hadvig

Reporter:: Hongan Li

QA Contact:: He Liu

Need Info From:: None

Votes:: 0 Vote for this issue

Watchers:: 15 Start watching this issue

Created:: 2023/11/15 8:34 AM

Updated:: 2025/07/24 11:40 PM

Resolved:: 2024/02/27 9:02 PM

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates