ACM-4644: hypershift on BM :: Canary route checks for the default ingress controller are failing


    • OCPSTRAT-618 - [GA] Self-managed Hosted Control Planes support for BM using the Agent Provider

      Description of the problem:

      As part of the ecosystem QE testing of hypershift clusters on real bare metal, we tried several times, across several environments, to deploy a hypershift cluster with 1 or 2 workers.

      While the cluster is created successfully, we never reach a state where the ingress operator is functioning properly, even though we set up all the necessary DNS entries (A records). The route hostname is never reachable, even though we can ping the address it resolves to:

      $ oc get co
      NAME                    VERSION  AVAILABLE  PROGRESSING  DEGRADED  SINCE  MESSAGE
      console                  4.12.9  False    False     False   119m  RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.registry.ocp-edge4.lab.eng.tlv2.redhat.com): Get "https://console-openshift-console.apps.registry.ocp-edge4.lab.eng.tlv2.redhat.com": dial tcp 10.46.29.134:443: connect: no route to host
      csi-snapshot-controller          4.12.9  True    False     False   3h16m  
      dns                    4.12.9  True    False     False   118m   
      image-registry               4.12.9  True    False     False   119m   
      ingress                  4.12.9  True    False     True    9m59s  The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
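
      For reference, these are the kinds of checks we run (against the hosted cluster's kubeconfig) to confirm that DNS resolves and the address answers pings while HTTPS to the route still fails, and to inspect the canary failure itself. This is an illustrative sketch: the hostname/IP are the ones from this environment, and the object names and namespaces are the standard OpenShift defaults.

      # DNS resolves and the address answers pings, but HTTPS to the route fails:
      $ dig +short console-openshift-console.apps.registry.ocp-edge4.lab.eng.tlv2.redhat.com
      $ ping -c1 10.46.29.134
      $ curl -vk https://console-openshift-console.apps.registry.ocp-edge4.lab.eng.tlv2.redhat.com

      # Inspect the default ingress controller, the canary route, and how the router is exposed:
      $ oc get ingresscontroller default -n openshift-ingress-operator -o yaml
      $ oc get route canary -n openshift-ingress-canary
      $ oc -n openshift-ingress get pods -o wide
      $ oc -n openshift-ingress get svc router-default
      # The canary check is essentially an HTTPS GET against the canary route host:
      $ curl -k https://canary-openshift-ingress-canary.apps.registry.ocp-edge4.lab.eng.tlv2.redhat.com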

      The same error is shown when we try to deploy a cluster with MetalLB and use one of the load-balancer IPs as the API endpoint.
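
      On the MetalLB attempt we also verify that the address pool exists and that the LoadBalancer service actually received an external IP; a minimal sketch, assuming MetalLB is installed in the default metallb-system namespace:

      # Check the MetalLB pools/advertisements and the LoadBalancer services:
      $ oc -n metallb-system get ipaddresspools.metallb.io,l2advertisements.metallb.io
      $ oc get svc -A | grep LoadBalancer
      # The EXTERNAL-IP column should show the pool address that the API record points at.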

      I'm opening this incident report to track this effort and to see whether there is something we are missing here.

      I'm attaching must-gather logs so we can check whether ingress on BM behaves differently from KubeVirt, for example, where this works as expected.

      Also, this works on libvirt as well, so our thought is that the cause of the issue is that the workers and the hub are not on the exact same network. If that is by design or expected, we would appreciate any comments. Thanks.
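
      To check that theory, a quick probe from one of the hosted-cluster workers toward the address that the *.apps records resolve to (10.46.29.134 in the console error above) shows whether a path exists at all; a sketch, run with the hosted cluster's kubeconfig and a placeholder node name:

      $ oc debug node/<worker-node> -- chroot /host ping -c1 10.46.29.134
      $ oc debug node/<worker-node> -- chroot /host curl -sk -o /dev/null -w '%{http_code}\n' https://10.46.29.134:443
      # "connect: no route to host" here would point at the network split between the hub and the workers.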

      Version-Release number of selected component:

      • advanced-cluster-management.v2.7.3 - 2.7.3-DOWNSTREAM-2023-03-22-19-15-09
      • 4.12.0-0.nightly-2023-03-21-173554
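
      For reference, the versions above are read from the hub with the usual commands (assuming ACM is installed in the default open-cluster-management namespace):

      $ oc get csv -n open-cluster-management | grep advanced-cluster-management
      $ oc get clusterversion version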

      How reproducible:

      100%

      Steps to reproduce:

      1. Deploy hub with 3 masters on real BM

      2. Install ACM and hypershift operator

      3. Deploy a hypershift cluster with 1 or 2 real BM workers (see the command sketch below)
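
      For step 3 we create the hosted cluster with the hypershift CLI against the agent platform; a rough sketch of the command (flag names can differ between hypershift versions, and all values below are placeholders for this environment):

      $ hypershift create cluster agent \
          --name <hosted-cluster-name> \
          --namespace clusters \
          --agent-namespace <agent-namespace> \
          --base-domain <base-domain> \
          --pull-secret ./pull-secret.json \
          --release-image quay.io/openshift-release-dev/ocp-release:4.12.9-x86_64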

       

      must-gather: https://drive.google.com/drive/folders/1NH0Q5nmARQ5c9P-qTKd211p-38OiNsNY?usp=sharing 
