Type: Bug
Resolution: Unresolved
Priority: Undefined
Affects Version: 4.18.z
Impact: Quality / Stability / Reliability
Severity: Critical
Description of problem:
Unable to access the web console for the HCP cluster that is used as the ACM hub. The following error messages are displayed:
2025-01-22T11:07:49.870Z ERROR operator.ingress_controller controller/controller.go:116 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing. Last 1 error messages:\nerror sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hcp418-rack03-hub.apps.rackm03.mydomain.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers) (x83 over 1h22m0s))"}
2025-01-22T11:07:49.870Z INFO operator.ingress_controller controller/controller.go:116 reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2025-01-22T11:07:49.889Z INFO operator.status_controller controller/controller.go:116 Reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2025-01-22T11:07:49.942Z ERROR operator.ingress_controller controller/controller.go:116 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing. Last 1 error messages:\nerror sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hcp418-rack03-hub.apps.rackm03.mydomain.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers) (x83 over 1h22m0s))"}
2025-01-22T11:08:45.290Z INFO operator.ingress_controller controller/controller.go:116 reconciling {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2025-01-22T11:08:45.357Z ERROR operator.ingress_controller controller/controller.go:116 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing. Last 1 error messages:\nerror sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hcp418-rack03-hub.apps.rackm03.mydomain.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers) (x83 over 1h22m0s))"}
2025-01-22T11:08:49.767Z ERROR operator.canary_controller wait/backoff.go:226 error performing canary route check {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hcp418-rack03-hub.apps.rackm03.mydomain.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
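For reference, the "Client.Timeout exceeded while awaiting headers" failure above is a plain client-side HTTP deadline: the canary probe gives up before the route returns any response headers. The sketch below is a minimal, self-contained stand-in for that failure mode, not the ingress operator's actual probe code; the local slow server plays the role of the unreachable canary route, and the 1-second deadline is illustrative.

```python
# Sketch of the canary failure mode: an HTTP GET that hits the client
# timeout before response headers arrive. Assumptions: the slow local
# server and the 1-second deadline are illustrative stand-ins only.
import http.server
import threading
import time
import urllib.request


class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Never answers within the client's deadline, like a wedged route."""

    def do_GET(self):
        time.sleep(3)  # longer than the client timeout below
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep output quiet
        pass


# Bind to an ephemeral port and serve in the background.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

try:
    urllib.request.urlopen(url, timeout=1)  # analogous probe deadline
    outcome = "response received"
except OSError:
    # socket.timeout/URLError both derive from OSError; this is the
    # same situation Go reports as "Client.Timeout exceeded while
    # awaiting headers".
    outcome = "timeout"
finally:
    server.shutdown()

print(outcome)  # → timeout
```

Against the live cluster, the equivalent manual check is simply a GET of the canary route URL from the log messages with a short client timeout.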
All the nodes of the HCP cluster are up:

oc get nodes
NAME                            STATUS   ROLES    AGE     VERSION
hcp418-rack03-hub-zblvr-95g4m   Ready    worker   5d20h   v1.31.3
hcp418-rack03-hub-zblvr-btnsl   Ready    worker   5d20h   v1.31.3
hcp418-rack03-hub-zblvr-k2lt2   Ready    worker   5d20h   v1.31.3
oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                    4.18.0-rc.1   False       False         True       25h     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hcp418-rack03-hub.apps.rackm03.mydomain.com): Get "https://console-openshift-console.apps.hcp418-rack03-hub.apps.rackm03.mydomain.com": EOF
csi-snapshot-controller                    4.18.0-rc.1   True        False         False      94m
dns                                        4.18.0-rc.1   True        False         False      86m
image-registry                             4.18.0-rc.1   True        False         False      86m
ingress                                    4.18.0-rc.1   True        False         True       86m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing. Last 1 error messages:...
insights                                   4.18.0-rc.1   True        False         False      3h24m
kube-apiserver                             4.18.0-rc.1   True        False         False      12d
kube-controller-manager                    4.18.0-rc.1   True        False         False      12d
kube-scheduler                             4.18.0-rc.1   True        False         False      12d
kube-storage-version-migrator              4.18.0-rc.1   True        False         False      86m
monitoring                                 4.18.0-rc.1   False       True          True       70m     UpdatingMetricsServer: reconciling MetricsServer Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/metrics-server: context deadline exceeded: got 2 unavailable replicas
network                                    4.18.0-rc.1   True        False         False      12d
node-tuning                                4.18.0-rc.1   True        False         False      88m
openshift-apiserver                        4.18.0-rc.1   True        False         False      12d
openshift-controller-manager               4.18.0-rc.1   True        False         False      12d
openshift-samples                          4.18.0-rc.1   True        False         False      12d
operator-lifecycle-manager                 4.18.0-rc.1   True        False         False      12d
operator-lifecycle-manager-catalog         4.18.0-rc.1   True        False         False      12d
operator-lifecycle-manager-packageserver   4.18.0-rc.1   True        False         False      12d
service-ca                                 4.18.0-rc.1   True        False         False      12d
storage                                    4.18.0-rc.1   True        False         False      12d
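When several ClusterOperators report problems at once, as above, it can help to reduce the `oc get co` output to just the unhealthy entries. The helper below is a hypothetical triage sketch, not an `oc` feature; it assumes the default column order NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE.

```python
# Sketch: pick out unavailable or degraded ClusterOperators from plain
# `oc get co` text. `unhealthy_operators` is a hypothetical helper;
# column order is assumed from default `oc get co` output.
def unhealthy_operators(oc_get_co_output: str) -> list[str]:
    """Return names of operators that are unavailable or degraded."""
    bad = []
    for line in oc_get_co_output.strip().splitlines()[1:]:  # skip header
        fields = line.split(None, 6)  # MESSAGE may itself contain spaces
        if len(fields) < 6:
            continue  # malformed or truncated row
        name, _version, available, _progressing, degraded = fields[:5]
        if available != "True" or degraded == "True":
            bad.append(name)
    return bad


# Trimmed sample based on the output in this report.
sample = """\
NAME      VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console   4.18.0-rc.1  False       False         True       25h     RouteHealthAvailable: failed to GET route
ingress   4.18.0-rc.1  True        False         True       86m     canary route checks are failing
dns       4.18.0-rc.1  True        False         False      86m"""

print(unhealthy_operators(sample))  # → ['console', 'ingress']
```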
The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):
HCI rack with Fusion operator provider cluster
The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc): Provider
The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):
OCP: 4.18.0-rc.1
FDF: 4.18.0-100
rack: rackm03
hcp cluster: hcp418-rack03-hub
Does this issue impact your ability to continue to work with the product? Yes.
Is there any workaround available to the best of your knowledge?
Yes: creating a new HCP cluster and bringing it up as the ACM hub.
Can this issue be reproduced? If so, please provide the hit rate:
Yes, it has appeared before, but it cannot be replicated on demand.
Can this issue be reproduced from the UI? Yes
If this is a regression, please provide more details to justify this:
The exact date and time when the issue was observed, including timezone details:
Observed from the morning of Tuesday, 21 January 2025.
Logs collected and log location:
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create Provider cluster of HCI fusion rack
2. Create an HCP cluster
Actual results:
Unable to access the web console for the HCP cluster that is used as the ACM hub. The error messages quoted in the description are displayed.
Expected results:
The web console for the HCP cluster should be accessible.
Additional info:
Attached must gather: http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/OCPBUGS-48733/