Bug
Resolution: Unresolved
Major
None
4.16, 4.17, 4.18
Description of problem:
IBM Cloud CCM was reconfigured in 4.16 to use loopback as the bind address. However, the liveness probe was not updated to use loopback as well, so the CCM constantly fails the liveness probe and is restarted continuously.
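For illustration, a liveness probe that targets loopback could look roughly like the fragment below. This is a sketch only, not taken from the operator manifests or from any proposed fix. It assumes the CCM pod runs with host networking, so that a probe against 127.0.0.1 reaches the process bound to loopback. The path, port, and scheme are the ones shown in the probe failure events. The `host` field is the standard Kubernetes `httpGet` probe field; whether the actual fix uses it is an assumption.

```yaml
# Illustrative fragment only; not copied from the operator manifests.
livenessProbe:
  httpGet:
    host: 127.0.0.1   # assumption: probe loopback to match the bind address
    path: /healthz    # path seen in the failing probe events
    port: 10258       # port seen in the failing probe events
    scheme: HTTPS     # the failing probes use https://
```

Without an explicit `host`, the kubelet probes the pod IP, which matches the "connection refused" errors against 10.242.129.4:10258 in the events.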
Version-Release number of selected component (if applicable):
4.17
How reproducible:
100%
Steps to Reproduce:
1. Create an IPI cluster on IBM Cloud.
2. Watch the IBM Cloud CCM pods; the restart count increases every 5 minutes (liveness probe timeout).
Actual results:
# oc --kubeconfig cluster-deploys/eu-de-4.17-rc2-3/auth/kubeconfig get po -n openshift-cloud-controller-manager
NAME                                            READY   STATUS             RESTARTS          AGE
ibm-cloud-controller-manager-58f7747d75-j82z8   0/1     CrashLoopBackOff   262 (39s ago)     23h
ibm-cloud-controller-manager-58f7747d75-l7mpk   0/1     CrashLoopBackOff   261 (2m30s ago)   23h

Normal    Killing      34m (x2 over 40m)     kubelet   Container cloud-controller-manager failed liveness probe, will be restarted
Normal    Pulled       34m (x2 over 40m)     kubelet   Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5ac9fb24a0e051aba6b16a1f9b4b3f9d2dd98f33554844953dd4d1e504fb301e" already present on machine
Normal    Created      34m (x3 over 45m)     kubelet   Created container cloud-controller-manager
Normal    Started      34m (x3 over 45m)     kubelet   Started container cloud-controller-manager
Warning   Unhealthy    29m (x8 over 40m)     kubelet   Liveness probe failed: Get "https://10.242.129.4:10258/healthz": dial tcp 10.242.129.4:10258: connect: connection refused
Warning   ProbeError   3m4s (x22 over 40m)   kubelet   Liveness probe error: Get "https://10.242.129.4:10258/healthz": dial tcp 10.242.129.4:10258: connect: connection refused body:
Expected results:
CCM runs continuously, as it does on 4.15:
# oc --kubeconfig cluster-deploys/eu-de-4.15.10-1/auth/kubeconfig get po -n openshift-cloud-controller-manager
NAME                                            READY   STATUS    RESTARTS   AGE
ibm-cloud-controller-manager-66d4779cb8-gv8d4   1/1     Running   0          63m
ibm-cloud-controller-manager-66d4779cb8-pxdrs   1/1     Running   0          63m
Additional info:
IBM Cloud have a PR open to fix the liveness probe. https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/360
- blocks
  OCPBUGS-41941 [IBMCloud] CCM liveness probe in failure loop (Closed)
- is cloned by
  OCPBUGS-41941 [IBMCloud] CCM liveness probe in failure loop (Closed)
- links to
  RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update