OpenShift Bugs / OCPBUGS-41936

[IBMCloud] CCM liveness probe in failure loop


    • Important
    • Fix IBM Cloud CCM liveness probe configuration to use loopback for request host.
    • Bug Fix
    • In Progress

      Description of problem:

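      In 4.16, the IBM Cloud cloud controller manager (CCM) was reconfigured to use loopback as its bind address. However, the liveness probe was not updated to target loopback as well, so it still connects to a non-loopback address (10.242.129.4:10258 in the events under Actual results), where the CCM is not listening. As a result, every liveness probe fails and the CCM container is restarted continuously.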
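      In Kubernetes, an httpGet liveness probe connects to the pod IP unless its `host` field is set. The probe target in the events, 10.242.129.4, looks like a node address, which may mean the CCM pod uses host networking, so the kubelet dials the node IP while the CCM listens only on 127.0.0.1. A minimal sketch for checking whether the probe sets an explicit loopback host follows. It assumes the Deployment is named `ibm-cloud-controller-manager` (inferred from the pod names) and that the CCM is the first container; `probe_uses_loopback` is a hypothetical helper, not part of `oc`.

```shell
# Hypothetical helper: reads the JSON printed for a probe's httpGet block
# (e.g. {"path":"/healthz","port":10258,"scheme":"HTTPS"}) on stdin and
# succeeds only if an explicit host of 127.0.0.1 or localhost is set.
# Without a host field, the kubelet probes the pod IP.
probe_uses_loopback() {
  grep -Eq '"host":[[:space:]]*"(127\.0\.0\.1|localhost)"'
}

# Usage against a live cluster (requires oc and a kubeconfig).
# Assumption: Deployment name and container index as described above.
# oc get deployment ibm-cloud-controller-manager \
#   -n openshift-cloud-controller-manager \
#   -o jsonpath='{.spec.template.spec.containers[0].livenessProbe.httpGet}' \
#   | probe_uses_loopback && echo "probe targets loopback" \
#   || echo "probe targets the pod IP (default)"
```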

      Version-Release number of selected component (if applicable):

          4.17

      How reproducible:

          100%

      Steps to Reproduce:

          1. Create an IPI cluster on IBM Cloud.
          2. Watch the IBM Cloud CCM pods; the restart count increases roughly every 5 minutes (liveness probe timeout).
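      To follow step 2 from the command line, a minimal sketch: `restart_counts` (a hypothetical helper) reduces `oc get po` output to each pod's name and restart count. It assumes the default column order NAME, READY, STATUS, RESTARTS, AGE shown under Actual results. Polling it every 300 seconds should show the count climbing on an affected 4.17 cluster and staying at 0 on 4.15.

```shell
# Hypothetical helper: print "<pod name> <restart count>" for each row of
# `oc get po` output on stdin, skipping the header line. RESTARTS is the
# 4th whitespace-separated field; a suffix such as "(39s ago)" falls into
# later fields and is ignored.
restart_counts() {
  awk 'NR > 1 { print $1, $4 }'
}

# Usage (requires oc and a kubeconfig):
# while true; do
#   oc get po -n openshift-cloud-controller-manager | restart_counts
#   sleep 300
# done
```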
          

      Actual results:

      # oc --kubeconfig cluster-deploys/eu-de-4.17-rc2-3/auth/kubeconfig get po -n openshift-cloud-controller-manager
      NAME                                            READY   STATUS             RESTARTS          AGE
      ibm-cloud-controller-manager-58f7747d75-j82z8   0/1     CrashLoopBackOff   262 (39s ago)     23h
      ibm-cloud-controller-manager-58f7747d75-l7mpk   0/1     CrashLoopBackOff   261 (2m30s ago)   23h

        Normal   Killing     34m (x2 over 40m)    kubelet            Container cloud-controller-manager failed liveness probe, will be restarted
        Normal   Pulled      34m (x2 over 40m)    kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5ac9fb24a0e051aba6b16a1f9b4b3f9d2dd98f33554844953dd4d1e504fb301e" already present on machine
        Normal   Created     34m (x3 over 45m)    kubelet            Created container cloud-controller-manager
        Normal   Started     34m (x3 over 45m)    kubelet            Started container cloud-controller-manager
        Warning  Unhealthy   29m (x8 over 40m)    kubelet            Liveness probe failed: Get "https://10.242.129.4:10258/healthz": dial tcp 10.242.129.4:10258: connect: connection refused
        Warning  ProbeError  3m4s (x22 over 40m)  kubelet            Liveness probe error: Get "https://10.242.129.4:10258/healthz": dial tcp 10.242.129.4:10258: connect: connection refused
      body:

      Expected results:

          The CCM runs continuously without restarts, as it does on 4.15:
      
      # oc --kubeconfig cluster-deploys/eu-de-4.15.10-1/auth/kubeconfig get po -n openshift-cloud-controller-manager
      NAME                                            READY   STATUS    RESTARTS   AGE
      ibm-cloud-controller-manager-66d4779cb8-gv8d4   1/1     Running   0          63m
      ibm-cloud-controller-manager-66d4779cb8-pxdrs   1/1     Running   0          63m

      Additional info:

          IBM Cloud has a PR open to fix the liveness probe:
      https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/360

              Jeff Nowicki (jeffbnowicki)
              Christopher Schaefer (cschaefe@redhat.com)
              Zhaohua Sun