Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-63129

Pods in init and crashloop state for controlplane components if hosting cluster node is rebooted.

XMLWordPrintable

    • None
    • False
    • Hide

      None

      Show
      None
    • None
    • Moderate
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      This is a clone of issue OCPBUGS-63128. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-61829. The following is the description of the original issue:

      Description of problem:

      When a Hosting Cluster (OpenShift cluster running HyperShift control planes) undergoes a restart, the Hosted Clusters experience issues with its Ingress functionality. After the Hosting Cluster is back online, the Hosted Clusters remain partially degraded until manual intervention is performed.
      
      The issue specifically manifests as stuck or unhealthy pods in the Hosted Clusters' control plane namespaces, preventing proper recovery.
      
      To restore functionality, it is necessary to manually delete the failing pods in the affected Hosted Clusters namespace, forcing Kubernetes to reschedule them. This workaround is consistently required after a Hosting Cluster reboot.
          

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Always    

      Steps to Reproduce:

      1. Deploy a Hosting Cluster running HyperShift.
      2. Configure hosting inventory for agent nodes (baremetal)
      3. Deploy one or more Hosted Clusters using the Hosting Cluster.
      4. Restart the Hosting Cluster nodes (full cluster reboot).
      5. Observe the state of the Hosted Clusters after the Hosting Cluster becomes available again.     

      Actual results:

      - Hosted Cluster control plane namespaces contain pods in non-Running states (e.g., CrashLoopBackOff, Error, or Pending).
      - Ingress operator for the Hosted Clusters does not function until corrective actions are taken.  

      Expected results:

      - Hosted Clusters should fully recover after the Hosting Cluster is restarted, without requiring manual intervention.    

      Additional info:

      # oc -n hc-cluster-ns get all | egrep -i "ingress-operator|openshift-apiserver"
      Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
      Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
      pod/ingress-operator-69dd78565c-dss2t                     0/2     Init:0/1                1              19h
      pod/openshift-apiserver-77c88f8cc-hddsp                   0/3     Init:CrashLoopBackOff   10 (49s ago)   19h
      service/openshift-apiserver                  ClusterIP      172.30.xxx.xxx   <none>          443/TCP             8d
      deployment.apps/ingress-operator                     0/1     1            0           8d
      deployment.apps/openshift-apiserver                  0/1     1            0           8d
      replicaset.apps/ingress-operator-69dd78565c                     1         1         0       8d
      replicaset.apps/openshift-apiserver-77c88f8cc                   1         1         0       8d
      replicaset.apps/openshift-apiserver-7c75bb4754                  0         0         0       8d

       

              rhn-support-liangli Liangquan Li
              rhn-support-vdurgam Vedant Durgam
              None
              Ke Wang
              Yu Li Yu Li
              None
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated: