OpenShift Bugs / OCPBUGS-43535

Two pods fail on the Hypershift Management cluster after restarting the control plane nodes


    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.17.z
    • HyperShift
    • Quality / Stability / Reliability
    • Moderate
    • Hypershift Sprint 262, Hypershift Sprint 263

      Description of problem:

      After the restart of the management cluster nodes, two pods fail:

      # oc get po -A | grep -v "Completed\|Running"
      NAMESPACE                                          NAME                                                              READY   STATUS                  RESTARTS         AGE
      clusters-hypershift-001                            ingress-operator-5b954659b4-9s5ht                                 0/2     Init:0/1                1                5h35m
      clusters-hypershift-001                            openshift-apiserver-7bdfd9f969-hrwj2                              0/3     Init:CrashLoopBackOff   13 (4m57s ago)   3h13m 

       

      As a workaround, deleting the `openshift-apiserver` pod brings all pods back up. At times, the outage also leads to a pod failure on the hosted cluster:

       

      # oc get po -A | grep -v "Completed\|Running"
      NAMESPACE                                          NAME                                                                  READY   STATUS    RESTARTS         AGE
      openshift-image-registry                           image-pruner-28818720-qnkqr                                           0/1     Error     0                8h
      
      # oc logs image-pruner-28818720-qnkqr -n openshift-image-registry
      Error from server (ServiceUnavailable): the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
      
      # oc get co
      image-registry                             4.17.1    True        False         True       23h     ImagePrunerDegraded: Job has reached the specified backoff limit 

       

      The pruner job succeeds in the next iteration of the cron job:

      # oc get po -A | grep image
      openshift-image-registry                           image-pruner-28820160-9k4hd                                           0/1     Completed   0               4h56m
      
      # oc logs image-pruner-28820160-9k4hd -n openshift-image-registry
      Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
      I1018 00:00:06.440095       7 prune.go:348] Creating image pruner with keepYoungerThan=1h0m0s, keepTagRevisions=3, pruneOverSizeLimit=<nil>, allImages=true
      Summary: deleted 0 objects
      
      # oc get co
      image-registry                             4.17.1    True        False         False      44h 
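Rather than waiting for the next scheduled run, the pruner can be re-triggered manually from its CronJob; a minimal sketch (the job name `manual-prune` is arbitrary):

```shell
# Create a one-off Job from the image-pruner CronJob and follow its log.
oc create job manual-prune --from=cronjob/image-pruner -n openshift-image-registry
oc logs -f job/manual-prune -n openshift-image-registry
```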


      Version-Release number of selected component (if applicable):

      4.17.1

      How reproducible:

      Always

      Steps to Reproduce:

      Restart the control plane nodes of the management cluster    
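One way to perform the restart without SSH access, sketched with `oc debug` (the `node-role.kubernetes.io/master` selector is the usual control plane label; in practice the reboots should be serialized):

```shell
# Reboot each control plane node of the management cluster, one at a time.
for node in $(oc get nodes -l node-role.kubernetes.io/master -o name); do
  oc debug "$node" -- chroot /host systemctl reboot
  # Wait for the node to return to Ready before rebooting the next one:
  # oc wait --for=condition=Ready "$node" --timeout=15m
done
```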

       

      Actual results:

      After the restart of the management cluster nodes, two pods fail:

      # oc get po -A | grep -v "Completed\|Running"
      NAMESPACE                                          NAME                                                              READY   STATUS                  RESTARTS         AGE
      clusters-hypershift-001                            ingress-operator-5b954659b4-9s5ht                                 0/2     Init:0/1                1                5h35m
      clusters-hypershift-001                            openshift-apiserver-7bdfd9f969-hrwj2                              0/3     Init:CrashLoopBackOff   13 (4m57s ago)   3h13m

      Expected results:

      All pods should come up after the restart.

       

      Additional info:

      As a workaround, deleting the `openshift-apiserver` pod brings all pods back up. At times, the outage also leads to a pod failure on the hosted cluster:

      # oc get po -A | grep -v "Completed\|Running"
      NAMESPACE                                          NAME                                                                  READY   STATUS    RESTARTS         AGE
      openshift-image-registry                           image-pruner-28818720-qnkqr                                           0/1     Error     0                8h
      
      # oc logs image-pruner-28818720-qnkqr -n openshift-image-registry
      Error from server (ServiceUnavailable): the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
      
      # oc get co
      image-registry                             4.17.1    True        False         True       23h     ImagePrunerDegraded: Job has reached the specified backoff limit

      The pruner job succeeds in the next iteration of the cron job:

      # oc get po -A | grep image
      openshift-image-registry                           image-pruner-28820160-9k4hd                                           0/1     Completed   0               4h56m
      
      # oc logs image-pruner-28820160-9k4hd -n openshift-image-registry
      Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
      I1018 00:00:06.440095       7 prune.go:348] Creating image pruner with keepYoungerThan=1h0m0s, keepTagRevisions=3, pruneOverSizeLimit=<nil>, allImages=true
      Summary: deleted 0 objects
      
      # oc get co
      image-registry                             4.17.1    True        False         False      44h
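A minimal sketch of the workaround described above, assuming the control plane namespace `clusters-hypershift-001` from the listings and an `app=openshift-apiserver` pod label (the label selector is an assumption; adjust it to match the pod's actual labels):

```shell
# Delete the stuck openshift-apiserver pod; its Deployment recreates it.
oc delete pod -n clusters-hypershift-001 -l app=openshift-apiserver

# Watch the replacement pod (and the ingress-operator pod) come up.
oc get pods -n clusters-hypershift-001 -w
```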

              Assignee: Unassigned
              Reporter: aishwarya_kamat1 (Aishwarya Kamat, Inactive)
              Elsa Passaro
              Votes: 0
              Watchers: 2