OpenShift Bugs / OCPBUGS-51173

Spoke cluster becomes unhealthy and gets stuck in the Terminating phase with no resources.


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Normal
    • Affects Version: 4.18
    • Component: LCA operator
    • Quality / Stability / Reliability
    • Severity: Low
    • Rejected

      Description of problem:

      While executing the ran-rds-ibu (image-based upgrade) pipeline, it got stuck in the "cleaning up pre-existing DU resources" stage with the error below:
      
      TASK [deploy-gitops-du : check if namespace exits] *****************************
      changed: [registry.ztp-hub-01.mobius.lab.eng.rdu2.redhat.com] => {"changed": true, "cmd": "oc -n kni-qe-2 get namespace kni-qe-2\n", "delta": "0:00:00.137324", "end": "2025-02-24 05:03:51.101147", "msg": "", "rc": 0, "start": "2025-02-24 05:03:50.963823", "stderr": "", "stderr_lines": [], "stdout": "NAME       STATUS        AGE\nkni-qe-2   Terminating   3d14h", "stdout_lines": ["NAME       STATUS        AGE", "kni-qe-2   Terminating   3d14h"]}
      
      TASK [deploy-gitops-du : cleanup namespace] ************************************
      
      Further analysis shows that the kni-qe-2 namespace is stuck in the Terminating state, although no resources appear to remain in it:
      
      [kni@registry.ztp-hub-01 ~]$ oc describe namespace kni-qe-2
      Name:         kni-qe-2
      Labels:       app.kubernetes.io/instance=clusters
                    cluster.open-cluster-management.io/managedCluster=kni-qe-2
                    kubernetes.io/metadata.name=kni-qe-2
                    name=kni-qe-2
                    open-cluster-management.io/cluster-name=kni-qe-2
                    pod-security.kubernetes.io/audit=restricted
                    pod-security.kubernetes.io/audit-version=latest
                    pod-security.kubernetes.io/warn=restricted
                    pod-security.kubernetes.io/warn-version=latest
      Annotations:  argocd.argoproj.io/sync-wave: 0
                    openshift.io/sa.scc.mcs: s0:c30,c25
                    openshift.io/sa.scc.supplemental-groups: 1000920000/10000
                    openshift.io/sa.scc.uid-range: 1000920000/10000
                    ran.openshift.io/ztp-gitops-generated: {}
      Status:       Terminating
      Conditions:
        Type                                         Status  LastTransitionTime               Reason                Message
        ----                                         ------  ------------------               ------                -------
        NamespaceDeletionDiscoveryFailure            False   Thu, 20 Feb 2025 19:48:16 +0000  ResourcesDiscovered   All resources successfully discovered
        NamespaceDeletionGroupVersionParsingFailure  False   Thu, 20 Feb 2025 19:48:16 +0000  ParsedGroupVersions   All legacy kube types successfully parsed
        NamespaceDeletionContentFailure              False   Thu, 20 Feb 2025 19:48:16 +0000  ContentDeleted        All content successfully deleted, may be waiting on finalization
        NamespaceContentRemaining                    True    Thu, 20 Feb 2025 19:48:16 +0000  SomeResourcesRemain   Some resources are remaining: rolebindings.authorization.openshift.io has 1 resource instances, rolebindings.rbac.authorization.k8s.io has 1 resource instances
        NamespaceFinalizersRemaining                 True    Thu, 20 Feb 2025 19:48:16 +0000  SomeFinalizersRemain  Some content in the namespace has finalizers remaining: cluster.open-cluster-management.io/manifest-work-cleanup in 2 resource instancesNo resource quota.No LimitRange resource.
      [kni@registry.ztp-hub-01 ~]$
      [kni@registry.ztp-hub-01 ~]$
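      
      The NamespaceFinalizersRemaining condition above points at the cluster.open-cluster-management.io/manifest-work-cleanup finalizer that is still set on the two remaining rolebindings. As a diagnostic sketch (hub access assumed; these commands are illustrative and not part of the pipeline), the following can confirm what is still blocking deletion:
      
      # Enumerate every namespaced resource type that still has objects in kni-qe-2
      oc api-resources --verbs=list --namespaced -o name \
        | xargs -I{} oc get {} -n kni-qe-2 --ignore-not-found --no-headers 2>/dev/null
      
      # Show which of the remaining rolebindings still carry finalizers
      oc get rolebindings.rbac.authorization.k8s.io -n kni-qe-2 \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}'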
      

      Version-Release number of selected component (if applicable):

      4.18

      How reproducible:

      Permanent issue on the seed node.

      Steps to Reproduce:

          1. Run the ran-rds-ibu.yaml pipeline to perform an image-based upgrade between the seed and target.
          

      Actual results:

      The kni-qe-2 namespace got stuck in the Terminating phase even though no resources appear to remain in it, so the cleanup of pre-existing resources also got stuck.
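
      If the hub-side ManifestWork for kni-qe-2 has already been removed, a possible manual workaround (an assumption, not a validated procedure: force-removing a stale finalizer should only be done once it is confirmed nothing will reconcile it back) would be to clear the leftover finalizer so namespace deletion can complete; <rolebinding-name> below is a placeholder for the rolebindings reported in the namespace conditions:

      # Inspect the finalizers on a remaining rolebinding (<rolebinding-name> is a placeholder)
      oc get rolebinding <rolebinding-name> -n kni-qe-2 -o jsonpath='{.metadata.finalizers}'

      # Clear the stale manifest-work-cleanup finalizer so the namespace can finish terminating
      oc patch rolebinding <rolebinding-name> -n kni-qe-2 --type=merge -p '{"metadata":{"finalizers":null}}'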

      Expected results:

      It should clean up the old resources and deploy the new spoke cluster as per the normal process.

      Additional info:

       Pipeline still running:
      https://jenkins-csb-kniqe-ci.dno.corp.redhat.com/job/CI/job/far-edge-vran-ibu/509/
      
      

              Assignee: Jun Chen (jche@redhat.com)
              Reporter: varun khokhar (vkhokhar)