OCPBUGS-32424

Un-managing a cluster from GitOps ZTP takes too long at scale


      Description of problem:

      While preparing IBU scale tests of 3500+ SNOs in the ACM ZTP Large Scale Environment, I un-managed a cluster in order to create a seed image from it. The cluster's namespace is still in the Terminating state after 25 minutes. In smaller environments, un-managing completes in ~3 minutes. I would expect the same task to take roughly the same amount of time in a large environment.

      Version-Release number of selected component (if applicable):

      Hub OCP 4.15.9
      Deployed OCP 4.15.0
      ACM - 2.10.0-DOWNSTREAM-2024-03-14-14-53-38

      How reproducible:

      100% in this environment

      Steps to Reproduce:

          1.
          2.
          3.
          

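      Actual results:

          The cluster's namespace remains in the Terminating state for at least 25 minutes after the cluster is un-managed.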

      Expected results:

          The namespace is removed in roughly the same time as in smaller environments (~3 minutes).

      Additional info:

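      Namespace deletion appears to be hung on an ACM Observability component. One observabilityaddons.observability.open-cluster-management.io resource remains in the namespace, still holding the observability.open-cluster-management.io/addon-cleanup finalizer: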
      
      # oc describe ns vm00001
      Name:         vm00001
      Labels:       app.kubernetes.io/instance=ztp-clusters-04
                    cluster.open-cluster-management.io/managedCluster=vm00001
                    kubernetes.io/metadata.name=vm00001
                    name=vm00001
                    open-cluster-management.io/cluster-name=vm00001
                    pod-security.kubernetes.io/audit=restricted
                    pod-security.kubernetes.io/audit-version=v1.24
                    pod-security.kubernetes.io/warn=restricted
                    pod-security.kubernetes.io/warn-version=v1.24
      Annotations:  argocd.argoproj.io/sync-wave: 0
                    openshift.io/sa.scc.mcs: s0:c106,c90
                    openshift.io/sa.scc.supplemental-groups: 1011310000/10000
                    openshift.io/sa.scc.uid-range: 1011310000/10000
                    ran.openshift.io/ztp-gitops-generated: {}
      Status:       Terminating
      Conditions:
        Type                                         Status  LastTransitionTime               Reason                Message
        ----                                         ------  ------------------               ------                -------
        NamespaceDeletionDiscoveryFailure            False   Thu, 18 Apr 2024 13:27:19 +0000  ResourcesDiscovered   All resources successfully discovered
        NamespaceDeletionGroupVersionParsingFailure  False   Thu, 18 Apr 2024 13:27:19 +0000  ParsedGroupVersions   All legacy kube types successfully parsed
        NamespaceDeletionContentFailure              False   Thu, 18 Apr 2024 13:27:19 +0000  ContentDeleted        All content successfully deleted, may be waiting on finalization
        NamespaceContentRemaining                    True    Thu, 18 Apr 2024 13:27:19 +0000  SomeResourcesRemain   Some resources are remaining: observabilityaddons.observability.open-cluster-management.io has 1 resource instances
        NamespaceFinalizersRemaining                 True    Thu, 18 Apr 2024 13:27:19 +0000  SomeFinalizersRemain  Some content in the namespace has finalizers remaining: observability.open-cluster-management.io/addon-cleanup in 1 resource instances

      No resource quota.

      No LimitRange resource.
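
A minimal sketch, not part of the original report, of how the blocking resources and finalizers could be pulled out of a namespace's status conditions when checking many namespaces at scale. It assumes the namespace object has been exported as JSON (for example with `oc get ns vm00001 -o json`) and that the condition messages follow the wording shown in the output above. The function name `deletion_blockers` and the `SAMPLE` data are illustrative only.

```python
import json
import re

# Condition types that report what is still blocking namespace deletion.
BLOCKING_TYPES = {"NamespaceContentRemaining", "NamespaceFinalizersRemaining"}

# Matches "<resource> has N resource instances" and
# "<finalizer> in N resource instances" (wording assumed from the output above).
ITEM_RE = re.compile(r"(\S+) (?:has|in) (\d+) resource instances")


def deletion_blockers(namespace: dict) -> dict:
    """Return {condition type: [(name, count), ...]} for conditions with status True."""
    blockers = {}
    for cond in namespace.get("status", {}).get("conditions", []):
        if cond.get("type") in BLOCKING_TYPES and cond.get("status") == "True":
            found = ITEM_RE.findall(cond.get("message", ""))
            blockers[cond["type"]] = [(name, int(count)) for name, count in found]
    return blockers


# Sample input assembled by hand from the conditions in this report (timestamps omitted).
SAMPLE = json.loads("""
{
  "metadata": {"name": "vm00001"},
  "status": {
    "phase": "Terminating",
    "conditions": [
      {"type": "NamespaceDeletionContentFailure", "status": "False",
       "reason": "ContentDeleted",
       "message": "All content successfully deleted, may be waiting on finalization"},
      {"type": "NamespaceContentRemaining", "status": "True",
       "reason": "SomeResourcesRemain",
       "message": "Some resources are remaining: observabilityaddons.observability.open-cluster-management.io has 1 resource instances"},
      {"type": "NamespaceFinalizersRemaining", "status": "True",
       "reason": "SomeFinalizersRemain",
       "message": "Some content in the namespace has finalizers remaining: observability.open-cluster-management.io/addon-cleanup in 1 resource instances"}
    ]
  }
}
""")

if __name__ == "__main__":
    for cond_type, items in deletion_blockers(SAMPLE).items():
        for name, count in items:
            print(f"{cond_type}: {name} ({count})")
```

Run against the sample, this reports the ObservabilityAddon resource and its addon-cleanup finalizer as the only blockers, matching the `oc describe` output.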
      

            jche@redhat.com Jun Chen
            akrzos@redhat.com Alex Krzos
            Yang Liu Yang Liu