  1. OpenShift Bugs
  2. OCPBUGS-23362

CPO Failing to delete default worker security group, but not reflected in HostedCluster status condition


Details

    • Moderate
    • No
    • Hypershift Sprint 246
    • 1
    • False

    Description

      A HostedCluster and its HostedControlPlane were stuck uninstalling. The CPO logs showed:
      
      "error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"
      
      Unfortunately, I do not have enough access to the AWS account to inspect this security group. I do know it is the default worker security group, because its ID is recorded in the HostedCluster under .status.platform.aws.defaultWorkerSecurityGroupID.
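
For someone who does have access to the AWS account, a minimal sketch of how the dependent objects behind the DependencyViolation could be listed. It assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`); the function name is hypothetical, pagination is ignored, and only network interfaces and ingress rules of other security groups are checked, so it may not find every kind of dependency.

```python
def find_security_group_dependents(ec2, group_id):
    """Return IDs of resources that still reference group_id.

    Assumption: `ec2` behaves like a boto3 EC2 client.
    Pagination is ignored to keep the sketch short.
    """
    # Network interfaces (instances, load balancers, VPC endpoints, ...)
    # that have the security group attached.
    enis = ec2.describe_network_interfaces(
        Filters=[{"Name": "group-id", "Values": [group_id]}]
    )["NetworkInterfaces"]

    # Security groups whose ingress rules reference the group.
    referrers = ec2.describe_security_groups(
        Filters=[{"Name": "ip-permission.group-id", "Values": [group_id]}]
    )["SecurityGroups"]

    return {
        "network_interfaces": [e["NetworkInterfaceId"] for e in enis],
        # Skip the group itself in case it has a self-referencing rule.
        "referencing_groups": [
            g["GroupId"] for g in referrers if g["GroupId"] != group_id
        ],
    }
```

Calling it with the ID from the log, `find_security_group_dependents(ec2, "sg-04abe599e5567b025")`, would return the interface and group IDs to look at first.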

      Version-Release number of selected component (if applicable):

      4.14.1

      How reproducible:

      I haven't tried to reproduce it yet, but I can do so and update this ticket afterward. My theory is that the steps below trigger it.

      Steps to Reproduce:

      1. Create an AWS HostedCluster and wait for .status.platform.aws.defaultWorkerSecurityGroupID to be populated
      2. Attach that security group to any other resource in the AWS account that is unrelated to the HCP cluster
      3. Attempt to delete the HostedCluster
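
Step 2 could be sketched as follows, under the assumption that the unrelated resource is an existing network interface and that `ec2` is a boto3 EC2 client. The function name and the choice of a network interface are illustrative only; this has not been run against a real cluster.

```python
def attach_security_group(ec2, eni_id, group_id):
    """Add group_id to the security groups of an unrelated network interface.

    Assumption: `ec2` behaves like a boto3 EC2 client and eni_id names an
    interface that does not belong to the hosted cluster.
    """
    eni = ec2.describe_network_interfaces(
        NetworkInterfaceIds=[eni_id]
    )["NetworkInterfaces"][0]

    # Groups must be passed as the full list, so keep the existing ones.
    groups = [g["GroupId"] for g in eni["Groups"]]
    if group_id not in groups:
        groups.append(group_id)
        ec2.modify_network_interface_attribute(
            NetworkInterfaceId=eni_id, Groups=groups
        )
    return groups
```

Once the group is attached this way, deleting the HostedCluster (step 3) should hit the same DependencyViolation if the theory holds.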
      

      Actual results:

      CPO logs:
      "error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"
      HostedCluster status condition:
        - lastTransitionTime: "2023-11-09T22:18:09Z"
          message: ""
          observedGeneration: 3
          reason: StatusUnknown
          status: Unknown
          type: CloudResourcesDestroyed

      Expected results:

      I would expect the CloudResourcesDestroyed status condition on the HostedCluster to report that this security group is holding up the deletion, so that nobody has to parse the CPO logs to find out.
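
To make the expectation concrete, here is a rough sketch of the condition I would hope to see, built from the error text in the CPO log. This is illustration only, not how the CPO is implemented: the function name and the reason strings "DestroyBlocked" and "Destroyed" are hypothetical, and only the condition type and field names come from the status shown above.

```python
import re


def cloud_resources_destroyed_condition(error_text, observed_generation):
    """Build a CloudResourcesDestroyed condition dict from a deletion error.

    Reason values are hypothetical placeholders, not real HyperShift reasons.
    """
    condition = {
        "type": "CloudResourcesDestroyed",
        "observedGeneration": observed_generation,
    }
    if not error_text:
        condition.update(status="True", reason="Destroyed", message="")
        return condition

    # Pull the security group ID out of the error so it is easy to spot.
    match = re.search(r"sg-[0-9a-f]+", error_text)
    blocker = match.group(0) if match else "unknown resource"

    # Keep only the first line; the rest is status code and request ID.
    summary = error_text.splitlines()[0]
    condition.update(
        status="False",
        reason="DestroyBlocked",
        message=f"{blocker} is blocking deletion: {summary}",
    )
    return condition
```

With the error from this ticket, the message would name sg-04abe599e5567b025 and include the DependencyViolation text instead of being empty with status Unknown.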

      Additional info:

       


            People

              jparrill@redhat.com Juan Manuel Parrilla Madrid
              mshen.openshift Michael Shen
              Jie Zhao Jie Zhao