OCPBUGS-26412

CPO Failing to delete default worker security group, but not reflected in HostedCluster status condition


    • Moderate
    • No
    • Rejected
    • False
    • Previously, if uninstallation of a hosted cluster was stuck, the status of the Control Plane Operator (CPO) was reported incorrectly. With this update, the status of the CPO is reported correctly. (link:https://issues.redhat.com/browse/OCPBUGS-26412[*OCPBUGS-26412*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-23362. The following is the description of the original issue:

      A hostedcluster/hostedcontrolplane was stuck uninstalling. The CPO logs showed the following error:
      
      "error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"
      
      Unfortunately, I do not have enough access to the AWS account to inspect this security group. I do know it is the default worker security group, because its ID is recorded in the hostedcluster's .status.platform.aws.defaultWorkerSecurityGroupID field.
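
      Until the condition carries this information, the security group ID and the AWS error code have to be pulled out of the log line by hand. A minimal sketch of that extraction follows. It assumes the JSON log entry has already been decoded, so that "\n\t" is a real newline and tab. The function name and field names are illustrative and are not part of the CPO.

```python
import re
from typing import Optional

# Pattern for the error text quoted in this report. The group names
# (sg_id, code, detail, status, request_id) are illustrative choices.
_SG_ERROR = re.compile(
    r"failed to delete security group (?P<sg_id>sg-[0-9a-f]+): "
    r"(?P<code>\w+): (?P<detail>.*?)\s*status code: (?P<status>\d+), "
    r"request id: (?P<request_id>[0-9a-f-]+)",
    re.DOTALL,
)


def parse_sg_delete_error(message: str) -> Optional[dict]:
    """Return the parsed fields of a security group deletion error, or None."""
    match = _SG_ERROR.search(message)
    return match.groupdict() if match else None


# The error value from the CPO log, with the escapes decoded.
LOG_ERROR = (
    "failed to delete AWS default security group: "
    "failed to delete security group sg-04abe599e5567b025: "
    "DependencyViolation: resource sg-04abe599e5567b025 has a dependent object"
    "\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"
)

if __name__ == "__main__":
    print(parse_sg_delete_error(LOG_ERROR))
```

      The sg_id field can then be compared against .status.platform.aws.defaultWorkerSecurityGroupID to confirm which group is blocking the deletion.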

      Version-Release number of selected component (if applicable):

      4.14.1

      How reproducible:

      I haven't tried to reproduce it yet, but can do so and update this ticket when I do. My theory is:

      Steps to Reproduce:

      1. Create an AWS HostedCluster, wait for it to create/populate defaultWorkerSecurityGroupID
      2. Attach the defaultWorkerSecurityGroupID to anything else in the AWS account unrelated to the HCP cluster
      3. Attempt to delete the HostedCluster
      

      Actual results:

      CPO logs:
      "error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"
      HostedCluster Status Condition
        - lastTransitionTime: "2023-11-09T22:18:09Z"
          message: ""
          observedGeneration: 3
          reason: StatusUnknown
          status: Unknown
          type: CloudResourcesDestroyed

      Expected results:

      I would expect the CloudResourcesDestroyed status condition on the hostedcluster to report that this security group is holding up the deletion, so that the cause can be found without parsing through logs.
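
      The expected behavior can be sketched as a mapping from the deletion error to the condition. This is only an illustration of the request, not the HyperShift implementation or the actual fix: the Condition class, the function name, and both reason strings are hypothetical. Only the condition type CloudResourcesDestroyed comes from the status shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Condition:
    """Simplified stand-in for a Kubernetes-style status condition."""
    type: str
    status: str
    reason: str
    message: str = ""


def cloud_resources_destroyed(delete_error: Optional[str]) -> Condition:
    """Build the CloudResourcesDestroyed condition from a deletion result.

    The reason strings below are hypothetical placeholders.
    """
    if delete_error is None:
        return Condition(
            type="CloudResourcesDestroyed",
            status="True",
            reason="CloudResourcesDestroyed",  # hypothetical reason
        )
    # Surface the underlying error instead of leaving status Unknown
    # with an empty message.
    return Condition(
        type="CloudResourcesDestroyed",
        status="False",
        reason="CloudResourcesDestructionFailed",  # hypothetical reason
        message=delete_error,
    )


if __name__ == "__main__":
    err = (
        "failed to delete security group sg-04abe599e5567b025: "
        "DependencyViolation: resource sg-04abe599e5567b025 has a dependent object"
    )
    print(cloud_resources_destroyed(err))
```

      With a condition like this, the blocking security group ID would be visible in the hostedcluster status message rather than only in the CPO logs.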

      Additional info:

       

            jparrill@redhat.com Juan Manuel Parrilla Madrid
            openshift-crt-jira-prow OpenShift Prow Bot
            Jie Zhao Jie Zhao