OCPBUGS-43350

Image registry operator becomes degraded when setting management state to Removed when networkAccess is set to Internal

    • Bug
    • Resolution: Done-Errata
    • Major
    • None
    • 4.18.0
    • Image Registry
    • None
    • Rejected
    • False
    • * Previously, if the Image Registry Operator was configured with the `networkAccess` field set to `Internal` in Azure, an authorization error prevented the Operator from deleting the storage container and prevented the `managementState` field from being set to `Removed`. With this release, the Operator can delete the storage account and storage container, and the `managementState` field can be successfully set to `Removed`. (link:https://issues.redhat.com/browse/OCPBUGS-43350[*OCPBUGS-43350*])
      --------
      Previously, when the image registry operator was configured with "networkAccess: Internal" in Azure, it was not possible to set "managementState" to "Removed" in the operator configuration, because the operator hit an authorization error when it tried to delete the storage container. This update makes the operator continue with the deletion of the storage account, which automatically deletes the storage container, so the change to the "Removed" state succeeds.
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-42732. The following is the description of the original issue:

      Description of problem:

          The operator fails to remove its storage resources when managementState is set to Removed while networkAccess is set to Internal.
          It looks like the authorization error changes from bloberror.AuthorizationPermissionMismatch to bloberror.AuthorizationFailure after the storage account becomes private (networkAccess: Internal).
          This is caused either by unexpected behavior in the Azure SDK or by the Azure API itself.
          The easiest way to solve it is to also handle bloberror.AuthorizationFailure here: https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/storage/azure/azure.go?plain=1#L1145
      
          The error condition is the following:
      
      status:
        conditions:
        - lastTransitionTime: "2024-09-27T09:04:20Z"
          message: "Unable to delete storage container: DELETE https://imageregistrywxj927q6bpj.blob.core.windows.net/wxj-927d-jv8fc-image-registry-rwccleepmieiyukdxbhasjyvklsshhee\n--------------------------------------------------------------------------------\nRESPONSE
            403: 403 This request is not authorized to perform this operation.\nERROR CODE:
            AuthorizationFailure\n--------------------------------------------------------------------------------\n\uFEFF<?xml
            version=\"1.0\" encoding=\"utf-8\"?><Error><Code>AuthorizationFailure</Code><Message>This
            request is not authorized to perform this operation.\nRequestId:ababfe86-301e-0005-73bd-10d7af000000\nTime:2024-09-27T09:10:46.1231255Z</Message></Error>\n--------------------------------------------------------------------------------\n"
          reason: AzureError
          status: Unknown
          type: StorageExists
        - lastTransitionTime: "2024-09-27T09:02:26Z"
          message: The registry is removed
          reason: Removed
          status: "True"
          type: Available 

      Version-Release number of selected component (if applicable):

          4.18, 4.17, 4.16 (needs confirmation), 4.15 (needs confirmation)

      How reproducible:

          Always

      Steps to Reproduce:

          1. Get an Azure cluster
          2. In the operator config, set networkAccess to Internal
          3. Wait until the operator reconciles the change (watch networkAccess in the status with `oc get configs.imageregistry/cluster -oyaml | yq '.status.storage'`)
          4. In the operator config, set managementState to Removed: `oc patch configs.imageregistry/cluster -p '{"spec":{"managementState":"Removed"}}' --type=merge`
          5. Watch the cluster operator conditions for the error

      Actual results:

          

      Expected results:

          

      Additional info:

          

              fmissi Flavian Missi
              openshift-crt-jira-prow OpenShift Prow Bot
              XiuJuan Wang XiuJuan Wang