OpenShift Bugs / OCPBUGS-42935

Image registry errors block the upgrade when the registry uses a custom Azure storage account located in a different resource group


    • Critical
    • None
    • False
    • None
      * Previously, when you configured the image registry to use an {azure-first} storage account that was located in a resource group other than the cluster's resource group, the Image Registry Operator became degraded because of a validation error. With this release, the Operator allows authentication by using only a storage account key, and validation of the other authentication requirements is no longer required. (link:https://issues.redhat.com/browse/OCPBUGS-42935[*OCPBUGS-42935*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-42934, which is a clone of OCPBUGS-42933, which is a clone of OCPBUGS-42812, which is a clone of OCPBUGS-42514. The following is the description of the original issue:

      Description of problem:

      When the OpenShift image registry is configured to use a custom Azure storage account in a different resource group, following the official documentation [1], the image-registry cluster operator (CO) becomes degraded and the upgrade from version 4.14.x to 4.15.x fails. The Image Registry Operator reports misconfiguration errors related to Azure storage credentials, which blocks the upgrade and causes instability in the control plane.

      [1] Configuring registry storage in Azure user infrastructure

      Version-Release number of selected component (if applicable):

         4.14.33, 4.15.33

      How reproducible:

      1. Set up ARO:
        • Deploy an ARO or OpenShift cluster on Azure, version 4.14.x.
      2. Configure the image registry:
        • Follow the official documentation [1] to configure the image registry to use a custom Azure storage account located in a different resource group.
        • Ensure that the image-registry-private-configuration-user secret is created in the openshift-image-registry namespace.
        • Do not modify the installer-cloud-credentials secret.
      3. Check the status of the image-registry cluster operator.
      4. Initiate the upgrade:
        • Attempt to upgrade the cluster to OpenShift version 4.15.x.
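      The configuration, status check, and upgrade steps above can be scripted roughly as follows. This is a sketch only: the function names are hypothetical, the storage account name, container name, and account key are placeholders, and the commands follow the general shape of the Azure user-infrastructure registry storage procedure [1] rather than a verified copy of it. It assumes a logged-in `oc`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a JSON merge patch that points the registry
# at a custom Azure storage account and container.
storage_patch() {
  local account="$1" container="$2"
  printf '{"spec":{"storage":{"azure":{"accountName":"%s","container":"%s"}}}}' \
    "$account" "$container"
}

# Hypothetical wrapper around the configuration and status-check steps.
reproduce_setup() {
  local account="$1" container="$2" account_key="$3"
  # Provide the storage account key through the user-managed secret.
  oc create secret generic image-registry-private-configuration-user \
    --from-literal=REGISTRY_STORAGE_AZURE_ACCOUNTKEY="$account_key" \
    --namespace openshift-image-registry
  # Point the registry at the custom storage account and container.
  oc patch configs.imageregistry.operator.openshift.io/cluster \
    --type merge -p "$(storage_patch "$account" "$container")"
  # Check the image-registry cluster operator.
  oc get clusteroperator image-registry
}
```

      For example, `reproduce_setup <storage-account-name> <container-name> <account-key>`, followed by `oc adm upgrade --to=<4.15.z-version>` for the upgrade step. Building the patch in a separate function keeps the only string-handling logic testable without a cluster.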

      Steps to Reproduce:

      1. Keep the image-registry-private-configuration-user secret in place and leave the installer-cloud-credentials secret unmodified.

      The image-registry cluster operator reports the following error:

          NodeCADaemonProgressing: The daemon set node-ca is deployed Progressing: Unable to apply resources: unable to sync storage configuration: client misconfigured, missing 'TenantID', 'ClientID', 'ClientSecret', 'FederatedTokenFile', 'Creds', 'SubscriptionID' option(s) 

      The Operator also generates a new secret, image-registry-private-configuration, with the same content as image-registry-private-configuration-user:

      $ oc get secret  image-registry-private-configuration -o yaml
      apiVersion: v1
      data:
        REGISTRY_STORAGE_AZURE_ACCOUNTKEY: xxxxxxxxxxxxxxxxx
      kind: Secret
      metadata:
        annotations:
          imageregistry.operator.openshift.io/checksum: sha256:524fab8dd71302f1a9ade9b152b3f9576edb2b670752e1bae1cb49b4de992eee
        creationTimestamp: "2024-09-26T19:52:17Z"
        name: image-registry-private-configuration
        namespace: openshift-image-registry
        resourceVersion: "126426"
        uid: e2064353-2511-4666-bd43-29dd020573fe
      type: Opaque 
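      To confirm that the generated secret carries the same account key as the user-provided one without printing the key itself, a hypothetical helper can compare short SHA-256 fingerprints of the decoded values. This sketch assumes GNU coreutils (`base64 -d`, `sha256sum`) and a logged-in `oc`; the function names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: 12-character fingerprint of a base64-encoded
# secret value, so the key itself is never printed.
key_fingerprint() {
  printf '%s' "$1" | base64 -d | sha256sum | cut -c1-12
}

# Read the base64-encoded account key from a secret in openshift-image-registry.
account_key_b64() {
  oc get secret "$1" -n openshift-image-registry \
    -o jsonpath='{.data.REGISTRY_STORAGE_AZURE_ACCOUNTKEY}'
}

# Compare the generated secret with the user-provided one.
compare_registry_keys() {
  local generated user
  generated=$(key_fingerprint "$(account_key_b64 image-registry-private-configuration)")
  user=$(key_fingerprint "$(account_key_b64 image-registry-private-configuration-user)")
  if [ "$generated" = "$user" ]; then
    echo "same account key ($generated)"
  else
    echo "keys differ: $generated vs $user"
  fi
}
```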

       

      2. Delete the image-registry-private-configuration-user secret.

      The image-registry-private-configuration secret still exists with the same content, but the image-registry cluster operator now reports a different error:

       

      NodeCADaemonProgressing: The daemon set node-ca is deployed Progressing: Unable to apply resources: unable to sync storage configuration: failed to get keys for the storage account arojudesa: storage.AccountsClient#ListKeys: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Storage/storageAccounts/arojudesa' under resource group 'aro-ufjvmbl1' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix" 
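      This 404 shows the Operator calling `ListKeys` for `arojudesa` in the cluster resource group `aro-ufjvmbl1`, where the account does not exist. One way to find the resource group the account actually lives in is to look up its Azure resource ID with the Azure CLI and take the path segment after `resourceGroups/`. A minimal sketch, assuming a logged-in `az` and the canonical `resourceGroups` casing in the ID; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: extract the resource group from an Azure resource ID like
# /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<name>
# (pattern matching is case-sensitive, so "resourcegroups" would not match).
resource_group_from_id() {
  local id="$1" rest
  rest="${id#*/resourceGroups/}"   # drop everything up to and including /resourceGroups/
  printf '%s\n' "${rest%%/*}"      # keep only the next path segment
}

# Look up the storage account's resource ID in the current subscription.
storage_account_rg() {
  local name="$1" id
  id=$(az storage account list --query "[?name=='${name}'].id" -o tsv)
  resource_group_from_id "$id"
}
```

      With that value, `az storage account keys list --account-name arojudesa --resource-group "$(storage_account_rg arojudesa)"` would be expected to succeed where the Operator's lookup in `aro-ufjvmbl1` returned 404.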

      3. Apply the workaround: manually change the azure_resourcegroup key in the installer-cloud-credentials secret to the resource group of the custom storage account.

      $ oc get secret installer-cloud-credentials -o yaml
      apiVersion: v1
      data:
        azure_client_id: xxxxxxxxxxxxxxxxx
        azure_client_secret: xxxxxxxxxxxxxxxxx
        azure_region: xxxxxxxxxxxxxxxxx
        azure_resource_prefix: xxxxxxxxxxxxxxxxx
        azure_resourcegroup: xxxxxxxxxxxxxxxxx <<<<<-----THIS
        azure_subscription_id: xxxxxxxxxxxxxxxxx
        azure_tenant_id: xxxxxxxxxxxxxxxxx
      kind: Secret
      metadata:
        annotations:
          cloudcredential.openshift.io/credentials-request: openshift-cloud-credential-operator/openshift-image-registry-azure
        creationTimestamp: "2024-09-26T16:49:57Z"
        labels:
          cloudcredential.openshift.io/credentials-request: "true"
        name: installer-cloud-credentials
        namespace: openshift-image-registry
        resourceVersion: "133921"
        uid: d1268e2c-1825-49f0-aa44-d0e1cbcda383
      type: Opaque 
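      The workaround edits a field under `data`, so the new resource group has to be base64-encoded before it is patched in. A minimal sketch, assuming a logged-in `oc` and GNU `base64`; the function names and the 5-minute timeout are illustrative choices, not part of the reported procedure.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: JSON merge patch that sets azure_resourcegroup.
# Secret data values must be base64-encoded; tr strips any line wrapping.
rg_patch() {
  local encoded
  encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
  printf '{"data":{"azure_resourcegroup":"%s"}}' "$encoded"
}

# Apply the workaround, then wait until the image-registry operator
# no longer reports Degraded=True.
apply_rg_workaround() {
  local rg="$1"
  oc patch secret installer-cloud-credentials -n openshift-image-registry \
    --type merge -p "$(rg_patch "$rg")"
  oc wait clusteroperator/image-registry \
    --for=condition=Degraded=False --timeout=5m
}
```

      As reported, this is a manual step that has to be repeated after upgrades, so scripting it only reduces the effort; it does not remove the need for it.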

       

      The image registry then reports healthy, which allows the upgrade to continue.

       

      Actual results:

          The image registry still appears to use service principal authentication for the Azure storage account, even though a storage account key is provided.

      Expected results:

      • When the customer provides REGISTRY_STORAGE_AZURE_ACCOUNTKEY, it should be the only credential the Image Registry Operator needs to authenticate to the storage account.
      • The image registry continues to function using the custom Azure storage account in the different resource group.
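      The expected behaviour amounts to a decision rule: if an account key is supplied, use it and skip the service-principal checks; otherwise require the options named in the first error above. The function below is a hypothetical illustration of that rule, not the Image Registry Operator's actual Go code. The option names are copied from the error message; passing them as NAME=VALUE arguments and treating all of them as required is an assumption. It needs bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Option names copied from the "client misconfigured" error message.
SP_OPTIONS=(TenantID ClientID ClientSecret FederatedTokenFile Creds SubscriptionID)

# Hypothetical sketch: takes NAME=VALUE arguments and prints the chosen
# auth mode, or the missing options (returning 1).
select_azure_auth() {
  declare -A opts=()
  local kv
  for kv in "$@"; do
    opts["${kv%%=*}"]="${kv#*=}"
  done
  # An account key alone is sufficient: skip service-principal validation.
  if [ -n "${opts[REGISTRY_STORAGE_AZURE_ACCOUNTKEY]:-}" ]; then
    echo "account-key"
    return 0
  fi
  # Otherwise every service-principal option must be present.
  local missing=() name
  for name in "${SP_OPTIONS[@]}"; do
    [ -n "${opts[$name]:-}" ] || missing+=("$name")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "service-principal"
}
```

      Checking for the key first is what makes the other options irrelevant, which matches the release note's statement that validation of the other authentication requirements is not required.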

      Additional info:

      • Reproducibility: The issue is consistently reproducible by following the official documentation to configure the image registry with a custom storage account in a different resource group and then attempting an upgrade.
      • Related Issues:
        • Similar problems have been reported in previous incidents, suggesting a systemic issue with the image registry operator's handling of Azure storage credentials.
      • Critical Customer Impact: Customers must intervene manually on every cluster after each upgrade, which is not sustainable and adds operational overhead.

       

      Slack : https://redhat-internal.slack.com/archives/CCV9YF9PD/p1727379313014789

              Flavian Missi (fmissi)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              XiuJuan Wang
              Votes: 0
              Watchers: 4
