  2. OCPBUGS-29755

image-registry co is degraded on Azure MAG, Azure Stack Hub cloud or with azure workload identity

    • Critical
    • No
    • Proposed
    • False
    • Release Note Not Required
    • In Progress

      This is a clone of issue OCPBUGS-29638. The following is the description of the original issue:
      Description of problem:

      When installing an IPI cluster from a 4.15 nightly build on Azure MAG, on Azure Stack Hub, or with Azure workload identity, the image-registry cluster operator (co) becomes degraded, with a different error in each environment.
      
      On MAG:
      $ oc get co image-registry
      NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      image-registry   4.15.0-0.nightly-2024-02-16-235514   True        False         True       5h44m   AzurePathFixControllerDegraded: Migration failed: panic: Get "https://imageregistryjima41xvvww.blob.core.windows.net/jima415a-hfxfh-image-registry-vbibdmawmsvqckhvmmiwisebryohfbtm?comp=list&prefix=docker&restype=container": dial tcp: lookup imageregistryjima41xvvww.blob.core.windows.net on 172.30.0.10:53: no such host...
      
      $ oc get pod -n openshift-image-registry
      NAME                                               READY   STATUS    RESTARTS        AGE
      azure-path-fix-ssn5w                               0/1     Error     0               5h47m
      cluster-image-registry-operator-86cdf775c7-7brn6   1/1     Running   1 (5h50m ago)   5h58m
      image-registry-5c6796b86d-46lvx                    1/1     Running   0               5h47m
      image-registry-5c6796b86d-9st5d                    1/1     Running   0               5h47m
      node-ca-48lsh                                      1/1     Running   0               5h44m
      node-ca-5rrsl                                      1/1     Running   0               5h47m
      node-ca-8sc92                                      1/1     Running   0               5h47m
      node-ca-h6trz                                      1/1     Running   0               5h47m
      node-ca-hm7s2                                      1/1     Running   0               5h47m
      node-ca-z7tv8                                      1/1     Running   0               5h44m
      
      $ oc logs azure-path-fix-ssn5w -n openshift-image-registry
      panic: Get "https://imageregistryjima41xvvww.blob.core.windows.net/jima415a-hfxfh-image-registry-vbibdmawmsvqckhvmmiwisebryohfbtm?comp=list&prefix=docker&restype=container": dial tcp: lookup imageregistryjima41xvvww.blob.core.windows.net on 172.30.0.10:53: no such host
      goroutine 1 [running]:
      main.main()
          /go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:49 +0x125
      
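      The blob storage endpoint appears to be incorrect: the pod uses the public-cloud suffix core.windows.net, whereas the storage account's actual endpoints use core.usgovcloudapi.net: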
      $ az storage account show -n imageregistryjima41xvvww -g jima415a-hfxfh-rg --query primaryEndpoints
      {
        "blob": "https://imageregistryjima41xvvww.blob.core.usgovcloudapi.net/",
        "dfs": "https://imageregistryjima41xvvww.dfs.core.usgovcloudapi.net/",
        "file": "https://imageregistryjima41xvvww.file.core.usgovcloudapi.net/",
        "internetEndpoints": null,
        "microsoftEndpoints": null,
        "queue": "https://imageregistryjima41xvvww.queue.core.usgovcloudapi.net/",
        "table": "https://imageregistryjima41xvvww.table.core.usgovcloudapi.net/",
        "web": "https://imageregistryjima41xvvww.z2.web.core.usgovcloudapi.net/"
      }
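
The mismatch can be made explicit by comparing the domain suffix of the endpoint the pod tried to reach with the one reported by `az`. The following is only a minimal sketch: `endpoint_suffix` is a hypothetical helper, not part of `oc` or `az`, and the two URLs are copied from the output above.

```shell
#!/bin/sh
# endpoint_suffix is a hypothetical helper (not part of oc or az).
# It prints the part of a blob endpoint URL that follows "<account>.blob.".
endpoint_suffix() {
    host=${1#https://}               # drop the scheme
    host=${host%%/*}                 # drop the path and query string
    printf '%s\n' "${host#*.blob.}"  # drop "<account>.blob."
}

# URL from the azure-path-fix panic, and the blob endpoint reported by az.
used='https://imageregistryjima41xvvww.blob.core.windows.net/jima415a-hfxfh-image-registry-vbibdmawmsvqckhvmmiwisebryohfbtm?comp=list&prefix=docker&restype=container'
actual='https://imageregistryjima41xvvww.blob.core.usgovcloudapi.net/'

if [ "$(endpoint_suffix "$used")" != "$(endpoint_suffix "$actual")" ]; then
    echo "suffix mismatch: pod used $(endpoint_suffix "$used"), account is on $(endpoint_suffix "$actual")"
fi
```

Comparing only the suffix keeps the check independent of the account and container names, which are the same in both URLs.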
      
      On Azure Stack Hub:
      $ oc get co image-registry
      NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      image-registry   4.15.0-0.nightly-2024-02-16-235514   True        False         True       3h32m   AzurePathFixControllerDegraded: Migration failed: panic: open : no such file or directory...
      
      $ oc get pod -n openshift-image-registry
      NAME                                               READY   STATUS    RESTARTS        AGE
      azure-path-fix-8jdg7                               0/1     Error     0               3h35m
      cluster-image-registry-operator-86cdf775c7-jwnd4   1/1     Running   1 (3h38m ago)   3h54m
      image-registry-658669fbb4-llv8z                    1/1     Running   0               3h35m
      image-registry-658669fbb4-lmfr6                    1/1     Running   0               3h35m
      node-ca-2jkjx                                      1/1     Running   0               3h35m
      node-ca-dcg2v                                      1/1     Running   0               3h35m
      node-ca-q6xmn                                      1/1     Running   0               3h35m
      node-ca-r46r2                                      1/1     Running   0               3h35m
      node-ca-s8jkb                                      1/1     Running   0               3h35m
      node-ca-ww6ql                                      1/1     Running   0               3h35m
      
      $ oc logs azure-path-fix-8jdg7 -n openshift-image-registry
      panic: open : no such file or directory
      goroutine 1 [running]:
      main.main()
          /go/src/github.com/openshift/cluster-image-registry-operator/cmd/move-blobs/main.go:36 +0x145
      
      On cluster with Azure workload identity:
      Some operators report PROGRESSING as True, for example:
      image-registry                             4.15.0-0.nightly-2024-02-16-235514   True        True          False      43m     Progressing: The deployment has not completed...
      
      The azure-path-fix pod is in CreateContainerConfigError status, and its events show the following error:
      
      "state": {
          "waiting": {
              "message": "couldn't find key REGISTRY_STORAGE_AZURE_ACCOUNTKEY in Secret openshift-image-registry/image-registry-private-configuration",
              "reason": "CreateContainerConfigError"
          }
      }                
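
To confirm this on a live cluster, one can check whether the secret named in the message carries the key at all. This is a minimal sketch, assuming `oc` is on the PATH and logged in to the affected cluster; `account_key_status` is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# account_key_status is a hypothetical helper; it assumes `oc` is logged in.
# It prints "present" when the secret holds a non-empty
# REGISTRY_STORAGE_AZURE_ACCOUNTKEY value, and "missing" otherwise
# (key absent, value empty, or the lookup itself failed).
account_key_status() {
    value=$(oc get secret image-registry-private-configuration \
        -n openshift-image-registry \
        -o jsonpath='{.data.REGISTRY_STORAGE_AZURE_ACCOUNTKEY}' 2>/dev/null)
    if [ -n "$value" ]; then
        echo present
    else
        echo missing
    fi
}

# Usage on a live cluster (not executed here):
#   account_key_status
```

Given the error message above, a cluster using workload identity would be expected to report the key as missing.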

      Version-Release number of selected component (if applicable):

      4.15.0-0.nightly-2024-02-16-235514    

      How reproducible:

          Always

      Steps to Reproduce:

          1. Install an IPI cluster on Azure MAG or Azure Stack Hub, or with Azure workload identity configured.
          

      Actual results:

          Installation fails and the image-registry operator is degraded.

      Expected results:

          Installation is successful.

      Additional info:

          The issue appears to be related to https://github.com/openshift/image-registry/pull/393


            OpenShift Jira Automation Bot added a comment -

            Per the announcement sent regarding the removal of "Blocker" as an option in the Priority field, this issue (which was already closed at the time of the bulk update) had Priority = "Blocker." It is being updated to Priority = Critical. No additional fields were changed.

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (OpenShift Container Platform 4.14.15 bug fix update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHBA-2024:1046

            W. Trevor King added a comment -

            I'm removing UpgradeBlocker as described here. And even when we set the label, we usually only set it on the tip-most bug in the series, because we want a single impact statement (in this case, IR-461) to explain the whole fleet exposure, not just a single 4.y series' exposure.

            OpenShift Jira Bot added a comment -

            Hi fmissi,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

            Wen Wang added a comment -

            Tested with version 4.14.0-0.test-2024-02-21-131627-ci-ln-n0hpy3k-latest.
            Clusters installed successfully in the following jobs:
            1. http_proxy job
            2. workload identity job
            3. disk-encryption job
            4. MAG-fully_private_cluster-NAT job

            The upgrade from 4.13.33 to the 4.14 custom build also succeeded:
            [root@wewang-thinkpadt14sgen2i ~]# oc get pods -n wewang
            NAME                                   READY   STATUS    RESTARTS   AGE
            new-default-deploy1-548c9d9fc7-47xkq   1/1     Running   0          53m
            [root@wewang-thinkpadt14sgen2i ~]# podman pull --tls-verify=false ${REGISTRY}/wewang/hello:latest
            Trying to pull default-route-openshift-image-registry.apps.wewang-413b.qemag.azure.devcluster.openshift.com/wewang/hello:latest...
            Getting image source signatures
            Copying blob b8c72134a16a [======================>---------------] 18.2MiB / 30.2MiB | 22.4 KiB/s
            Copying blob b8c72134a16a [======================>---------------] 18.4MiB / 30.2MiB | 32.9 KiB/s
            Copying blob b8c72134a16a done |
            Copying config fc87538605 done |
            Writing manifest to image destination
            fc875386057e8d1f478f11b00cfdad723ec49845b4aacd8c4b2588aba4ef715f

            [root@wewang-thinkpadt14sgen2i ~]# oc get clusterversion
            NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
            version   4.14.0-0.test-2024-02-21-131627-ci-ln-n0hpy3k-latest   True        False         51m     Cluster version is 4.14.0-0.test-2024-02-21-131627-ci-ln-n0hpy3k-latest

            Finished pre-merge testing and added the qe-approved label.


              fmissi Flavian Missi
              openshift-crt-jira-prow OpenShift Prow Bot
              Wen Wang Wen Wang