OCPBUGS-4090

OCP on OSP - Image registry is deployed with cinder instead of swift storage backend

Details

    • ?
    • Important
    • ShiftStack Sprint 228, ShiftStack Sprint 229, ShiftStack Sprint 230
    • 3
    • Rejected
    • False
    • In order to be compatible with OpenStack clouds that don't have Swift installed, cluster-image-registry-operator (CIRO) has a mechanism for automatically choosing the storage back-end during the first boot. If Swift is available, Swift is used. Otherwise, a persistent volume claim (PVC) is issued and block storage is used.

      Before this patch, cluster-image-registry-operator would fall back to using a PVC when it failed to reach Swift for any reason. In particular, lack of connectivity during the first boot would make CIRO fall back to using a PVC.

      With this change, failure to reach the OpenStack API, or any other incidental failure, causes CIRO to retry the probe. The fallback to a PVC now occurs only if the OpenStack catalog is retrieved successfully and either does not contain object storage, or contains it but the current user does not have permission to list containers.
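
The decision rule described in this note can be sketched roughly as follows. This is an illustrative Python model only, not the operator's actual Go code: the names `ProbeResult`, `Backend`, and `choose_backend`, the attempt limit, and the delay are all assumptions made for the example, and the real operator may retry indefinitely rather than give up.

```python
import time
from enum import Enum
from typing import Callable, Optional


class ProbeResult(Enum):
    """Hypothetical outcomes of one Swift probe (names are illustrative)."""
    SWIFT_USABLE = "swift usable"        # object storage in catalog, containers listable
    NO_OBJECT_STORE = "no object store"  # catalog retrieved, no object storage entry
    FORBIDDEN = "forbidden"              # object storage present, user cannot list containers
    TRANSIENT_ERROR = "transient error"  # API unreachable, timeout, other incidental failure


class Backend(Enum):
    SWIFT = "swift"
    PVC = "pvc"


def choose_backend(
    probe: Callable[[], ProbeResult],
    max_attempts: int = 5,        # assumed limit, only so the sketch terminates
    delay_seconds: float = 1.0,   # assumed pause between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Backend]:
    """Return the backend to use, or None if every attempt failed transiently."""
    for attempt in range(1, max_attempts + 1):
        result = probe()
        if result is ProbeResult.SWIFT_USABLE:
            return Backend.SWIFT
        # Only a definitive answer from the catalog justifies the PVC fallback.
        if result in (ProbeResult.NO_OBJECT_STORE, ProbeResult.FORBIDDEN):
            return Backend.PVC
        # Transient failure: wait and probe again instead of falling back.
        if attempt < max_attempts:
            sleep(delay_seconds)
    return None


# Example: the API times out twice during first boot, then Swift answers.
outcomes = iter([
    ProbeResult.TRANSIENT_ERROR,
    ProbeResult.TRANSIENT_ERROR,
    ProbeResult.SWIFT_USABLE,
])
print(choose_backend(lambda: next(outcomes), sleep=lambda _: None))  # prints Backend.SWIFT
```

Under the pre-patch behaviour described above, the first `TRANSIENT_ERROR` would already have selected the PVC backend; in this sketch it only triggers another probe.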

    Description

      Description of problem:

      While deploying an OCP cluster on OSP 16, the installer was supposed to use Swift as the storage backend for the image registry; however, it used Cinder instead.
      
      This can be seen when installing an OCP 4.8 cluster on OSP 16. The swiftoperator role is already assigned to the OpenStack user.
      
      The following Swift-related error was found in the registry operator logs:
      ~~~
      $ oc logs cluster-image-registry-operator-7bbdcfb94c-lrqjs | grep swift
      E1116 09:43:06.539763       1 swift.go:67] swift storage inaccessible: Failed to authenticate provider client: Post "https://osp.ipz001.internal.bosch.cloud:13000/v3/auth/tokens": dial tcp 10.140.249.17:13000: connect: connection timed out
      ~~~
      
      Running curl against the same endpoint from the registry operator pod now connects, so the endpoint was apparently just not reachable at installation time. Connectivity now seems to be fine.
      
      ~~~
      sh-4.4$ curl -vk https://osp.ipz001.internal.bosch.cloud:13000/v3/auth/tokens
      
      * Uses proxy env variable NO_PROXY == '.bosch.com,.cluster.local,.svc,.webapp.inside.bosch.cloud,10.140.180.0/23,10.140.214.0/24,10.140.249.0/24,10.140.250.30,10.140.253.2,10.140.254.0/24,10.40.0.0/24,127.0.0.1,169.254.169.254,192.168.0.0/17,192.168.128.0/17,api-int.de1qua.osh.ipz001.internal.bosch.cloud,bcr-de01.inside.bosch.cloud,internal.bosch.cloud,localhost,osh.ipz001.internal.bosch.cloud'
      *   Trying 10.140.249.17...
      * TCP_NODELAY set
      * Connected to osp.ipz001.internal.bosch.cloud (10.140.249.17) port 13000 (#0)
      * ALPN, offering h2
      * ALPN, offering http/1.1
      * successfully set certificate verify locations:
      ~~~
      
      As a Day-2 operation, switching the storage backend to Swift works, but with this action all the images already stored in the registry are deleted.
      
      This might be related to https://issues.redhat.com/browse/OCPBUGS-2941 or https://issues.redhat.com/browse/OCPBUGS-2795

      Version-Release number of selected component (if applicable):

      4.10.28

      Additional info:

      We have started a discussion with the engineering team in a Slack thread as well as over email: https://coreos.slack.com/archives/CH98TDJUD/p1668097534123929

            People

              pprinett@redhat.com Pierre Prinetti
              rhn-support-asadawar Abhijeet Sadawarte
              Jon Uriarte Jon Uriarte
              Votes: 0
              Watchers: 11