
Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: quay-v3.9.0
Affects Version/s: quay-v3.9.0
Component/s: quay-operator
Labels:
- ReleaseBlocker
- TestBlocker

Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
Postgres Operator Migrations

Release Blocker:
Approved
Target Version:

quay-v3.9.0

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description:

During the Quay 3.9 Operator upgrade, the pod quayregistry-clair-postgres-upgrade gets stuck in init with the error:

"Multi-Attach error for volume "pvc-38b6a080-2102-4305-8f19-9e77bd1ad75d" Volume is already used by pod(s)"

Index Image:

quay-operator-bundle-container-v3.9.0-137

Index image v4.13: registry-proxy.engineering.redhat.com/rh-osbs/iib:538740

Reproduce Steps:

These are the same test steps as in PROJQUAY-5631 (refer to "More Update-> 3, A 100% reproduce approach"); the volume mount error shows up only some of the time.

1, Provision an OCP cluster and Quay with the flex job

2, Deploy the latest Quay 3.7 Operator on OCP and create a valid Quay registry with these managed components:

- kind: postgres
  managed: true
- kind: clairpostgres
  managed: true

3, Upgrade to the latest v3.9 bundle image

4, Whether the upgrade passes or fails, uninstall the Quay registry and the Quay operator, and make sure all PVs/PVCs are released

5, Create a new project and repeat steps 2 and 3. The intention is to reuse the same OCP cluster to test a Quay upgrade with a different version

6, Check the upgrade status
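
For anyone re-running step 2, the component settings could be wrapped in a QuayRegistry manifest roughly as follows. This is a minimal sketch, not the manifest used in the test: the helper name write_quayregistry_manifest is hypothetical, and the resource name "quayregistry" is an assumption inferred from the pod names in this report.

```shell
# Hypothetical helper: write a minimal QuayRegistry manifest to the path in $1.
# Assumption: the registry is named "quayregistry" (inferred from the pod names).
write_quayregistry_manifest() {
  cat > "$1" <<'EOF'
apiVersion: quay.redhat.com/v1
kind: QuayRegistry
metadata:
  name: quayregistry
spec:
  components:
    - kind: postgres
      managed: true
    - kind: clairpostgres
      managed: true
EOF
}
```

The resulting file could then be applied in the new project, for example with oc apply -n <project> -f <path>.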

Actual Result:

[ec2-user@ip-10-0-12-54 ~]$ oc get pod
NAME                                          READY   STATUS        RESTARTS        AGE
quay-operator.v3.9.0-645db47fd4-b8hnr         1/1     Running       0               38s
quayregistry-clair-postgres-64d8c4b85-x2c94   1/1     Terminating   1 (4m58s ago)   5m28s
quayregistry-clair-postgres-upgrade-v24lz     0/1     Init:0/1      0               20s

[ec2-user@ip-10-0-12-54 ~]$ oc describe pod/quayregistry-clair-postgres-upgrade-v24lz 
Events:
  Type     Reason                  Age                   From                     Message
  ----     ------                  ----                  ----                     -------
  Normal   Scheduled               15m                   default-scheduler        Successfully assigned quay-enterprise-14080/quayregistry-clair-postgres-upgrade-v24lz to ip-10-0-222-246.us-east-2.compute.internal
  Warning  FailedAttachVolume      15m                   attachdetach-controller  Multi-Attach error for volume "pvc-38b6a080-2102-4305-8f19-9e77bd1ad75d" Volume is already used by pod(s) quayregistry-clair-postgres-64d8c4b85-x2c94
  Normal   SuccessfulAttachVolume  15m                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-5900ad31-5654-4d94-bc29-c8a521282942"
  Warning  FailedMount             13m                   kubelet                  Unable to attach or mount volumes: unmounted volumes=[postgres-data], unattached volumes=[migration-data kube-api-access-tp7nm clair-postgres-conf-sample postgres-data]: timed out waiting for the condition
  Normal   SuccessfulAttachVolume  12m                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-38b6a080-2102-4305-8f19-9e77bd1ad75d"
  Normal   AddedInterface          12m                   multus                   Add eth0 [10.130.2.10/23] from openshift-sdn
  Normal   Pulled                  7m41s (x5 over 12m)   kubelet                  Container image "registry.redhat.io/rhscl/postgresql-12-rhel7@sha256:68b969143f4c638098a13c2db8693b60339122ef004f28875ebc4a4afc23035d" already present on machine
  Normal   Created                 7m41s (x5 over 12m)   kubelet                  Created container postgres-old
  Normal   Started                 7m41s (x5 over 12m)   kubelet                  Started container postgres-old
  Warning  BackOff                 3m11s (x20 over 11m)  kubelet                  Back-off restarting failed container postgres-old in pod quayregistry-clair-postgres-upgrade-v24lz_quay-enterprise-14080(61aa21b2-fe72-4ed9-a47b-aadedc638eac)
[ec2-user@ip-10-0-12-54 ~]$

The events show a resource conflict while pod quayregistry-clair-postgres-64d8c4b85-x2c94 is terminating, but the error is still there after this pod has been terminated. Please review the attached full log.
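
To triage similar reports quickly, the affected volume names can be pulled out of the event output. The following is a small sketch; the function name multi_attach_volumes is hypothetical, and it assumes the event message has the same wording as in the output above.

```shell
# Hypothetical helper: read `oc describe pod` output on stdin and print
# each volume named in a "Multi-Attach error" event, one per line.
multi_attach_volumes() {
  sed -n 's/.*Multi-Attach error for volume "\([^"]*\)".*/\1/p' | sort -u
}
```

It is meant to be used as oc describe pod/quayregistry-clair-postgres-upgrade-v24lz | multi_attach_volumes. Each printed name can then be looked up in the output of oc get volumeattachment to see which node still holds the volume.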