Here are the details of the issue as described by the customer:
We have a Quay 3.6 deployed on OCP 4.8 using the Quay Operator 3.6. Quay is configured to use an LDAP (IDM) as the identity provider [1].
We recently discovered that when Quay is restarted while there is any LDAP connectivity issue (LDAP is unavailable, the LDAP user password is incorrect, etc.), it does not start up and the Pods remain in CrashLoopBackOff.
```
$ oc get pods
NAME                                                      READY   STATUS                  RESTARTS   AGE
manocluster-registry-clair-app-d95c97f6f-8v5kr            1/1     Running                 1          2d5h
manocluster-registry-clair-app-d95c97f6f-f6b86            1/1     Running                 0          2d5h
manocluster-registry-clair-app-d95c97f6f-lqqxd            1/1     Running                 0          14h
manocluster-registry-clair-app-d95c97f6f-tqg8g            1/1     Running                 0          14h
manocluster-registry-clair-postgres-79d7845757-jkpck      1/1     Running                 0          2d5h
manocluster-registry-quay-app-f85679cfb-7682k             0/1     CrashLoopBackOff        633        2d5h
manocluster-registry-quay-app-f85679cfb-l2lb9             0/1     CrashLoopBackOff        634        2d5h
manocluster-registry-quay-app-upgrade-zkjsk               0/1     Completed               0          2d5h
manocluster-registry-quay-config-editor-b844544fd-pc4vr   1/1     Running                 0          2d5h
manocluster-registry-quay-database-94b958748-dr2lg        1/1     Running                 0          2d5h
manocluster-registry-quay-mirror-7d9f9c684-dzjc2          0/1     Init:CrashLoopBackOff   445        2d5h
manocluster-registry-quay-mirror-7d9f9c684-sl78d          0/1     Init:CrashLoopBackOff   445        2d5h
manocluster-registry-quay-postgres-init-npnnc             0/1     Completed               0          2d5h
manocluster-registry-quay-redis-5b778c56c6-d8s8b          1/1     Running                 0          2d5h
quay-operator.v3.6.8-5b96bd5c88-kc9sj                     1/1     Running                 0          2d5h
```
After checking the quay-app Pods, we could see that the problem was related to LDAP, as explained above:
```
----------------------------------------------------------------------------------------------------
LDAP | Could not authenticate LDAP server. Error: LDAP Result Code 49 "Invalid Credentials": | 🔴 |
----------------------------------------------------------------------------------------------------
```
We understand that if an LDAP is configured as the Quay Identity Provider, it is key to guarantee its availability. We also know that Quay does not support local users when an LDAP is configured, hence it would make sense for Quay to remain unavailable...
But the main question here is robot accounts. Robot accounts should still work when LDAP is unavailable, meaning a user could still pull and push images to Quay with a robot account. The same applies to unauthenticated pulls from public repos.
In summary, unless unauthenticated pulls and robot accounts are somehow linked to LDAP, Quay should remain up and running even when LDAP is not available, so that pull and push operations remain available to users. The current behaviour is very disruptive.
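To make the requested behaviour concrete, here is a minimal Python sketch of an authentication path that keeps robot accounts working while LDAP is down. It is an illustration only, not Quay's actual code: `authenticate`, `LDAPUnavailable`, `robot_tokens` and `ldap_verify` are hypothetical names, and it assumes (as the report suggests) that robot credentials are held by Quay itself and do not depend on LDAP.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class LDAPUnavailable(Exception):
    """Hypothetical error raised when the LDAP server cannot be reached or bound to."""


@dataclass
class AuthResult:
    ok: bool
    reason: str


def is_robot(username: str) -> bool:
    # Quay robot account names have the form "<namespace>+<shortname>".
    return "+" in username


def authenticate(
    username: str,
    secret: str,
    robot_tokens: Dict[str, str],
    ldap_verify: Callable[[str, str], bool],
) -> AuthResult:
    if is_robot(username):
        # Assumption: robot tokens are stored locally, so LDAP is never consulted.
        stored = robot_tokens.get(username)
        if stored is not None and stored == secret:
            return AuthResult(True, "robot")
        return AuthResult(False, "invalid robot credentials")
    try:
        ok = ldap_verify(username, secret)
    except LDAPUnavailable:
        # Degrade gracefully: reject this login instead of taking the registry down.
        return AuthResult(False, "ldap unavailable")
    return AuthResult(ok, "ldap" if ok else "invalid credentials")


def broken_ldap(username: str, secret: str) -> bool:
    # Simulates the outage described in this report.
    raise LDAPUnavailable("LDAP Result Code 49")


tokens = {"myorg+builder": "s3cret"}
robot_login = authenticate("myorg+builder", "s3cret", tokens, broken_ldap)
human_login = authenticate("alice", "password", tokens, broken_ldap)
```

In this sketch an LDAP outage only fails logins for LDAP-backed users; the robot login succeeds because it never reaches the LDAP call.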
- is incorporated by: PROJQUAY-7057 Make Quay more robust in case of transient LDAP failures (New)