OpenShift Request For Enhancement / RFE-4278

Robot account resilience: Maintaining access during LDAP IdP outages

    • Product / Portfolio Work

      Here are the details of the issue as described by the customer:
      We have Quay 3.6 deployed on OCP 4.8 using the Quay Operator 3.6. Quay is configured to use LDAP (IDM) as the identity provider [1].

      We recently discovered that when there is any LDAP connectivity issue (LDAP is unavailable, the LDAP user password is incorrect, etc.) and Quay is restarted, it does not start up and the Pods remain in CrashLoopBackOff.
      ```
      $ oc get pods
      NAME                                                      READY   STATUS                  RESTARTS   AGE
      manocluster-registry-clair-app-d95c97f6f-8v5kr            1/1     Running                 1          2d5h
      manocluster-registry-clair-app-d95c97f6f-f6b86            1/1     Running                 0          2d5h
      manocluster-registry-clair-app-d95c97f6f-lqqxd            1/1     Running                 0          14h
      manocluster-registry-clair-app-d95c97f6f-tqg8g            1/1     Running                 0          14h
      manocluster-registry-clair-postgres-79d7845757-jkpck      1/1     Running                 0          2d5h
      manocluster-registry-quay-app-f85679cfb-7682k             0/1     CrashLoopBackOff        633        2d5h
      manocluster-registry-quay-app-f85679cfb-l2lb9             0/1     CrashLoopBackOff        634        2d5h
      manocluster-registry-quay-app-upgrade-zkjsk               0/1     Completed               0          2d5h
      manocluster-registry-quay-config-editor-b844544fd-pc4vr   1/1     Running                 0          2d5h
      manocluster-registry-quay-database-94b958748-dr2lg        1/1     Running                 0          2d5h
      manocluster-registry-quay-mirror-7d9f9c684-dzjc2          0/1     Init:CrashLoopBackOff   445        2d5h
      manocluster-registry-quay-mirror-7d9f9c684-sl78d          0/1     Init:CrashLoopBackOff   445        2d5h
      manocluster-registry-quay-postgres-init-npnnc             0/1     Completed               0          2d5h
      manocluster-registry-quay-redis-5b778c56c6-d8s8b          1/1     Running                 0          2d5h
      quay-operator.v3.6.8-5b96bd5c88-kc9sj                     1/1     Running                 0          2d5h
      ```

      After checking the quay-app Pods, we could see that the problem was related to LDAP, as explained above:
      ```
      --------------------------------------------------------------------------------

      LDAP Could not authenticate LDAP server. Error: LDAP Result Code 49 "Invalid Credentials": 🔴

      --------------------------------------------------------------------------------
      ```

      We understand that if LDAP is configured as the Quay identity provider, it is key to guarantee its availability. We also know that Quay does not support local users when LDAP is configured, hence it would make sense for Quay to remain unavailable.

      But the main question here is robot accounts. Robot accounts should still work when LDAP is unavailable, meaning a user could still pull and push images to Quay with a robot account despite LDAP being unavailable. The same applies to unauthenticated pulls from public repos.

      In summary, unless unauthenticated pulls and robot accounts are somehow linked to LDAP, Quay should remain up and running even when LDAP is not available, so that pull and push operations remain available to users. The current behaviour is very disruptive.

      [1] https://access.redhat.com/documentation/en-us/red_hat_quay/3.6/html-single/configure_red_hat_quay/index#config-fields-ldap


       

      The intended scope for this RFE:

      The RFE will target a degraded runtime mode where, following a successful startup, a subsequent LDAP failure will:

      • Block all interactive user logins/admin functions.
      • Maintain full availability for Robot Account operations and unauthenticated pulls.

       

      This is aimed at ensuring security at launch while maximizing availability for the automation workflows.
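
The requested runtime behaviour could be sketched roughly as follows. This is only an illustration of the scope described above, not Quay's actual implementation: `DegradedModeAuthenticator`, `LdapUnavailable`, `AuthDecision` and the `ldap_check` callback are hypothetical names, and the sketch assumes that robot credentials are stored locally and that robot usernames follow the `namespace+shortname` form.

```python
import hmac
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional, Set


class AuthDecision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    UNAVAILABLE = "unavailable"  # identity provider down, interactive login blocked


class LdapUnavailable(Exception):
    """Hypothetical error raised when the LDAP server cannot be reached."""


@dataclass
class DegradedModeAuthenticator:
    # Assumption: robot tokens live in the registry's own database, not in LDAP.
    robot_tokens: Dict[str, str]
    # Hypothetical callback: returns True/False, or raises LdapUnavailable.
    ldap_check: Callable[[str, str], bool]
    public_repos: Set[str] = field(default_factory=set)

    def authenticate(self, username: str, password: str) -> AuthDecision:
        # Robot accounts are validated locally, so an LDAP outage does not matter.
        if "+" in username:
            expected = self.robot_tokens.get(username)
            if expected is not None and hmac.compare_digest(
                expected.encode(), password.encode()
            ):
                return AuthDecision.ALLOW
            return AuthDecision.DENY

        # Interactive users depend on LDAP; block them while LDAP is down.
        try:
            ok = self.ldap_check(username, password)
        except LdapUnavailable:
            return AuthDecision.UNAVAILABLE
        return AuthDecision.ALLOW if ok else AuthDecision.DENY

    def authorize_pull(
        self,
        repo: str,
        username: Optional[str] = None,
        password: Optional[str] = None,
    ) -> AuthDecision:
        # Unauthenticated pulls only ever succeed for public repositories.
        if username is None:
            return AuthDecision.ALLOW if repo in self.public_repos else AuthDecision.DENY
        return self.authenticate(username, password or "")
```

With an `ldap_check` that always raises `LdapUnavailable`, a robot account with a valid token and an anonymous pull from a public repository are both allowed, while an interactive user gets `AuthDecision.UNAVAILABLE` instead of taking the whole service down.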

              rhn-coreos-tunwu Tony Wu
              rhn-support-mjahangi Muhammad Selim Jahangir