RHSSO-2364

After the RH SSO Operator upgrade to 7.6.2 ('rhsso-operator.7.6.2-opr-001'), the Liveness and Readiness Probes are failing in FIPS (disabled) environments


Details

    Workaround

      The only known workaround is to temporarily mark the Keycloak resource as unmanaged by the RH SSO Operator, remove the Probes, and ask customers to closely monitor their environment in the meantime (the issue is with the Probes only; the RH SSO image itself works fine). This is a very problematic approach, but it is currently the only known way to restore a running Production environment:

      1. Patch 'Keycloak' to have the pods 'unmanaged' by the RH SSO Operator:

      $ oc patch Keycloak <KEYCLOAK> --type merge -p '{"spec":{"unmanaged":true}}'
      

      2. Edit StatefulSet/keycloak:

      $ oc edit StatefulSet keycloak
      

      3. Remove the Liveness and Readiness Probes by deleting the lines below:

      ...
      livenessProbe:
        exec:
          command:
          - /bin/sh
          - -c
          - /probes/liveness_probe.sh
        failureThreshold: 10
        initialDelaySeconds: 30
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 22
      ...
      readinessProbe:
        exec:
          command:
          - /bin/sh
          - -c
          - /probes/readiness_probe.sh
        failureThreshold: 10
        initialDelaySeconds: 40
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 22
      ...
      

      4. Save and quit using the default vi commands :wq or :x.

      5. Delete the keycloak-0 pod to force it to restart:

      $ oc delete pod keycloak-0
      

      6. Customers will need to closely monitor their environment, because of the unmanaged status and the removed Probes, until this issue is addressed. As mentioned, this is a last-resort workaround to restore their environments.
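For environments where an interactive edit is impractical, steps 2 to 4 could probably be replaced by a JSON patch. The following is only a sketch and has not been tested against an affected cluster. The function names are hypothetical, and it assumes the RH SSO container is the first entry (index 0) in the StatefulSet pod template.

```shell
#!/bin/sh
# Sketch: remove both probes with a JSON patch instead of "oc edit".
# Assumption: the RH SSO container is at index 0 of the pod template.

# Print a JSON patch that removes both probes from the container at index $1.
probe_removal_patch() {
  base="/spec/template/spec/containers/${1:-0}"
  printf '[{"op":"remove","path":"%s/livenessProbe"},{"op":"remove","path":"%s/readinessProbe"}]\n' "$base" "$base"
}

# Apply the patch to StatefulSet $1 (default: keycloak), container index $2 (default: 0).
remove_probes() {
  oc patch statefulset "${1:-keycloak}" --type json -p "$(probe_removal_patch "${2:-0}")"
}

# Usage, after the Keycloak resource has been set to unmanaged (step 1):
#   remove_probes keycloak 0
#   oc delete pod keycloak-0
```

A JSON patch "remove" operation fails when the target path does not exist, so the patch is rejected if the Probes were already removed or the container index is wrong.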


      Either perform a manual RH SSO Operator upgrade from rhsso-operator.7.6.1-opr-005 to rhsso-operator.7.6.2-opr-001 or wait for an automatic upgrade, depending on the installPlanApproval policy of the RH SSO Operator Subscription object.
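To tell which of the two upgrade paths applies, the approval policy can be read from the Subscription. This is a minimal sketch: the function names are hypothetical, and the Subscription name and namespace in the usage comment are placeholders that will differ per cluster.

```shell
#!/bin/sh
# Sketch: read the approval policy and describe the resulting upgrade behaviour.

# Print the installPlanApproval value of Subscription $1 in namespace $2.
approval_policy() {
  oc get subscription.operators.coreos.com "$1" -n "$2" \
    -o jsonpath='{.spec.installPlanApproval}'
}

# Translate the policy value into the expected upgrade behaviour.
describe_policy() {
  case "$1" in
    Automatic) echo "upgrade is applied without approval" ;;
    Manual)    echo "upgrade waits for InstallPlan approval" ;;
    *)         echo "unknown policy: $1" ;;
  esac
}

# Usage (placeholder names):
#   describe_policy "$(approval_policy rhsso-operator my-namespace)"
```

With a Manual policy, the pending InstallPlan has to be approved before the upgrade to rhsso-operator.7.6.2-opr-001 proceeds, for example by setting spec.approved to true on the InstallPlan object.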


    Description

      This issue specifically affects customers who have FIPS enabled in their OpenShift cluster but have disabled it for the RH SSO Operator through an environment variable, as follows:

      - apiVersion: keycloak.org/v1alpha1
        kind: Keycloak
        ...
        spec:
          keycloakDeploymentSpec:
            experimental:
              env:
              - name: JAVA_TOOL_OPTIONS
                value: -Dcom.redhat.fips=false
          ... 

      The Liveness and Readiness Probes (which were working normally in the previous RH SSO 7.6.1 Operator release, rhsso-operator.7.6.1-opr-005) are now failing as shown below:

        message: |
          Liveness probe failed: {
              "probe.eap.dmr.EapProbe": "Error sending probe request: [digital envelope routines: EVP_DigestInit_ex] disabled for FIPS",
              "probe.eap.dmr.HealthCheckProbe": "Error sending probe request: [digital envelope routines: EVP_DigestInit_ex] disabled for FIPS"
          }
          INFO Using the 'ejRKSfxZsUFwrAiqhvfSTPvUzjxfwOvx' username to authenticate the probe request against the JBoss DMR API.
          INFO Using the 'ejRKSfxZsUFwrAiqhvfSTPvUzjxfwOvx' username to authenticate the probe request against the JBoss DMR API.
      
      ...
      
        message: |
          (combined from similar events): Readiness probe failed: {
              "probe.eap.dmr.EapProbe": "Error sending probe request: [digital envelope routines: EVP_DigestInit_ex] disabled for FIPS",
              "probe.eap.dmr.HealthCheckProbe": "Error sending probe request: [digital envelope routines: EVP_DigestInit_ex] disabled for FIPS"
          }
          INFO Using the 'ejRKSfxZsUFwrAiqhvfSTPvUzjxfwOvx' username to authenticate the probe request against the JBoss DMR API.
          INFO Using the 'ejRKSfxZsUFwrAiqhvfSTPvUzjxfwOvx' username to authenticate the probe request against the JBoss DMR API.
      

      Full details are in the events.yaml file from the customer, which will be attached.
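A quick way to confirm that a set of events shows this failure is to count the lines carrying the FIPS digest error. The sketch below only matches the error string quoted above. The function name is hypothetical, and the input can be any events dump, such as the attached events.yaml.

```shell
#!/bin/sh
# Sketch: count lines on stdin that contain the FIPS digest error from the probes.
count_fips_errors() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -cF 'EVP_DigestInit_ex] disabled for FIPS' || true
}

# Usage:
#   count_fips_errors < events.yaml
#   oc get events -o yaml | count_fips_errors
```

Any count above zero points to the probe scripts hitting the FIPS restriction rather than to a problem in RH SSO itself.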

      This issue has been reported by at least 2 customers, both with FIPS-enabled environments (and with FIPS disabled for the RH SSO Operator as described above).

      It's important to highlight that this only affects the Liveness and Readiness Probes: with the Probes removed, the RH SSO Operator is still able to start and run the OpenShift image normally (see the Workaround section for more information).

      While we don't officially support RH SSO in FIPS, customers were able to use the RH SSO Operator (and also the Template / JDBC Base image) normally with JAVA_TOOL_OPTIONS=-Dcom.redhat.fips=false. An example Jira where we assisted customers in deploying RH SSO in their FIPS-enabled OpenShift environments is SSOSUP-162. As mentioned above, this issue is limited to the Liveness and Readiness Probes and does not affect the OpenShift image itself.

      Other customers who applied the same JAVA_TOOL_OPTIONS setting to get the RH SSO Operator working in their FIPS environment are expected to experience the same issue.
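To check whether a given deployment uses this configuration before upgrading, the Keycloak resource can be searched for the override flag. This is a rough sketch with a hypothetical function name. It only looks for the literal flag from the snippet above and does not verify that FIPS is enabled on the cluster.

```shell
#!/bin/sh
# Sketch: exit 0 if the manifest on stdin contains the FIPS override flag.
has_fips_override() {
  grep -qF -- '-Dcom.redhat.fips=false'
}

# Usage (replace <KEYCLOAK> with the name of the Keycloak resource):
#   oc get keycloak <KEYCLOAK> -o yaml | has_fips_override \
#     && echo "FIPS override present: probes may fail after the 7.6.2 upgrade"
```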

      In addition to the events.yaml file, I will also attach other details from both Cases, which will be linked to this Bug.

      NOTE: I don't have a FIPS-enabled OpenShift environment to reproduce this issue; however, the "Workaround" has been tested and confirmed to work by at least one of the customers.

      Attachments

        1. 03451467-inspect.local.zip
          656 kB
        2. 03451467-keycloak.yaml
          2 kB
        3. 03451467-misc.txt
          6 kB
        4. events.yaml
          459 kB


          People

            rhn-engineering-shawkins Steven Hawkins
            rhn-support-ekonecsn Estevao Konecsni