Uploaded image for project: 'Cloud Enablement'
  1. Cloud Enablement
  2. CLOUD-2944

Change request for the default Liveliness/Readiness probe configuration for xPaaS JBoss EAP image jboss-eap-6/eap64-openshift

    Details

    • Type: Enhancement
    • Status: New (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: EAP64 1.8.9.GA
    • Fix Version/s: None
    • Component/s: EAP6

      Description

      xPaaS Liveness Probe doesn't fail when timeout of request is observed as the timeoutSeconds parameter has no effect on the readiness and liveness probes for Container Execution Checks because the timeout argument to this function is ignored by dockerhsim in the call to docker API.

      We have a workaround that can negate this limitation.

      Steps to Reproduce:

      ############################################
      [...]
      livenessProbe:
      exec:
      command:

      • /bin/bash
      • '-c'
      • /opt/eap/bin/livenessProbe.sh
        timeoutSeconds: 1
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3

      ############################################

      The liveliness probe is configured to restart the pod when the script fails to respond 3 times with an interval of 10 seconds. and the timeout is configured at 1 sec, So the pod is expected to restart in 30 seconds after the failure or timeout of probe request.

      1. Deploy eap64-basic-s2i application
      2. oc rsh <eap-pod>
      3. Stop the JBoss server --> sh-4.2$ kill -STOP <PID>
      4. Stop the livliness.sh process --> sh-4.2$ kill -STOP <PID>
      5. Check atomic-openshift-node logs (log-level=4) on the node where pod is deployed.

      Actual results:

      sh-4.2$ time /opt/eap/bin/livenessProbe.sh
      ^C^CTraceback (most recent call last):
      File "/opt/eap/bin/probes/runner.py", line 113, in <module>
      time.sleep(args.sleep)
      KeyboardInterrupt

      real 0m49.311s
      user 0m0.106s
      sys 0m0.037s

      The pod will get stuck in a zombie state and the will not restart.

      ==========================================================================

      The RFE requested for below workaround to be made default in the image

      ############################################
      [...]
      livenessProbe:
      exec:
      command:

      • /bin/bash
      • '-c'
      • timeout 60 /opt/eap/bin/livenessProbe.sh <==Changed
        timeoutSeconds: 1
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3

      ############################################

      With the workaround applied, OpenShift properly detects and records as failed the probes' scripts executions frozen for more than 60 seconds,and it correctly restart the container after 3 failures.

      The timeout value can be reduced to minimize the delay.

        Gliffy Diagrams

          Attachments

            Activity

              People

              • Assignee:
                luck3y Ken Wills
                Reporter:
                sanket15 Sanket Nalawade
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: