WildFly / WFLY-12952

MP Health returns UP when checks are expected but not installed yet.


Details

    Steps to Reproduce

      Get WildFly
      Run the reproducer:

      git clone git@github.com:istraka/eap-microprofile-test-suite.git -b mp-health-feature
      cd eap-microprofile-test-suite
      mvn clean install -DskipTests -Djboss.dist=FOO
      mvn surefire:test@manual-test -pl microprofile-health/ -Dtest=NotReadyHealthCheckTest -Djboss.home=_path_to_wf_
      

    Description

      The MicroProfile Health 2.0 specification says:

      • A producer MUST support custom, application level health check procedures
      • A producer SHOULD support reasonable out-of-the-box procedures
      • A producer with no health check procedures expected or installed MUST return positive overall status (i.e. HTTP 200)
      • A producer with health check procedures expected but not yet installed MUST return negative overall status (i.e. HTTP 503)
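
      For reference, the "overall status" in bullets (3) and (4) maps to an HTTP status code (200 vs. 503), which can be checked directly with curl. A minimal sketch against the readiness endpoint used in the output below:

      # Print only the HTTP status code of the readiness endpoint.
      # Per the spec: 200 when no checks are expected, 503 when checks are
      # expected but not yet installed.
      curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9990/health/ready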

      When I deploy an application with a readiness probe before WildFly is started, then from my point of view, and notably from the OpenShift point of view, the health check procedure is expected from the very beginning.
      Let me note that on OpenShift, starting the server should mean starting the service.
      Hence I expect a negative overall status until the probe is ready and able to provide a response.

      However, WildFly with the default settings responds with status UP:

      while true; do echo $(date +"%T.%3N") ;  curl   localhost:9990/health/ready; echo ""; done
      
      17:17:56.438 curl: (7) Failed to connect to localhost port 9990: Connection refused
      17:17:56.452 curl: (7) Failed to connect to localhost port 9990: Connection refused
      17:17:56.466 {"status":"UP","checks":[]}
      ...
      17:18:01.121 {"status":"UP","checks":[]}
      17:18:01.133 {"status":"DOWN","checks":[{"name":"delayed-readiness","status":"DOWN"}]}
      

      This violates bullet (4) of the specification.

      WildFly provides an option to set the global status when no probes are defined (see the documentation), as sketched below.

      With that option set, the scenario would behave according to the specification. Yet the default state violates it.
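
      A minimal sketch of that configuration via the management CLI; the subsystem and attribute names (microprofile-health-smallrye, empty-readiness-checks-status) are assumed from the WildFly MicroProfile Health documentation and may differ between versions:

      # Sketch only: subsystem/attribute names assumed from the WildFly docs.
      $JBOSS_HOME/bin/jboss-cli.sh --connect \
        --command="/subsystem=microprofile-health-smallrye:write-attribute(name=empty-readiness-checks-status, value=DOWN)"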

      If the default value were DOWN, we would run into an issue when WildFly is used without a deployment (for example as a backup for AMQ). The status would simply stay DOWN:

      17:22:41.719
      {"status":"DOWN","checks":[]}
      

      And that would violate bullet (3) of the specification.
      The TCK tests do not cover this scenario well.

      Is there a way to return DOWN until WildFly has scanned the deployments and, if no health check is found (and thus none is expected), then start returning UP? The scan should happen during MP Health initialization.
      If there is no deployment, then UP it is. A sketch of the desired sequence is shown below.
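
      Using the same curl loop as above, the desired sequence would look roughly like this (illustrative output only, not an actual run; timestamps are placeholders):

      # Illustrative only: desired output of the curl loop above
      # (today the first line would be {"status":"UP","checks":[]}).
      <t0> {"status":"DOWN","checks":[]}
      <t1> {"status":"DOWN","checks":[{"name":"delayed-readiness","status":"DOWN"}]}
      <t2> {"status":"UP","checks":[{"name":"delayed-readiness","status":"UP"}]}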

      Setting the priority to blocker since WildFly 19 shall be EAP 7.3.0.CD19, which is supposed to run on OpenShift. With this behavior the health check is not very useful because:

      • OpenShift starts the service, waits some time, and starts asking for the health status
      • WildFly responds UP even though the application health check is not installed yet
      • With health status UP, OpenShift considers the pod ready
      • At the point when the application is installed and the health status is DOWN (e.g. a DB is down), the OpenShift flow has already moved on

            People

              Assignee: Jeff Mesnil (jmesnil1@redhat.com)
              Reporter: Ivan Straka (istraka@redhat.com)
