  1. Red Hat Data Grid
  2. JDG-2461

LivenessProbe can incorrectly fail due to CacheManagerTest probe


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • JDG 7.3.1 ER4
    • JDG 7.3 ER1
    • Openshift Images
    • None
    • JDG Sprint #26, JDG Sprint #27

      The probe currently used in our images to check that all caches are available is incorrect, because createdCacheCount is incremented before a cache is running. During startup it is therefore normal to have definedCacheCount = createdCacheCount > runningCacheCount, which causes the liveness probe to fail incorrectly.
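
      To make the failure window concrete, here is a minimal sketch of the counter comparison. The exact condition used by CacheManagerTest is not quoted in this issue, so the equality check, the CacheCounts class and the function name below are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CacheCounts:
    """Snapshot of the three counters named in this issue."""
    defined: int
    created: int
    running: int


def all_caches_running(counts: CacheCounts) -> bool:
    # Hypothetical form of the check: healthy only when all counters agree.
    return counts.defined == counts.created == counts.running


# createdCacheCount is incremented before a cache is running, so during
# startup created can already equal defined while running still lags.
starting = CacheCounts(defined=3, created=3, running=1)
started = CacheCounts(defined=3, created=3, running=3)

print(all_caches_running(starting))  # False: a healthy, still-starting server fails
print(all_caches_running(started))   # True
```

      If a check of this shape backs the liveness probe, a pod that is merely still starting its caches is reported as unhealthy.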

      One option would be to remove the CacheManagerTest. However, the readiness probe still needs a way to check that all pre-defined caches have started, either correctly or with an error on startup. If a user has defined more than one cache, we don't want the pod to be classed as ready too soon, as the user may have written logic that only performs requests once a pod is available.

      We should replace the CacheManagerTest with a TransportAvailable test, which checks that the NettyTransport MBean exists. Because the transport does not start until all pre-defined caches have been started, the presence of this MBean implies that all defined caches have been started.
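
      A rough sketch of what the proposed TransportAvailable test could look like, assuming the probe already has a list of registered MBean names to inspect. How that list is obtained, the substring match, and the example object names are all hypothetical. The real NettyTransport ObjectName is not given in this issue.

```python
from typing import Iterable

# Assumed marker for the transport MBean; the real ObjectName may differ.
TRANSPORT_MARKER = "NettyTransport"


def transport_available(mbean_names: Iterable[str]) -> bool:
    """Return True if any registered MBean name refers to the transport."""
    return any(TRANSPORT_MARKER in name for name in mbean_names)


# Made-up object names, for illustration only.
while_starting = ["example:type=CacheManager,name=clustered"]
after_startup = while_starting + ["example:type=Server,component=NettyTransport"]

print(transport_available(while_starting))  # False: transport not registered yet
print(transport_available(after_startup))   # True
```

      Unlike the counter comparison, this check depends on a single condition that only becomes true once startup of the pre-defined caches has finished.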

              remerson@redhat.com Ryan Emerson