The customer is migrating their JDG cluster to OpenShift and ran into an issue while adding their own cache-container to the clustered-openshift.xml file.
The problem is that the Python health check only checks the default "clustered" cache container, so the pod never becomes healthy, which leads to errors like this in their events:
"Unhealthy | Readiness probe failed: { "probe.eap.dmr.EapProbe": { "probe.eap.dmr.ServerStatusTest": "running", "probe.eap.dmr.DeploymentTest": "No deployments",… "
Both /opt/datagrid/bin/readinessProbe.sh and livenessProbe.sh end up calling /opt/datagrid/bin/probes/probe/jdg/jolokia.py, which contains:
class CacheStatusTest(Test):
    """
    Checks the cache statuses.
    """

    def __init__(self):
        super(CacheStatusTest, self).__init__(
            {
                "type": "read",
                "attribute": "cacheStatus",
                "mbean": "jboss.datagrid-infinispan:type=Cache,name=*,manager=\"clustered\",component=Cache"
            }
        )
So the suggestion here is to add a new deployment environment variable to specify the cache-container name. The customer is also trying to add more than one cache-container; in scenarios like this the health check is supposed to ping each cache-container.
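For illustration only, a minimal sketch of what that change could look like in jolokia.py, assuming a hypothetical CACHE_CONTAINER_NAME environment variable (the variable name and the fallback to "clustered" are assumptions, not an existing image option; Test is the probe's existing base class):

import os

class CacheStatusTest(Test):
    """
    Checks the cache statuses of the configured cache-container.
    """

    def __init__(self):
        # Hypothetical: take the cache-container (manager) name from the
        # deployment environment, falling back to the current hardcoded value.
        container = os.environ.get("CACHE_CONTAINER_NAME", "clustered")
        mbean = ("jboss.datagrid-infinispan:type=Cache,name=*,"
                 "manager=\"%s\",component=Cache" % container)
        super(CacheStatusTest, self).__init__(
            {
                "type": "read",
                "attribute": "cacheStatus",
                "mbean": mbean
            }
        )

Supporting several cache-containers would then just mean accepting a comma-separated list and issuing one such read per name.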
To reproduce this issue, start a new app on OCP with a cache-container that has a different name:
oc new-app --name=datagrid-app --template=datagrid71-basic \
  -p APPLICATION_NAME=datagrid-app

oc new-build --binary=true \
  --image-stream=jboss-datagrid71-openshift:1.3 \
  --name=datagrid-app -l app=datagrid-app

oc start-build datagrid-app --from-dir=. --follow

oc set triggers dc/datagrid-app --from-image=jboss-datagrid71-openshift:1.3 --remove
oc set triggers dc/datagrid-app --from-image=datagrid-app:latest -c datagrid-app
The XML:
<cache-container name="bananas" default-cache="my-super-cache" statistics="true">
    (...)
</cache-container>
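With a cache-container named "bananas", the MBean pattern hardcoded in CacheStatusTest (manager="clustered") no longer matches anything, so the probe never reports a healthy cache. One way to confirm this is to send the same Jolokia read request for both manager names. The sketch below assumes the Jolokia agent is reachable on localhost:8778 at /jolokia/ without authentication (port, path and auth are assumptions; adjust for your image, e.g. reach the pod via oc port-forward):

import json
import urllib.request

# Assumption: default Jolokia agent port/path used by the JDG/EAP images.
JOLOKIA_URL = "http://localhost:8778/jolokia/"

def cache_status(manager):
    # Same "read" request the probe issues, parameterised on the manager name.
    mbean = ('jboss.datagrid-infinispan:type=Cache,name=*,'
             'manager="%s",component=Cache' % manager)
    body = json.dumps({
        "type": "read",
        "attribute": "cacheStatus",
        "mbean": mbean,
    }).encode("utf-8")
    request = urllib.request.Request(
        JOLOKIA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

print(cache_status("clustered"))  # what the shipped probe asks for
print(cache_status("bananas"))    # the customer's cache-container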
If the image is supposed to work like that (no changes to the cache-container name), a documentation note regarding this behavior could be added to the docs. Just let me know and I will open a JIRA for them.
Many thanks!