- Enhancement
- Resolution: Done
- Minor
- None
Scenario 1:
When two or more KIE Server pods are bootstrapped in a multi-KIE-Server-pod environment, there can be a race condition between the pods trying to create the same ConfigMap, and the following error can show up in the logs:
19:01:27,997 ERROR [org.kie.server.services.openshift.impl.storage.cloud.KieServerStateOpenShiftRepository] (ServerService Thread Pool -- 76) Processing KieServerState failed.: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/bsig-cloud/configmaps. Message: configmaps "authoring-ha-kieserver" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=configmaps, name=authoring-ha-kieserver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=configmaps "authoring-ha-kieserver" already exists, metadata=ListMeta(_continue=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
This error can be safely ignored, since the ConfigMap will be created and the pods will work normally.
The proposed change is to add a new catch block in this part of the code to handle this exception as a warning that can be safely ignored, as sketched below.
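A minimal sketch of such a catch block, assuming the fabric8 KubernetesClient API; the helper class and method names are hypothetical and the actual repository code may differ:

import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientException;
import java.util.logging.Logger;

public class KieServerStateConfigMapHelper {

    private static final Logger LOGGER = Logger.getLogger(KieServerStateConfigMapHelper.class.getName());

    // Creates the KIE Server state ConfigMap, treating the AlreadyExists (HTTP 409)
    // response as a benign race between pods that bootstrap at the same time.
    public ConfigMap createOrReuse(KubernetesClient client, String namespace, ConfigMap cm) {
        try {
            return client.configMaps().inNamespace(namespace).create(cm);
        } catch (KubernetesClientException e) {
            if (e.getCode() == 409) {
                // Another KIE Server pod created the ConfigMap first; safe to ignore.
                LOGGER.warning("ConfigMap already exists, likely created by another pod: " + e.getMessage());
                return client.configMaps().inNamespace(namespace)
                             .withName(cm.getMetadata().getName()).get();
            }
            throw e;
        }
    }
}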
In 7.4.0, a "Known Issue" should be documented alerting users that this error message can be safely ignored, explaining that it is a simple race condition when creating the ConfigMap used by KIE Servers during runtime. The ConfigMap will be created and the pods will work as expected.
Scenario 2:
Intermittently, the Watcher is closed due to a random KubernetesClientException, such as this 'too old resource version' error:
12:20:15,553 INFO [org.kie.server.services.openshift.impl.OpenShiftStartupStrategy] (OkHttp https://172.30.0.1/...) Watcher closed.
12:20:15,554 INFO [org.kie.server.services.openshift.impl.OpenShiftStartupStrategy] (OkHttp https://172.30.0.1/...) too old resource version: 750726 (779798)
It could be related to known issues in Kubernetes or the fabric8 kubernetes-client. While waiting for the lower-level library to address the issue, potential options from the upper-level API client perspective are:
Option 1 (Short Term):
Escalate the log message level, gracefully terminate the Watcher thread, and recommend a Pod recycle.
Option 2 (Long Term):
Refactor the Watcher logic out of OpenShiftStartupStrategy into a dedicated component with enhanced resiliency, such as being able to restart the Watcher should it exit abnormally (see the sketch after the options).
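A minimal sketch of such a dedicated component, assuming the fabric8 Watcher API with the onClose(KubernetesClientException) callback used at the time; the class name and the simple restart-on-abnormal-close policy are hypothetical (a real implementation would likely add backoff and a retry limit):

import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientException;
import io.fabric8.kubernetes.client.Watch;
import io.fabric8.kubernetes.client.Watcher;
import java.util.logging.Logger;

// Hypothetical resilient wrapper: restarts the ConfigMap watch whenever it is
// closed abnormally (e.g. "too old resource version"), instead of leaving the
// startup strategy without a running Watcher.
public class ResilientConfigMapWatcher {

    private static final Logger LOGGER = Logger.getLogger(ResilientConfigMapWatcher.class.getName());

    private final KubernetesClient client;
    private final String namespace;
    private final Watcher<ConfigMap> delegate;
    private volatile Watch watch;

    public ResilientConfigMapWatcher(KubernetesClient client, String namespace, Watcher<ConfigMap> delegate) {
        this.client = client;
        this.namespace = namespace;
        this.delegate = delegate;
    }

    public void start() {
        watch = client.configMaps().inNamespace(namespace).watch(new Watcher<ConfigMap>() {
            @Override
            public void eventReceived(Action action, ConfigMap resource) {
                // Forward events to the existing startup-strategy logic.
                delegate.eventReceived(action, resource);
            }

            @Override
            public void onClose(KubernetesClientException cause) {
                if (cause != null) {
                    // Abnormal close (e.g. stale resource version): escalate the log level and restart.
                    LOGGER.warning("Watcher closed abnormally, restarting: " + cause.getMessage());
                    start();
                } else {
                    LOGGER.info("Watcher closed normally.");
                }
            }
        });
    }

    public void stop() {
        if (watch != null) {
            watch.close();
        }
    }
}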
- is incorporated by: RHPAM-3333 Kie Server OpenShift startup strategy watcher is closed and DC is not updated (Closed)
- relates to: JBPM-8295 Refactor KieServerStateOpenShiftRepository (Open)