jBPM / JBPM-8588

Gracefully handle the generic KubernetesClientException on OpenShiftStartUpStrategy

    Details

    • Type: Enhancement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Cloud, KieServer
    • Labels:
    • Affects:
      User Experience
    • Docs QE Status:
      NEW
    • QE Status:
      NEW

      Description

      Scenario 1:
      When two or more KieServer pods are bootstrapped in a multi-KieServer-pod environment, there can be a race condition in which multiple pods try to create the same ConfigMap, and the following error can show up in the logs:

      19:01:27,997 ERROR [org.kie.server.services.openshift.impl.storage.cloud.KieServerStateOpenShiftRepository] (ServerService Thread Pool -- 76) Processing KieServerState failed.: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/bsig-cloud/configmaps. Message: configmaps "authoring-ha-kieserver" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=configmaps, name=authoring-ha-kieserver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=configmaps "authoring-ha-kieserver" already exists, metadata=ListMeta(_continue=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
      

      This error could be safely ignored since the ConfigMap will be created and the pods will work normally.

      The proposed change is to add a new catch block in this part of the code that handles this exception as a warning which can be safely ignored.
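A minimal sketch of what such a catch block could look like. This is not the actual jBPM code: `ClientException` below stands in for `io.fabric8.kubernetes.client.KubernetesClientException` (which exposes the HTTP status code via `getCode()`, visible as `code=409` in the log above), and the helper name is hypothetical:

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

public class ConfigMapCreator {

    private static final Logger LOG = Logger.getLogger(ConfigMapCreator.class.getName());
    static final int HTTP_CONFLICT = 409; // Kubernetes reason: AlreadyExists

    /** Stand-in for io.fabric8.kubernetes.client.KubernetesClientException. */
    public static class ClientException extends RuntimeException {
        private final int code;
        public ClientException(int code, String msg) { super(msg); this.code = code; }
        public int getCode() { return code; }
    }

    /**
     * Runs the ConfigMap create action. A 409 (AlreadyExists) is downgraded to a
     * warning and ignored, since another pod has already created the ConfigMap.
     * Returns true if this pod created it, false if it already existed;
     * any other failure is rethrown unchanged.
     */
    public static boolean createIgnoringConflict(Runnable createAction) {
        try {
            createAction.run();
            return true;
        } catch (ClientException e) {
            if (e.getCode() == HTTP_CONFLICT) {
                LOG.warning("ConfigMap already exists; created by another pod, safe to ignore: "
                        + e.getMessage());
                return false;
            }
            throw e; // anything other than AlreadyExists is still an error
        }
    }
}
```

The key point is that only the 409/AlreadyExists case is swallowed; all other `KubernetesClientException`s keep their current ERROR behavior.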

      In 7.4.0, a "Known Issue" should be documented alerting users that this error message can be safely ignored, explaining that it is a simple race condition between KieServer pods creating the ConfigMap used at runtime. The ConfigMap will be created and the pods will work as expected.

      Scenario 2:
      Intermittently, the Watcher is closed due to a seemingly random KubernetesClientException, such as this 'too old resource version' one:

      12:20:15,553 INFO  [org.kie.server.services.openshift.impl.OpenShiftStartupStrategy] (OkHttp https://172.30.0.1/...) Watcher closed.
      12:20:15,554 INFO  [org.kie.server.services.openshift.impl.OpenShiftStartupStrategy] (OkHttp https://172.30.0.1/...) too old resource version: 750726 (779798)
      

      This could be related to known issues in Kubernetes or the fabric8 kubernetes-client. While waiting for the lower-level library to address the issue, potential options from the upper-level API client perspective are:
      Option 1 (Short Term):
      Escalate the log message level, gracefully terminate the Watcher thread, and recommend a Pod recycle.

      Option 2 (Long Term):
      Refactor the Watcher logic out of OpenShiftStartupStrategy into a dedicated component with enhanced resiliency, such as the ability to restart the Watcher should it exit abnormally.
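Option 2 could take the shape of a small supervisor loop around the watch. The sketch below is a hypothetical `ResilientWatcher`, not existing jBPM code; the class name, the restart cap, and modeling the watch as a blocking `Runnable` are all assumptions for illustration:

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Supervises a blocking watch loop and restarts it when it exits abnormally
 * (e.g. a 'too old resource version' KubernetesClientException closing the
 * Watcher), up to a bounded number of restarts.
 */
public class ResilientWatcher {

    private final Runnable watchLoop;   // blocks until the underlying Watcher closes
    private final int maxRestarts;
    private final AtomicInteger restarts = new AtomicInteger();
    private volatile boolean stopped;

    public ResilientWatcher(Runnable watchLoop, int maxRestarts) {
        this.watchLoop = watchLoop;
        this.maxRestarts = maxRestarts;
    }

    /** Runs the watch loop, restarting it on abnormal exit up to maxRestarts times. */
    public void run() {
        while (!stopped) {
            try {
                watchLoop.run();
                return; // normal completion: Watcher closed cleanly
            } catch (RuntimeException e) {
                if (restarts.incrementAndGet() > maxRestarts) {
                    // Give up and surface the failure; at this point a Pod
                    // recycle (Option 1) would be the recommendation.
                    throw e;
                }
                // Otherwise fall through and start a fresh watch.
            }
        }
    }

    public int restartCount() { return restarts.get(); }

    public void stop() { stopped = true; }
}
```

A real implementation would also need to re-list and re-establish the watch from a fresh resourceVersion before each restart, since the stale version is exactly what caused the failure.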

                People

                • Assignee:
                  rhtevan Evan Zhang
                • Reporter:
                  zanini Ricardo Zanini Fernandes
                • Tester:
                  Jakub Schwan
                • Votes: 0
                • Watchers: 2
