  1. Red Hat Process Automation Manager
  2. RHPAM-4393

Pods stuck in create loop when generated passwords are used


    • Enhancement
    • Resolution: Not a Bug
    • Minor
    • 7.13.1.GA
    • 7.13.0.GA
    • Cloud
    • None
    • OCP 4.10
      BA Operator 7.13.0 CR1

      While testing RHPAM-4140 I ran into an issue where DeploymentConfigs and pods are re-created in a loop. From the operator log it looks like the values of the default passwords are changing.

      The following specs were deployed in the KieApp, first with the new secretAdminCredentials property:

      spec: 
        commonConfig: 
          secretAdminCredentials: rhpam-credentials
        environment: rhpam-authoring
        useImageTags: true
      
      

      and then with the old properties:

      spec: 
        commonConfig: 
          adminPassword: somepwd
          adminUser: someuser
        environment: rhpam-authoring
        useImageTags: true
      

      The status part was also missing (RHPAM-4394), so it is not clear from the KieApp what was deployed. The user needs to check the DeploymentConfig or the pod instead, and both can be hard to retrieve while they are being re-created.

      In the attached operator log business-automation-operator-78d5c69567-7brft-business-automation-operator.log you can see the failing object checks. You can also check the diffs of the Kie Server DeploymentConfig YAMLs in kieserverDCs.zip, downloaded from the project while the DC and pods were being re-created.


      Update after deeper investigation of the root cause.
      The engineering team was not able to reproduce this issue in their testing environment.

      During the investigation on the QE OCP instance I was able to identify the root cause of this loop failure: a malformed environment in the namespace. As part of the test automation, the ClusterRoleBinding (CRB) file was deleted from the namespace. Once the file was removed, all newly deployed KieApps fail to deploy, the Operator tries to recreate them, and this leads to a loop in which new DeploymentConfigs keep being created. The Operator should be aware of the CRB file deletion and be able to re-create it.
      The same behavior was observed when the BA Operator was deployed via OLM.

      The priority of this issue can be lowered. Because the user has to manually delete the CRB file, which is normally created automatically when the BA Operator is deployed, it is not easy to run into this issue. However, the BA Operator should be able to recover from this failing state.

      Actual behaviour: When the CRB file is deleted, it is not re-created and new KieApps fail to deploy.
      Expected behaviour: When the CRB file is deleted, it is re-created by the BA Operator and KieApps are deployed successfully.

      Part of the log output after the CRB file is removed:

      {
          "level": "error",
          "ts": 1655817243.0291653,
          "logger": "controller.kieapp-controller",
          "msg": "Reconciler error",
          "name": "fail",
          "namespace": "aaa-test",
          "error": "consolelinks.console.openshift.io is forbidden: User \"system:serviceaccount:aaa-test:business-automation-operator\" cannot create resource \"consolelinks\" in API group \"console.openshift.io\" at the cluster scope",
          "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/tmp/scripts/builder/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/tmp/scripts/builder/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"
      }
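
      For illustration, the permission that the error above reports as missing could be restored manually with a ClusterRole and ClusterRoleBinding along the following lines. This is only a sketch built from the values in the log (service account, namespace, resource and API group). The object name kieapp-consolelinks is hypothetical and is not the name the BA Operator uses for its own CRB, and only the "create" verb is confirmed by the log; the remaining verbs are assumptions.

      ```yaml
      # Hypothetical object name; not the CRB created by the BA Operator itself.
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: kieapp-consolelinks
      rules:
        - apiGroups: ["console.openshift.io"]
          resources: ["consolelinks"]
          # Only "create" is confirmed by the logged error; the other verbs are assumed.
          verbs: ["get", "create", "update", "delete"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: kieapp-consolelinks
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: kieapp-consolelinks
      subjects:
        # Service account and namespace taken from the error message in the log.
        - kind: ServiceAccount
          name: business-automation-operator
          namespace: aaa-test
      ```

      Applying something like this by hand would only be a workaround; the expected behaviour described above is that the BA Operator re-creates its own CRB.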
      

            mdessi-1 Massimiliano Dessi
            jakubschwan Jakub Schwan