AMQ Broker / ENTMQBR-2928

[Operator] Broker Operator unable to recover from CR changes causing erroneous state (nonexistent image specified)


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • None
    • AMQ 7.5.0.GA
    • operator
    • None
      If the AMQ Broker Operator encounters an error when applying a Custom Resource (CR) update, the Operator does not recover. Specifically, the Operator stops responding as expected to further updates to your CRs.

      For example, say that a misspelling in the value of the image attribute in your main broker CR causes broker Pods to fail to deploy, with an associated error message of `ImagePullBackOff`. If you then fix the misspelling and apply the CR changes, the Operator does not deploy the specified number of broker Pods. The Operator does not respond to any further CR changes.

      To work around this issue, you must delete any pods with the status `Pending` and let the Operator recreate them.
      To check which pods have the status `Pending`, use a command such as `kubectl get pods --field-selector=status.phase=Pending`.
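
The workaround above could be scripted roughly as follows. This is a minimal sketch, assuming all broker Pods live in one namespace and that you have permission to delete Pods there. The helper name `delete_pending_pods` is hypothetical and introduced only for illustration; the field selector is the one quoted in the workaround.

```shell
# Sketch: delete every Pending pod in a namespace so the Operator recreates it.
# delete_pending_pods is an illustrative helper name, not part of the Operator.
delete_pending_pods() {
  namespace="$1"
  # List Pending pods as "pod/<name>", one per line.
  pods="$(kubectl get pods -n "$namespace" --field-selector=status.phase=Pending -o name)"
  if [ -z "$pods" ]; then
    echo "no Pending pods in namespace $namespace"
    return 0
  fi
  # Intentionally unquoted: each pod name becomes its own argument.
  kubectl delete -n "$namespace" $pods
}
```

Call it with the namespace that holds the broker deployment, for example `delete_pending_pods my-broker-namespace` (a placeholder name). Note that the selector matches every Pending pod in that namespace, not only broker Pods.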
    • Documented as Known Issue

      Delete the broker custom resource and then create it again.
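
One way to carry out this workaround is sketched below, assuming the corrected broker CR is saved in a local manifest file. The helper name `recreate_broker_cr` and its arguments are hypothetical; only the standard `kubectl delete -f` and `kubectl apply -f` commands are used.

```shell
# Sketch: delete the broker custom resource and create it again from the same file.
# recreate_broker_cr is an illustrative helper name; cr_file is your own CR manifest.
recreate_broker_cr() {
  namespace="$1"
  cr_file="$2"
  # Remove the CR; by default kubectl delete waits for the object to be gone.
  kubectl delete -n "$namespace" -f "$cr_file" || return 1
  # Create the CR again so the Operator starts from a clean state.
  kubectl apply -n "$namespace" -f "$cr_file"
}
```

Expect broker downtime while the CR is absent, so this is probably better suited to test environments than to a live deployment.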

      • Deploy the Operator and CRDs as described in the documentation.
      • Create a valid broker deployment with correct values in the custom resource.
      • Make sure the pods are running.
      • Change the image in the custom resource to a different valid version and update the CR instance.
      • Make sure the pods are running.
      • Modify the CR to cause an erroneous state (for example, introduce a typo in the image path) and apply the changes.
      • Check that the pods could not be deployed.
      • Fix the CR and apply the changes.
      • Check the Operator logs (the Operator should be aware of the changes in the CR).
      • Check the stateful set: nothing has happened, and it remains stuck in the erroneous state.
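
The steps above can be condensed into a rough script. This sketch assumes the Operator and CRDs are already deployed, and that you have two manifest files: a valid CR and a copy with a deliberately broken image path. The helper name `reproduce_stuck_operator`, its arguments, and the fixed settle delay are all hypothetical. The intermediate "change to a different valid image" step is omitted for brevity.

```shell
# Sketch: walk through the valid -> broken -> fixed CR sequence and print the state.
# reproduce_stuck_operator and its arguments are illustrative, not an official tool.
reproduce_stuck_operator() {
  ns="$1"            # namespace of the broker deployment
  valid_cr="$2"      # CR manifest with a correct image
  broken_cr="$3"     # same CR with a typo in the image path
  settle="${4:-60}"  # seconds to wait after each change

  # Create the valid deployment and confirm the pods are running.
  kubectl apply -n "$ns" -f "$valid_cr"
  sleep "$settle"
  kubectl get pods -n "$ns"

  # Break the image and confirm the pods cannot be deployed.
  kubectl apply -n "$ns" -f "$broken_cr"
  sleep "$settle"
  kubectl get pods -n "$ns" --field-selector=status.phase=Pending

  # Fix the CR, then inspect which image the stateful set still references.
  kubectl apply -n "$ns" -f "$valid_cr"
  sleep "$settle"
  kubectl get statefulset -n "$ns" -o wide
}
```

If the issue reproduces, the final `kubectl get statefulset -o wide` output would still list the broken image after the fixed CR has been applied.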

      The Operator behaves inconsistently once it reaches an erroneous state such as `ImagePullBackOff` (the only case tested so far). When the `oc apply` command is used to update the custom resource on a healthy setup, changes such as an image update are performed immediately, without a scale-down or any other intervention. However, once the deployment is in an erroneous state (for example, `ImagePullBackOff` caused by a typo in the image path), the Operator cannot recover the correct state. It no longer tries to update the stateful set and simply hangs in the current state.
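
To confirm that the stateful set was not updated, you could compare the image it references with the image you expect after fixing the CR. The sketch below assumes the broker container is the first container in the pod template; the helper name `statefulset_image_matches` and its arguments are hypothetical.

```shell
# Sketch: report whether a stateful set references the expected container image.
# statefulset_image_matches is an illustrative helper name, not part of the Operator.
statefulset_image_matches() {
  namespace="$1"
  statefulset="$2"
  expected_image="$3"
  # Read the image of the first container in the pod template.
  actual_image="$(kubectl get statefulset "$statefulset" -n "$namespace" \
    -o jsonpath='{.spec.template.spec.containers[0].image}')"
  if [ "$actual_image" = "$expected_image" ]; then
    echo "stateful set $statefulset uses $actual_image"
    return 0
  fi
  echo "stateful set $statefulset still uses $actual_image, expected $expected_image"
  return 1
}
```

A non-zero return after the corrected CR has been applied is consistent with the hang described here, though it could also mean the Operator has simply not reconciled yet.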

            gtully@redhat.com Gary Tully
            jcliffor@redhat.com John Clifford
            Mikhail Krutov Mikhail Krutov
            Mikhail Krutov
