AMQ Broker / ENTMQBR-2942

Operator scaleup to n Pods: Pod #0 tries to contact non-existent Pod #n+1


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • Affects Version/s: AMQ 7.5.0.GA, 7.5.0.CR3
    • Component/s: clustering
    • Release Notes
      If you change the `size` attribute of your Custom Resource (CR) instance to scale down a broker deployment, the first broker Pod in the cluster can make repeated attempts to connect to the drainer Pods that were started to migrate messages from the brokers being shut down (the drainer Pods themselves shut down once migration completes).

      To work around this issue, follow these steps:

      . Scale your deployment to a single broker Pod.
      . Wait for all drainer Pods to start, complete message migration, and then shut down.
      . If the single remaining broker Pod has log entries for an “unknown host exception”, scale the deployment down to zero broker Pods, and then back to one.
      . When you have verified that the single remaining broker Pod is not recording exception-based log entries, scale your deployment back to its original size.
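
      The workaround steps above amount to editing the CR and re-applying it with `oc apply`, as in the reproduction steps. A minimal sketch of the CR change follows; the CR name `ex-aao`, the file name, and the `v2alpha1` API version are assumptions, not taken from this deployment:

      ```yaml
      # broker.yaml — adjust spec.deploymentPlan.size per workaround step,
      # then re-apply with: oc apply -f broker.yaml
      apiVersion: broker.amq.io/v2alpha1
      kind: ActiveMQArtemis
      metadata:
        name: ex-aao            # assumed CR name
      spec:
        deploymentPlan:
          size: 1               # step 1: scale to a single broker Pod
          # if "unknown host exception" entries appear, set size to 0,
          # re-apply, then set it back to 1; once the logs are clean,
          # restore the original size
      ```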
    • Documented as Known Issue
    • Workaround Exists
    • See this comment below.
    •
      1. Created an OCP 4.1 cluster on AWS; deployed the broker through the Broker Operator.
      2. Initially, there are 2 AMQ Broker Pods after deployment.
      3. Scaled up to 16 Pods using:
        $ oc apply -f <config with new size>

      4. After the scale-up completed, the OCP cluster has 16 Pods.
      5. 15 of those (Pods #1-#15) do not report any problems, but the logs of Pod #0 indicate that it tries to connect to Pod #16, which does not exist and never existed:

      http://pastebin.test.redhat.com/801376 (it only contains UnknownHostExceptions)
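
      For context, StatefulSet Pod ordinals run from 0 to size-1, so a deployment scaled to 16 Pods has Pods #0-#15 and any attempt to resolve Pod #16's hostname fails with an unknown-host error. A hedged sketch of the hostname scheme, assuming illustrative resource names (`ex-aao-ss`, `ex-aao-hdls-svc`, `myproject`) that are not taken from this deployment:

      ```python
      def pod_hostnames(statefulset: str, service: str, namespace: str, size: int) -> list:
          """Valid Pod DNS names for a StatefulSet scaled to `size` replicas.

          Ordinals run 0..size-1: with size=16 the last valid Pod is #15,
          so a connection attempt to Pod #16 cannot resolve.
          """
          return [
              "{}-{}.{}.{}.svc.cluster.local".format(statefulset, i, service, namespace)
              for i in range(size)
          ]

      names = pod_hostnames("ex-aao-ss", "ex-aao-hdls-svc", "myproject", 16)
      print(len(names))    # 16 Pods in total
      print(names[-1])     # highest valid ordinal is 15; Pod #16 is never created
      ```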


      A wrong configuration is applied to some of the running broker Pods deployed through the Broker Operator.

              Assignee: rhn-support-rkieley Roderick Kieley
              Reporter: mkrutov Mikhail Krutov
              Votes: 0
              Watchers: 5
