Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 1.10.1.GA
Component/s: Qpid Dispatch Router
Labels: None
Blocked: False
Blocked Reason: None
Ready: False
GSS Priority:
Target Release: 1.10.FutureGA
Steps to Reproduce:
1. Create a 3-node mesh (interior mode) on OpenShift, making sure the routers are distributed across at least 2 application nodes. The routers in this instance were connecting to an external AMQ broker.
2. Connect consumer applications to the service endpoint for the router mesh, also distributed across the application nodes.
3. Initiate a drain on the application node to evict the pods.
4. Wait until the migrated pod is back up on one of the remaining nodes.

You should see the routing errors and unsettled message warnings in the logs of the migrated router.

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

In a 3-router mesh deployment on OpenShift 4.8.35, we observe the following behavior when we drain an application node containing one of the routers.

1. The remaining routers that are not moved continue to function as normal
2. The moved router starts continuously logging "no route to host" warnings in the logs, in the following format:

2022-05-13 19:32:56.598748 +0000 SERVER (info) [C113] Connection to 10.210.46.90:55672 failed: proton:io No route to host - disconnected 10.210.46.90:55672

3. The IP address in these log entries is the former address of the moved router (i.e. it looks as if the router is trying to connect to itself on its old IP address)
4. We can see applications connect to the router, but it appears deliveries remain stuck / unsettled for these connections:

2022-05-13 19:36:07.127228 +0000 ROUTER_CORE (info) [C2][L40] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds

It appears that the router's old IP address is not being removed somewhere, and the router is attempting to add a connector to that address. It is unclear whether this is related to the unsettled deliveries or is just another manifestation of the same underlying cause.
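To check whether the two symptoms occur together in a given router log, a small script along these lines could tally them. This is only a sketch: the regular expressions are derived from the two log lines quoted above and may not match other router versions or log formats, and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Patterns derived from the two log lines quoted in this report.
# Assumption: other router versions may format these messages differently.
CONNECT_FAIL = re.compile(
    r"SERVER \(info\) \[C\d+\] Connection to (?P<addr>\S+) failed: "
    r"proton:io No route to host"
)
STUCK = re.compile(
    r"ROUTER_CORE \(info\) \[(?P<conn>C\d+)\]\[(?P<link>L\d+)\] Stuck delivery"
)


def summarize(lines):
    """Count 'No route to host' targets and stuck-delivery links.

    Returns two Counters: one keyed by the host:port the router failed
    to reach, one keyed by (connection id, link id) of stuck deliveries.
    """
    failed = Counter()
    stuck = Counter()
    for line in lines:
        match = CONNECT_FAIL.search(line)
        if match:
            failed[match.group("addr")] += 1
            continue
        match = STUCK.search(line)
        if match:
            stuck[(match.group("conn"), match.group("link"))] += 1
    return failed, stuck
```

Passing an open log file (or any iterable of lines) to `summarize` gives the failed connection targets and the affected links, which can then be compared against the router pod's previous IP address.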

Other notes: Along with one of the routers, several of the client application pods were also migrated.

Restarting / killing the router seems to resolve the issue - when it comes back, message flow resumes.

Assignee: Michael Cressman (Inactive)
Reporter: Duane Hawkins
Votes: 0
Watchers: 3
Created: 2022/05/13 10:25 PM
Updated: 2022/07/07 2:45 PM
