- Bug
- Resolution: Done
- Undefined
- 4.10.z
- None
- CNF RAN Sprint 231, CNF RAN Sprint 232
- 2
- False
Rel Note for Telco: Not Required (4.12) - this is the 4.10.z fix
Description of problem:
When PTP events are configured and pods are restarted (for example, during an SNO reboot), there is a race condition between the cloud-events-proxy container in the linuxptp-daemon pod and the AMQ router pod. If the cloud-events-proxy container starts first, it fails to connect to the AMQ router and never sends any events. The only workaround is to restart the container or the linuxptp-daemon pod.
We see the following in the logs:
time="2022-11-02T11:49:44Z" level=info msg="Starting AMQP server"
time="2022-11-02T11:49:59Z" level=error msg="error starting amqp at amqp://router-mesh.amq-router.svc.cluster.local error: amqp connection error amqp connection error"
time="2022-11-02T11:49:59Z" level=warning msg="requires QPID router installed to function fully amqp connection error amqp connection error"
And then, any time an event should be sent:
time="2022-11-02T11:49:59Z" level=warning msg="amqp disabled,no action taken: ...
Version-Release number of selected component (if applicable):
Seen at least on OCP 4.9.37 and 4.11.5
How reproducible:
Very often
Steps to Reproduce:
1. Set up an SNO with the PTP and AMQ Interconnect operators
2. Configure PTP to enable the event publisher, using an AMQP router on the same node
3. Reboot the node
Actual results:
The PTP event publisher fails to connect to the AMQ router and never retries
Expected results:
It should retry the connection and connect to the AMQ router once it is available
Additional info:
- depends on: OCPBUGS-4754 Race condition between PTP events and AMQ router startup (Closed)