-
Bug
-
Resolution: Done
-
Major
-
rhwa-25.1, rhwa-24.3, rhwa-23.3
-
None
-
False
-
-
True
-
-
Bug Fix
-
Done
-
-
-
Critical
Hello Michael,
As suggested, I'm opening this ticket.
Our customer's node is not rebooted when they shut down the network on a node with NHC and SNR in place.
These are the steps:
- they shut down the interface on the node
- remediation seems to take place: the node is drained but NOT rebooted
- once connectivity is restored, the node is rebooted
While analysing the SNR logs during troubleshooting, you found that the node is somehow still communicating with its peers, so it is not marked as isolated and is not rebooted.
The customer also tried disabling networking on the node completely, and the behaviour is the same.
I'm attaching the most recent SNR agent log to this Jira.
In that log, the interface shutdown was performed at
Wed Apr 23 08:32:15 UTC 2025
The customer noticed these lines:
INFO api-check getting health status from peer {"IP": ""}
INFO api-check.peerhealth client new peer client {"serveraddr": ":30001"}
...
INFO api-check got response from peer {"IP": "", "status": 3}
The customer is asking whether this communication attempt could be the reason the node is not marked unhealthy.
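To make it easier to check how often peers actually answered after the interface shutdown, the peer responses can be pulled out of the agent log with a small script. This is only a sketch: it assumes the log entries end with a JSON payload exactly as in the lines quoted above, the helper name peer_responses is hypothetical, and it does not interpret what a given status value means.

```python
import json
import re

# Matches the "got response from peer" entries quoted above and
# captures the trailing JSON payload (assumed to end the line).
RESPONSE_RE = re.compile(r"got response from peer\s+(\{.*\})\s*$")


def peer_responses(lines):
    """Yield (ip, status) for every peer response entry in the given log lines."""
    for line in lines:
        match = RESPONSE_RE.search(line)
        if match is None:
            continue
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            # Skip entries whose payload is truncated or not valid JSON.
            continue
        yield payload.get("IP"), payload.get("status")


# Sample input copied from the lines the customer noticed.
sample = [
    'INFO api-check getting health status from peer {"IP": ""}',
    'INFO api-check.peerhealth client new peer client {"serveraddr": ":30001"}',
    'INFO api-check got response from peer {"IP": "", "status": 3}',
]
print(list(peer_responses(sample)))
```

Running the same function over the attached agent log, restricted to entries after the shutdown time, would show whether any peer responses were still being received while the interface was down.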
Let me know if you need anything else from the customer's end.
We have already collected and shared a considerable amount of data with you.