Bug
Resolution: Cannot Reproduce
Priority: Major
Affects Version: 4.13
Category: Quality / Stability / Reliability
Status: Rejected
Description of problem:
Per the discussion on Slack, the 4.12 to 4.13 upgrade job has been failing frequently. The failures appear to be caused by a node going NotReady at an unexpected time, and the logs show failed calls to api-int, which suggests a problem with the internal load balancer.
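As a quick way to confirm whether the internal load balancer path is healthy from an affected node, something like the following can be run. This is a sketch, not part of the original report; the cluster domain is a placeholder, and /readyz is the apiserver's standard readiness endpoint:

  # Hypothetical spot check from an affected node. On on-prem platforms,
  # api-int resolves to the internal VIP fronted by the internal load balancer.
  # A timeout or connection refused here points at the load balancer path
  # rather than the apiserver itself.
  curl -k --max-time 5 https://api-int.<cluster-name>.<base-domain>:6443/readyz

  # Watching node readiness alongside the upgrade shows when a node
  # flips NotReady relative to its reboot window.
  oc get nodes -w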
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Frequently in the 4.12 to 4.13 upgrade CI job
Steps to Reproduce:
1. Run the 4.12 to 4.13 upgrade job.
2. Watch for a node going NotReady before its upgrade reboot.
3. Check the node logs for failed calls to api-int.
Actual results:
A node goes NotReady at an unexpected time, before it reboots for the upgrade
Expected results:
A node goes NotReady only when it reboots for the upgrade
Additional info:
This is difficult to debug because the on-prem service logs are not persisted across reboots, and the spurious NotReady state happens before the node reboots, so the relevant logs are lost by the time we can inspect the node. The first step in fixing this will be to fix the logging so we have usable data from before the reboot.
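As a minimal sketch of one way to do that: if the on-prem services log to the journal, forcing journald onto persistent storage keeps their logs across reboots. The drop-in path below is an assumption for illustration, not the actual OPNET-194 implementation:

  # Hypothetical sketch: persist the journal across reboots on a node.
  # The presence of /var/log/journal moves journald off volatile /run storage;
  # Storage=persistent makes that explicit.
  mkdir -p /var/log/journal
  cat >/etc/systemd/journald.conf.d/10-persistent.conf <<'EOF'
  [Journal]
  Storage=persistent
  EOF
  systemctl restart systemd-journald

With that in place, journalctl -b -1 -u <service> would show a given service's logs from the previous boot after the node comes back up.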
- depends on: OPNET-194 Persist service logs over reboot (To Do)
- links to