Bug
Resolution: Cannot Reproduce
Undefined
None
4.12.z
Important
No
Rejected
False
Description of problem:
After several reboots in a row, the SNO cluster API stops responding: The connection to the server api.cloudransno-site1.slcm1.bos2.lab:6443 was refused - did you specify the right host or port?
Version-Release number of selected component (if applicable):
4.12.16
How reproducible:
We run a test that performs several reboots in a row, and we hit this issue at a high rate every time we run it. On 4.12.16 it happened 100% of the time, and on 4.12.21 it also happened on the first run of the test.
Steps to Reproduce:
1. Reboot the SNO cluster 5 times.
2. Check the API.
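The reproduction steps can be automated as a loop that reboots the node and then waits for the API to come back. Below is a minimal Python sketch. The `reboot` and `api_is_up` callables are hypothetical placeholders that you would need to supply. For example, `api_is_up` could wrap a call such as `oc get --raw /livez`. The 30-minute timeout and 10-second poll interval are arbitrary assumptions and are not taken from the original test.

```python
import time
from typing import Callable, List


def reboot_and_check(
    reboot: Callable[[], None],        # hypothetical: triggers one node reboot
    api_is_up: Callable[[], bool],     # hypothetical: True once the API answers
    cycles: int = 5,                   # the report reboots 5 times
    timeout_s: float = 1800.0,         # assumed per-reboot recovery window
    poll_s: float = 10.0,              # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[bool]:
    """Reboot `cycles` times, waiting up to `timeout_s` for the API after each.

    Returns one bool per completed cycle (True = API recovered in time).
    Stops after the first cycle in which the API does not come back.
    """
    results: List[bool] = []
    for _ in range(cycles):
        reboot()
        deadline = clock() + timeout_s
        recovered = False
        # Poll the API until it answers or the deadline passes.
        while clock() < deadline:
            if api_is_up():
                recovered = True
                break
            sleep(poll_s)
        results.append(recovered)
        if not recovered:
            break  # the failure reported here: the API never returns
    return results
```

The `sleep` and `clock` parameters are injectable, so the loop can be exercised with a fake clock instead of waiting in real time.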
Actual results:
The node no longer responds. I left it for several hours, but it did not recover.
Expected results:
The node recovers properly.
Additional info:
System impact: Very severe. The node can no longer be used.
ACM reports: The kube-apiserver is not ok, status code: 0, Get "https://172.31.0.1:443/livez": dial tcp 172.31.0.1:443: connect: connection refused
oc adm must-gather cannot be run; only an SOS report could be collected. Logs are attached.
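The ACM message reports "status code: 0" together with "connection refused". This suggests that 0 stands for "no HTTP response at all", but that reading is an assumption. The minimal Python sketch below probes a `/livez` URL the same way. It returns the HTTP status when the server answers, including error statuses, and returns 0 when the connection itself fails. The function name and the 5-second timeout are hypothetical. Checking an API server that uses a self-signed certificate would additionally need an `ssl.SSLContext` that trusts the cluster CA.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def livez_status(url: str, timeout_s: float = 5.0,
                 context: Optional[ssl.SSLContext] = None) -> int:
    """Return the HTTP status of GET `url`, or 0 if no HTTP response arrived.

    0 covers connection refused, timeouts and TLS failures (all OSError
    subclasses), assumed here to match ACM's "status code: 0".
    """
    ctx = context or ssl.create_default_context()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered with a non-2xx status
    except OSError:
        return 0  # no HTTP response, e.g. "connect: connection refused"
```

In the state described in this report, a call such as `livez_status("https://172.31.0.1:443/livez")` run from inside the cluster network would be expected to return 0.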