Bug
Resolution: Unresolved
Critical
None
4.14
Moderate
No
CLOUD Sprint 249, CLOUD Sprint 250, CLOUD Sprint 251, CLOUD Sprint 252, CLOUD Sprint 253, CLOUD Sprint 254, CLOUD Sprint 255, CLOUD Sprint 256, CLOUD Sprint 257, CLOUD Sprint 258, CLOUD Sprint 259, CLOUD Sprint 260, CLOUD Sprint 261, CLOUD Sprint 262, CLOUD Sprint 263, CLOUD Sprint 264
16
Rejected
False

Description of problem:
The Machine API does not finish reconciling a new machine after a timeout occurs.
Version-Release number of selected component (if applicable):
4.14
How reproducible:
The timing needed to force the issue is difficult to hit, but it may be possible to force it with unit-test fault injection (see the sketch below).
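A minimal sketch of such an injection, assuming the controller can be wired through controller-runtime's fake client: the interceptor fails the first Update with a timeout, standing in for an overloaded kube-apiserver/etcd. The test name and reconciler wiring are hypothetical, not the actual MAPI test code.

    package machine_test

    import (
        "context"
        "testing"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
        "sigs.k8s.io/controller-runtime/pkg/client"
        "sigs.k8s.io/controller-runtime/pkg/client/fake"
        "sigs.k8s.io/controller-runtime/pkg/client/interceptor"
    )

    func TestProvisionedTransitionRecoversFromTimeout(t *testing.T) {
        failOnce := true
        c := fake.NewClientBuilder().
            WithInterceptorFuncs(interceptor.Funcs{
                // Fail the first update with a timeout, as if the
                // apiserver/etcd were overloaded, then delegate to the
                // underlying fake client for subsequent calls.
                Update: func(ctx context.Context, cl client.WithWatch, obj client.Object, opts ...client.UpdateOption) error {
                    if failOnce {
                        failOnce = false
                        return apierrors.NewTimeoutError("etcd request timed out", 1)
                    }
                    return cl.Update(ctx, obj, opts...)
                },
            }).
            Build()

        // Wire c into the machine reconciler under test (omitted here) and
        // assert the machine still reaches the Provisioned phase after the
        // injected timeout.
        _ = c
    }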
Steps to Reproduce:
1. Scale a MachineSet higher than the number of currently available machines (an example command follows this list).
2. Have kube-apiserver/etcd time out during the period when MAPI attempts to update the machine object while transitioning it to Provisioned.
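For step 1, scaling can be done with oc; the MachineSet name and replica count are placeholders:

    oc scale machineset <machineset-name> --replicas=<n> -n openshift-machine-api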
Actual results:
Even after waiting 40+ minutes from the time the issue occurs, the machine never moves to the Provisioned phase, even though the machine's VM has been created.
Expected results:
The machine moves to the Provisioned phase after cloning completes.
Additional info:
In most cases I would agree the infrastructure should be improved to prevent this scenario from happening; however, CI infrastructure load will be high at times, and if we cannot recover from timeouts while attempting to progress to Provisioned, we will see many unneeded CI failures. This issue is not marked as high severity, but it would be great if the vSphere machine provisioning process could recover from this scenario and eventually mark the machine as Provisioned so the CI tests can complete (a sketch of one possible recovery follows).
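Illustrative only, not the actual controller code: one way to make the transition recoverable is to classify apiserver/etcd timeouts as transient and requeue rather than giving up. Reconciler, Machine, setPhaseProvisioned, and reconcileProvisioned below are hypothetical stand-ins for the real vSphere machine controller types and helpers.

    package machine

    import (
        "context"
        "time"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
        ctrl "sigs.k8s.io/controller-runtime"
    )

    // Stubs standing in for the real machine-api types.
    type Machine struct{}
    type Reconciler struct{}

    // setPhaseProvisioned stands in for the status update that may hit an
    // apiserver/etcd timeout while moving the machine to Provisioned.
    func (r *Reconciler) setPhaseProvisioned(ctx context.Context, m *Machine) error {
        return nil
    }

    func (r *Reconciler) reconcileProvisioned(ctx context.Context, m *Machine) (ctrl.Result, error) {
        if err := r.setPhaseProvisioned(ctx, m); err != nil {
            // Treat apiserver/etcd timeouts as transient: retry shortly
            // instead of leaving the machine stuck short of Provisioned.
            if apierrors.IsTimeout(err) || apierrors.IsServerTimeout(err) {
                return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
            }
            return ctrl.Result{}, err
        }
        return ctrl.Result{}, nil
    }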