-
Bug
-
Resolution: Not a Bug
-
Normal
-
None
-
4.10
-
None
-
Important
-
None
-
CLOUD Sprint 243, CLOUD Sprint 244, CLOUD Sprint 245
-
3
-
Rejected
-
False
This is a clone of the GCP bug OCPBUGS-2117; the problem also affects Azure. The description below contains GCP-specific information, but the same general problem exists in the termination handler for Azure as well.
Description of problem:
GCP preemptible VM termination is not being handled correctly by machine-api-termination-handler.
Version-Release number of selected component (if applicable):
Tested on both 4.10.22 and 4.11.2
How reproducible:
To reproduce the issue: create a spot instance machine in GCP, then stop the instance. The machine-api-termination-handler pod logs show no signal indicating that the instance was terminated, although the machines list does show the TERMINATED status. As a result, pods are not gracefully moved off the node in the 90-second window before the node is turned off. We would expect a terminated node to wait for pods to move off (up to 90 seconds) and then shut down, instead of shutting down immediately.
Steps to Reproduce:
1. Create a spot instance machine in GCP.
2. Stop the instance.
3. Notice that the machine-api-termination-handler pod logs contain no signal indicating that the instance was terminated.
4. Note that the machines list does show the TERMINATED status.
5. Result: pods are not gracefully moved off the node in the 90-second window before the node is turned off.
Actual results:
The machine-api-termination-handler logs don't show any message such as "Instance marked for termination, marking Node for deletion"; no signal appears to be received from GCP.
Expected results:
A terminated node should wait for pods to move off (up to 90 seconds) and then shut down, instead of shutting down immediately.
Additional info:
Here is the code:
https://github.com/openshift/machine-api-provider-gcp/blob/main/pkg/termination/termination.go#L96-L127
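While reproducing, it may help to check directly what the GCP metadata server reports for the instance. The following is a minimal Go sketch of such a check, not the handler's actual code. It assumes the documented GCP metadata key `instance/preempted`, which returns `TRUE` or `FALSE`; the helper names `parsePreempted` and `checkPreempted` are hypothetical and introduced only for illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// GCP metadata key that reports whether this instance has been preempted.
const preemptedURL = "http://metadata.google.internal/computeMetadata/v1/instance/preempted"

// parsePreempted (hypothetical helper) interprets the response body,
// which is expected to be "TRUE" or "FALSE".
func parsePreempted(body string) bool {
	return strings.EqualFold(strings.TrimSpace(body), "TRUE")
}

// checkPreempted (hypothetical helper) queries the given URL once and
// reports whether the metadata server says the instance was preempted.
func checkPreempted(ctx context.Context, client *http.Client, url string) (bool, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	// The metadata server rejects requests that lack this header.
	req.Header.Set("Metadata-Flavor", "Google")

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return parsePreempted(string(b)), nil
}

func main() {
	// Single check with a short timeout; a real handler would poll repeatedly.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	preempted, err := checkPreempted(ctx, http.DefaultClient, preemptedURL)
	if err != nil {
		fmt.Println("metadata check failed (expected when not running on GCP):", err)
		return
	}
	fmt.Println("preempted:", preempted)
}
```

Running this on the spot instance before and after step 2 would show whether the metadata value changes when the instance is stopped manually.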
#forum-cloud slack thread:
https://coreos.slack.com/archives/CBZHF4DHC/p1656524730323259
#forum-node slack thread:
https://coreos.slack.com/archives/CK1AE4ZCK/p1656619821630479
clones: OCPBUGS-2117 [gcp] pre-emptible VM: machine-api-termination-handler not marking instance for deletion (Closed)