-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
4.17.0, 4.18
-
None
-
No
-
3
-
WINC - Sprint 260, WINC - Sprint 261
-
2
-
False
-
Description of problem:
If the version annotation 'windowsmachineconfig.openshift.io/version' is removed from a node object, the node binaries (WICD, kubelet, etc.) may be stopped and not restarted.
Version-Release number of selected component (if applicable):
How reproducible:
Low %
Steps to Reproduce:
1. Remove the version annotation from a Windows node
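The report does not say how the annotation was removed. As a minimal sketch of one option, the snippet below builds a JSON merge patch (RFC 7396) in which the annotation key is set to null, which asks the API server to delete that key. The helper name `removeAnnotationPatch` is hypothetical, and sending the patch to the node object (for example with the `application/merge-patch+json` content type) is an assumption that is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// versionAnnotation is the annotation key named in this report.
const versionAnnotation = "windowsmachineconfig.openshift.io/version"

// removeAnnotationPatch is a hypothetical helper. It builds a JSON merge
// patch body in which the given annotation key maps to null; under JSON
// merge patch semantics a null value removes the key from the target.
func removeAnnotationPatch(key string) ([]byte, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"annotations": map[string]any{key: nil},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := removeAnnotationPatch(versionAnnotation)
	if err != nil {
		panic(err)
	}
	// Print the patch body that would be sent against the node object.
	fmt.Println(string(body))
}
```

Only the patch body is constructed here, so the snippet runs without cluster access.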
Actual results:
The version annotation is re-added to the node, but the node is no longer ready or schedulable. WMCO logs contain a cleanup failure, and further reconciliations do not fix the node state:
2024-06-10T23:32:13Z INFO wc 10.0.19.87 failed to cleanup node {"command": "C:\\k\\windows-instance-config-daemon.exe cleanup --kubeconfig C:\\k\\wicd-kubeconfig --namespace openshift-windows-machine-config-operator", "output": "I0610 23:28:38.536824 7952 cleanup.go:132] error getting services ConfigMap associated with version annotation, falling back to use latest services ConfigMap: node is missing version annotation\nI0610 23:32:13.630884 7952 cleanup.go:197] removed services: [\"csi-proxy\" \"hybrid-overlay-node\" \"windows_exporter\" \"kubelet\" \"containerd\"]\nF0610 23:32:13.630884 7952 cleanup.go:51] []error{(*fmt.wrapError)(0xc000020020)}\n"}
2024-06-10T23:32:13Z ERROR Reconciler error {"controller": "machine", "controllerGroup": "machine.openshift.io", "controllerKind": "Machine", "Machine": {"name":"ci-op-igcd4qjs-9c393-ndx27-e2e-wm-b4vnt","namespace":"openshift-machine-api"}, "namespace": "openshift-machine-api", "name": "ci-op-igcd4qjs-9c393-ndx27-e2e-wm-b4vnt", "reconcileID": "7ecf387a-befa-48c0-90cf-739f28a2be7d", "error": "unable to configure instance i-057fb07bfc7ed073e: bootstrapping the Windows instance failed: unable to cleanup the Windows instance: error running powershell.exe -NonInteractive -ExecutionPolicy Bypass \"C:\\k\\windows-instance-config-daemon.exe cleanup --kubeconfig C:\\k\\wicd-kubeconfig --namespace openshift-windows-machine-config-operator\": Process exited with status 1"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222
2024-06-10T23:32:13Z DEBUG controller.windowsmachine reconciling {"windowsmachine": {"name":"ci-op-igcd4qjs-9c393-ndx27-e2e-wm-b4vnt","namespace":"openshift-machine-api"}}
2024-06-10T23:32:13Z DEBUG events Machine ci-op-igcd4qjs-9c393-ndx27-e2e-wm-b4vnt configuration failure {"type": "Warning", "object": {"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-igcd4qjs-9c393-ndx27-e2e-wm-b4vnt","uid":"91c21fe8-154d-40ee-bdba-a34b2235a316","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49707"}, "reason": "MachineSetupFailure"}
2024-06-10T23:32:28Z DEBUG controller.windowsmachine reconciling {"windowsmachine": {"name":"ci-op-igcd4qjs-9c393-ndx27-e2e-wm-b4vnt","namespace":"openshift-machine-api"}}
2024-06-10T23:32:58Z DEBUG controller.windowsmachine reconciling {"windowsmachine": {"name":"ci-op-igcd4qjs-9c393-ndx27-e2e-wm-b4vnt","namespace":"openshift-machine-api"}}
Expected results:
The version annotation is re-added to the node, and the node maintains functionality.
Additional info:
Potential cause:
- User removes version annotation
- WMCO decides node is not up to date, and tries to configure it
- WICD decides node is up to date, and re-applies the version annotation
- WMCO stops WICD
- WMCO runs WICD cleanup
- WICD cleanup fails, resulting in all node binaries being stopped
- WMCO restarts reconciliation after backoff time
- WMCO sees that the version annotation is correct, and decides the node does not need to be configured
- Node binaries remain stopped
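The suspected sequence can be replayed with a small, self-contained model. This is only a sketch under stated assumptions: the `node` type, both `needsConfig...` functions, and `failingCleanup` are hypothetical stand-ins and are not taken from the WMCO or WICD source. It compares a decision that consults only the version annotation with a hypothetical variant that also looks at whether the services are running, to show why the first can leave the node stuck.

```go
package main

import (
	"errors"
	"fmt"
)

// versionAnnotation is the annotation key named in this report.
const versionAnnotation = "windowsmachineconfig.openshift.io/version"

// node is a hypothetical, heavily simplified model of a Windows node.
type node struct {
	annotations     map[string]string
	servicesRunning bool
}

// needsConfigAnnotationOnly models the suspected behaviour: the node is
// considered up to date whenever the version annotation matches.
func needsConfigAnnotationOnly(n *node, want string) bool {
	return n.annotations[versionAnnotation] != want
}

// needsConfigWithHealth is a hypothetical variant that also treats stopped
// services as a reason to configure the node again.
func needsConfigWithHealth(n *node, want string) bool {
	return n.annotations[versionAnnotation] != want || !n.servicesRunning
}

// failingCleanup models the cleanup seen in the logs: the services are
// removed first, and the command then exits with an error.
func failingCleanup(n *node) error {
	n.servicesRunning = false
	return errors.New("cleanup exited with status 1")
}

// simulate replays the suspected sequence and reports whether the node ends
// up with its services stopped.
func simulate(needsConfig func(*node, string) bool) bool {
	const want = "example-version" // hypothetical annotation value
	n := &node{
		annotations:     map[string]string{versionAnnotation: want},
		servicesRunning: true,
	}

	delete(n.annotations, versionAnnotation) // user removes the annotation
	if !needsConfig(n, want) {               // operator checks the node
		return !n.servicesRunning
	}
	n.annotations[versionAnnotation] = want // daemon re-applies the annotation
	if err := failingCleanup(n); err == nil {
		return !n.servicesRunning
	}
	// After the backoff, the operator evaluates the node again.
	if needsConfig(n, want) {
		n.servicesRunning = true // assume reconfiguration restarts services
	}
	return !n.servicesRunning
}

func main() {
	fmt.Println("annotation-only check, node stuck:", simulate(needsConfigAnnotationOnly))
	fmt.Println("health-aware check, node stuck:", simulate(needsConfigWithHealth))
}
```

In this model the annotation-only check reports a stuck node and the health-aware check does not. That is consistent with the potential cause described above, but it is an illustration and not a confirmed root cause or a proposed fix.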
- blocks
-
OCPBUGS-43573 Version annotation removal results in unusable node
- Closed
- causes
-
WINC-1343 Track and update version of AWS EC2Launch v2 agent
- To Do
- is cloned by
-
OCPBUGS-43573 Version annotation removal results in unusable node
- Closed
- is duplicated by
-
OCPBUGS-37797 error removing %s HNS network on BYOH deconfiguring error getting services ConfigMap associated with version annotation
- Closed
-
OCPBUGS-37798 error removing %s HNS network on BYOH deconfiguring error getting services ConfigMap associated with version annotation
- Closed
- links to
-
RHBA-2024:137899 Red Hat OpenShift for Windows Containers 10.18.0 product release