Bug
Resolution: Duplicate
Critical
4.13, 4.14
None
Critical
No
0
Rejected
False
This is a clone of issue OCPBUGS-13780. The following is the description of the original issue:
—
Description of problem:
On a 4.14 OCP cluster with BYOH and Machine Windows Containers nodes, modify the version annotation windowsmachineconfig.openshift.io/version on a Windows node and set it to a version that does not exist, for example:

oc annotate node ip-10-0-148-206.us-east-2.compute.internal --overwrite windowsmachineconfig.openshift.io/version=invalidVersion

The affected BYOH/Machine node hangs in Ready,SchedulingDisabled and never leaves that state, because the node cleanup that WMCO runs (windows-instance-config-daemon.exe cleanup) tries to read the ConfigMap windows-services-invalidVersion, which does not exist:

[jfrancoa@localhost openshift-tests-private]$ oc get cm windows-instances -n openshift-windows-machine-config-operator -o yaml
apiVersion: v1
data:
  10.0.148.206: username=Administrator
kind: ConfigMap
metadata:
  creationTimestamp: "2023-05-18T07:31:13Z"
  name: windows-instances
  namespace: openshift-windows-machine-config-operator
  resourceVersion: "60960"
  uid: ab745d75-6f53-4919-b930-4edc88e5016d

[jfrancoa@localhost openshift-tests-private]$ oc get nodes -o wide
NAME                                         STATUS                     ROLES                  AGE    VERSION            INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-136-0.us-east-2.compute.internal     Ready                      control-plane,master   135m   v1.27.1+20a4409   10.0.136.0     <none>        Red Hat Enterprise Linux CoreOS 414.92.202305162029-0 (Plow)   5.14.0-284.13.1.el9_2.x86_64   cri-o://1.27.0-6.rhaos4.14.git81ac4ce.el9
ip-10-0-137-97.us-east-2.compute.internal    Ready                      worker                 88m    v1.26.2+0f23833   10.0.137.97    <none>        Windows Server 2019 Datacenter                                 10.0.17763.4252                containerd://1.7.0
ip-10-0-139-10.us-east-2.compute.internal    Ready                      worker                 126m   v1.27.1+20a4409   10.0.139.10    <none>        Red Hat Enterprise Linux CoreOS 414.92.202305162029-0 (Plow)   5.14.0-284.13.1.el9_2.x86_64   cri-o://1.27.0-6.rhaos4.14.git81ac4ce.el9
ip-10-0-148-206.us-east-2.compute.internal   Ready,SchedulingDisabled   worker                 23m    v1.26.2+0f23833   10.0.148.206   <none>        Windows Server 2019 Datacenter                                 10.0.17763.4252                containerd://1.7.0
ip-10-0-149-147.us-east-2.compute.internal   Ready                      worker                 83m    v1.26.2+0f23833   10.0.149.147   <none>        Windows Server 2019 Datacenter                                 10.0.17763.4252                containerd://1.7.0
ip-10-0-185-94.us-east-2.compute.internal    Ready                      worker                 127m   v1.27.1+20a4409   10.0.185.94    <none>        Red Hat Enterprise Linux CoreOS 414.92.202305162029-0 (Plow)   5.14.0-284.13.1.el9_2.x86_64   cri-o://1.27.0-6.rhaos4.14.git81ac4ce.el9
ip-10-0-191-102.us-east-2.compute.internal   Ready                      control-plane,master   134m   v1.27.1+20a4409   10.0.191.102   <none>        Red Hat Enterprise Linux CoreOS 414.92.202305162029-0 (Plow)   5.14.0-284.13.1.el9_2.x86_64   cri-o://1.27.0-6.rhaos4.14.git81ac4ce.el9
ip-10-0-212-18.us-east-2.compute.internal    Ready                      worker                 124m   v1.27.1+20a4409   10.0.212.18    <none>        Red Hat Enterprise Linux CoreOS 414.92.202305162029-0 (Plow)   5.14.0-284.13.1.el9_2.x86_64   cri-o://1.27.0-6.rhaos4.14.git81ac4ce.el9
ip-10-0-213-56.us-east-2.compute.internal    Ready                      control-plane,master   134m   v1.27.1+20a4409   10.0.213.56    <none>        Red Hat Enterprise Linux CoreOS 414.92.202305162029-0 (Plow)   5.14.0-284.13.1.el9_2.x86_64   cri-o://1.27.0-6.rhaos4.14.git81ac4ce.el9

WMCO logs:
{"level":"info","ts":"2023-05-18T07:44:13Z","logger":"nc 10.0.148.206","msg":"instance has been configured as a worker node","version":"9.0.0-0ecb2e1"}
{"level":"info","ts":"2023-05-18T07:44:13Z","logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}
{"level":"info","ts":"2023-05-18T07:45:13Z","logger":"controllers.configmap","msg":"processing","instances in":"windows-instances"}
{"level":"info","ts":"2023-05-18T07:45:13Z","logger":"controllers.configmap","msg":"instance is up to date","node":"ip-10-0-148-206.us-east-2.compute.internal","version":"9.0.0-0ecb2e1"}
{"level":"info","ts":"2023-05-18T07:45:13Z","logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}
{"level":"info","ts":"2023-05-18T07:46:05Z","logger":"controllers.configmap","msg":"processing","instances in":"windows-instances"}
{"level":"info","ts":"2023-05-18T07:46:06Z","logger":"controllers.configmap","msg":"instance requires upgrade","node":"ip-10-0-148-206.us-east-2.compute.internal","version":"invalidVersion","expected version":"9.0.0-0ecb2e1"}
{"level":"info","ts":"2023-05-18T07:46:16Z","logger":"nc 10.0.148.206","msg":"evicting pod winc-42484/win-webserver-768b7bc78d-smjhc\n"}
{"level":"info","ts":"2023-05-18T07:46:16Z","logger":"nc 10.0.148.206","msg":"evicting pod winc-42484/win-webserver-768b7bc78d-2vwxs\n"}
{"level":"info","ts":"2023-05-18T07:46:16Z","logger":"nc 10.0.148.206","msg":"evicting pod winc-42484/win-webserver-768b7bc78d-857wc\n"}
{"level":"info","ts":"2023-05-18T07:46:16Z","logger":"nc 10.0.148.206","msg":"evicting pod winc-42484/win-webserver-768b7bc78d-gq7g5\n"}
{"level":"info","ts":"2023-05-18T07:46:16Z","logger":"nc 10.0.148.206","msg":"evicting pod winc-42484/win-webserver-768b7bc78d-pzkdd\n"}
{"level":"info","ts":"2023-05-18T07:46:16Z","logger":"wc 10.0.148.206","msg":"deconfiguring"}
{"level":"info","ts":"2023-05-18T07:46:47Z","logger":"wc 10.0.148.206","msg":"deconfigured","service":"windows-instance-config-daemon"}
{"level":"error","ts":"2023-05-18T07:46:52Z","logger":"wc 10.0.148.206","msg":"error running","cmd":"powershell.exe -NonInteractive -ExecutionPolicy Bypass \"C:\\k\\windows-instance-config-daemon.exe cleanup --kubeconfig C:\\k\\wicd-kubeconfig --namespace openshift-windows-machine-config-operator\"","out":"F0518 07:46:52.825267 2756 cleanup.go:51] configmaps \"windows-services-invalidVersion\" not found\n","error":"Process exited with status 1","stacktrace":"github.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).Run\n\t/remote-source/build/windows-machine-config-operator/pkg/windows/windows.go:381\ngithub.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).RunWICDCleanup\n\t/remote-source/build/windows-machine-config-operator/pkg/windows/windows.go:408\ngithub.com/openshift/windows-machine-config-operator/pkg/windows.(*windows).Deconfigure\n\t/remote-source/build/windows-machine-config-operator/pkg/windows/windows.go:418\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).Deconfigure\n\t/remote-source/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:485\ngithub.com/openshift/windows-machine-config-operator/controllers.(*instanceReconciler).ensureInstanceIsUpToDate\n\t/remote-source/build/windows-machine-config-operator/controllers/controllers.go:79\ngithub.com/openshift/windows-machine-config-operator/controllers.(*ConfigMapReconciler).ensureInstancesAreUpToDate\n\t/remote-source/build/windows-machine-config-operator/controllers/configmap_controller.go:314\ngithub.com/openshift/windows-machine-config-operator/controllers.(*ConfigMapReconciler).reconcileNodes\n\t/remote-source/build/windows-machine-config-operator/controllers/configmap_controller.go:279\ngithub.com/openshift/windows-machine-config-operator/controllers.(*ConfigMapReconciler).Reconcile\n\t/remote-source/build/windows-machine-config-operator/controllers/configmap_controller.go:189\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"info","ts":"2023-05-18T07:46:52Z","logger":"wc 10.0.148.206","msg":"failed to cleanup node","command":"C:\\k\\windows-instance-config-daemon.exe cleanup --kubeconfig C:\\k\\wicd-kubeconfig --namespace openshift-windows-machine-config-operator","output":"F0518 07:46:52.825267 2756 cleanup.go:51] configmaps \"windows-services-invalidVersion\" not found\n"}
{"level":"error","ts":"2023-05-18T07:46:52Z","msg":"Reconciler error","controller":"configmap","controllerGroup":"","controllerKind":"ConfigMap","ConfigMap":{"name":"windows-instances","namespace":"openshift-windows-machine-config-operator"},"namespace":"openshift-windows-machine-config-operator","name":"windows-instances","reconcileID":"6cf1733d-f108-43b9-832e-ee4400e00eef","error":"error configuring host with address 10.0.148.206: error deconfiguring instance: unable to cleanup the Windows instance: error running powershell.exe -NonInteractive -ExecutionPolicy Bypass \"C:\\k\\windows-instance-config-daemon.exe cleanup --kubeconfig C:\\k\\wicd-kubeconfig --namespace openshift-windows-machine-config-operator\": Process exited with status 1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"info","ts":"2023-05-18T07:46:52Z","logger":"controllers.configmap","msg":"processing","instances in":"windows-instances"}
{"level":"info","ts":"2023-05-18T07:46:53Z","logger":"controllers.configmap","msg":"instance requires upgrade","node":"ip-10-0-148-206.us-east-2.compute.internal","version":"invalidVersion","expected version":"9.0.0-0ecb2e1"}
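To make the failure mode concrete, here is a minimal Go sketch. It assumes, based only on the ConfigMap names seen in this report, that the services ConfigMap for a version V is named windows-services-V and that the cleanup step looks it up using the version taken from the node annotation. The helpers servicesConfigMapName and lookupServicesConfigMap are hypothetical and are not WMCO or WICD functions; the ConfigMap set is the one returned by oc get cm in this report.

```go
package main

import "fmt"

// servicesConfigMapName is a hypothetical helper mirroring the naming seen in
// this report: the services ConfigMap for version V is "windows-services-V".
func servicesConfigMapName(version string) string {
	return "windows-services-" + version
}

// lookupServicesConfigMap simulates (hypothetically) the cleanup step reading
// the services ConfigMap for the version found in the node annotation.
// existing holds the ConfigMap names present in the operator namespace.
func lookupServicesConfigMap(existing map[string]bool, annotatedVersion string) (string, error) {
	name := servicesConfigMapName(annotatedVersion)
	if !existing[name] {
		// Same wording as the cleanup.go:51 error in the WMCO logs.
		return "", fmt.Errorf("configmaps %q not found", name)
	}
	return name, nil
}

func main() {
	// ConfigMaps listed in openshift-windows-machine-config-operator in this report.
	existing := map[string]bool{
		"kube-root-ca.crt":                     true,
		"openshift-service-ca.crt":             true,
		"windows-instances":                    true,
		"windows-machine-config-operator-lock": true,
		"windows-services-9.0.0-0ecb2e1":       true,
	}
	for _, v := range []string{"9.0.0-0ecb2e1", "invalidVersion"} {
		name, err := lookupServicesConfigMap(existing, v)
		if err != nil {
			fmt.Println(v, "->", err)
			continue
		}
		fmt.Println(v, "->", name)
	}
	// Prints:
	// 9.0.0-0ecb2e1 -> windows-services-9.0.0-0ecb2e1
	// invalidVersion -> configmaps "windows-services-invalidVersion" not found
}
```

Because the annotation is never corrected, every retry repeats the same lookup: the logs show the reconcile starting over at 07:46:52, immediately after the error, and again reporting "instance requires upgrade" for invalidVersion.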
Version-Release number of selected component (if applicable):
[jfrancoa@localhost openshift-tests-private]$ oc get cm -n openshift-windows-machine-config-operator
NAME                                   DATA   AGE
kube-root-ca.crt                       1      102m
openshift-service-ca.crt               1      102m
windows-instances                      1      35m
windows-machine-config-operator-lock   0      101m
windows-services-9.0.0-0ecb2e1         2      101m
[jfrancoa@localhost openshift-tests-private]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-05-18-040905   True        False         113m    Cluster version is 4.14.0-0.nightly-2023-05-18-040905
How reproducible:
Always
Steps to Reproduce:
1. Deploy an OCP 4.14 cluster. Install WMCO on it.
2. Create a BYOH or Machine Windows node.
3. Modify the version annotation and set it to invalidVersion: oc annotate node ip-10-0-148-206.us-east-2.compute.internal --overwrite windowsmachineconfig.openshift.io/version=invalidVersion
4. Wait for the node to reconcile.
Actual results:
The node does not reconcile and hangs in Ready,SchedulingDisabled forever.
Expected results:
The node reconciles and the invalid version annotation is restored to the expected version.
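One way the expected behavior could be reached is sketched below in Go. This is an assumption for illustration only, not the actual WMCO fix: when the annotated version has no windows-services-<version> ConfigMap, the hypothetical planUpgrade function falls back to the operator's expected version for cleanup, and the annotation ends up at the expected version afterwards. The upgradePlan type and planUpgrade function are invented for this sketch. Whether cleaning up with the expected version's service list is safe depends on the ConfigMap contents, which this report does not show.

```go
package main

import "fmt"

// upgradePlan is a hypothetical summary of what one reconcile would do for a node.
type upgradePlan struct {
	NeedsUpgrade   bool   // annotation differs from the operator's version ("instance requires upgrade")
	CleanupVersion string // version whose windows-services-<version> ConfigMap drives cleanup
	FinalVersion   string // value the version annotation should hold after reconfiguration
}

// planUpgrade is an illustrative sketch, NOT the WMCO implementation. If the
// annotated version has no services ConfigMap, it falls back to the expected
// version so cleanup can proceed instead of failing on every retry.
func planUpgrade(existing map[string]bool, annotated, expected string) upgradePlan {
	plan := upgradePlan{NeedsUpgrade: annotated != expected, FinalVersion: expected}
	if existing["windows-services-"+annotated] {
		plan.CleanupVersion = annotated
	} else {
		plan.CleanupVersion = expected
	}
	return plan
}

func main() {
	// Only the services ConfigMap from this report matters for the decision.
	existing := map[string]bool{"windows-services-9.0.0-0ecb2e1": true}
	plan := planUpgrade(existing, "invalidVersion", "9.0.0-0ecb2e1")
	fmt.Printf("%+v\n", plan)
	// Prints: {NeedsUpgrade:true CleanupVersion:9.0.0-0ecb2e1 FinalVersion:9.0.0-0ecb2e1}
}
```

With this decision the node would still be drained and reconfigured (as the logs show WMCO already attempting), but cleanup no longer depends on a ConfigMap that was never created.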
Additional info:
This is a regression. This functionality was working in all previous versions.
- clones: OCPBUGS-13780 Modifying Windows node's annotation windowsmachineconfig.openshift.io/version ends up in Ready,SchedulingDisabled (Closed)
- is blocked by: OCPBUGS-13780 Modifying Windows node's annotation windowsmachineconfig.openshift.io/version ends up in Ready,SchedulingDisabled (Closed)