Bug
Resolution: Done-Errata
Major
4.16.0
Description of problem:
Windows nodes on vSphere disappear after the Windows MachineSet is scaled down and back up. The new Machines stay in the Provisioned state, never become ready, and the corresponding nodes never reappear.
Version-Release number of selected component (if applicable):
WMCO: 10.15.0-ae56369
OpenShift: 4.15.0-0.nightly-2024-02-07-062935
How reproducible:
most likely
Steps to Reproduce:
1. Install the latest WMCO 10.15 build.
2. Create a Windows MachineSet with 2 machines on vSphere (vCenter).
3. Wait for the nodes to become Ready.
4. Scale the Windows MachineSet down to 0.
5. Scale it back up to 2 machines.
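Steps 4 and 5 can be scripted. The sketch below only builds (and optionally runs) the standard `oc scale machineset` command. The MachineSet name `winworker` is an assumption inferred from the Machine name `winworker-nwhr5` in the log below; substitute your own.

```python
from __future__ import annotations

import shlex
import subprocess

# Assumption: MachineSet name inferred from the Machine "winworker-nwhr5".
MACHINESET = "winworker"
NAMESPACE = "openshift-machine-api"


def scale_cmd(machineset: str, replicas: int, namespace: str = NAMESPACE) -> list[str]:
    """Build the `oc scale` argv for a MachineSet without running it."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return ["oc", "scale", "machineset", machineset, "-n", namespace, f"--replicas={replicas}"]


def reproduce(machineset: str = MACHINESET, dry_run: bool = True) -> list[list[str]]:
    """Steps 4 and 5: scale down to 0, then back up to 2."""
    steps = [scale_cmd(machineset, 0), scale_cmd(machineset, 2)]
    for argv in steps:
        print(shlex.join(argv))  # show the exact command
        if not dry_run:
            subprocess.run(argv, check=True)  # requires a logged-in `oc`
    return steps


if __name__ == "__main__":
    reproduce()  # dry run: prints the two commands only
```

In a real run you would wait for the two Machines to be deleted before scaling back up; the sketch leaves that wait out.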
Actual results:
The nodes do not come back after scaling up: the Machines are stuck in the Provisioned phase and workloads remain Pending.
Expected results:
The Windows nodes should not disappear; they should rejoin the cluster and become Ready after the MachineSet is scaled back up.
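One way to check the expected result after step 5 is to count Windows nodes whose `Ready` condition is `True`. This is a minimal sketch that parses the JSON from `oc get nodes -l kubernetes.io/os=windows -o json`; the helper names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess


def ready_windows_nodes(node_list: dict) -> list[str]:
    """Return the names of nodes whose Ready condition has status "True"."""
    ready = []
    for node in node_list.get("items", []):
        conditions = (node.get("status") or {}).get("conditions") or []
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            ready.append(node["metadata"]["name"])
    return ready


def fetch_windows_nodes() -> dict:
    """Fetch Windows nodes via `oc` (needs cluster access)."""
    out = subprocess.run(
        ["oc", "get", "nodes", "-l", "kubernetes.io/os=windows", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

After scaling back to 2, `len(ready_windows_nodes(fetch_windows_nodes())) == 2` would indicate the expected result; in this bug it stays below 2.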
Additional info:
WMCO log:
{"level":"info","ts":"2024-02-08T13:07:28Z","logger":"controller.windowsmachine","msg":"processing","windowsmachine":{"name":"winworker-nwhr5","namespace":"openshift-machine-api"},"address":"192.168.221.139"}
{"level":"error","ts":"2024-02-08T13:07:33Z","msg":"Reconciler error","controller":"machine","controllerGroup":"machine.openshift.io","controllerKind":"Machine","Machine":{"name":"winworker-nwhr5","namespace":"openshift-machine-api"},"namespace":"openshift-machine-api","name":"winworker-nwhr5","reconcileID":"0a012274-2921-436a-9ef9-75761208bdc2","error":"unable to configure instance 423d2593-9f52-2506-2c38-e0decef2b967: expected 1 secret for SA 'windows-instance-config-daemon', found 2","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

Pending workload pod (oc describe output):
oc describe pod/win-webserver-7c66c4b657-8v487
Name:             win-webserver-7c66c4b657-8v487
Namespace:        winc-test
Priority:         0
Node:             <none>
Labels:           app=win-webserver
                  pod-template-hash=7c66c4b657
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/win-webserver-7c66c4b657
Containers:
  win-webserver:
    Image:      mcr.microsoft.com/powershell:lts-nanoserver-ltsc2022
    Port:       <none>
    Host Port:  <none>
    Command:
      pwsh.exe
      -command
      $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Windows Container Web Server</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t6sd6 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-t6sd6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=windows
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                             os=Windows
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  176m                 default-scheduler  0/7 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) were unschedulable, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  171m                 default-scheduler  0/5 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  15m (x31 over 165m)  default-scheduler  0/5 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
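The error in the WMCO log reads like a strict exactly-one lookup. The following is a hypothetical reconstruction, not WMCO's actual code: it assumes the operator selects secrets by the standard `kubernetes.io/service-account.name` annotation, and shows how a second matching secret produces exactly this reconcile error, which is consistent with the Machine never leaving Provisioned. `get_sa_secret` and `SecretCountError` are illustrative names.

```python
from __future__ import annotations

# Assumption: secrets are matched to the SA by this standard annotation.
SA_NAME_ANNOTATION = "kubernetes.io/service-account.name"


class SecretCountError(Exception):
    """Raised when the number of matching secrets is not exactly one."""


def get_sa_secret(secrets: list[dict], sa_name: str) -> dict:
    """Return the single secret annotated for `sa_name`, or raise."""
    matches = [
        s for s in secrets
        if ((s.get("metadata") or {}).get("annotations") or {}).get(SA_NAME_ANNOTATION) == sa_name
    ]
    if len(matches) != 1:
        # Mirrors the wording seen in the WMCO log above.
        raise SecretCountError(f"expected 1 secret for SA '{sa_name}', found {len(matches)}")
    return matches[0]
```

With a strict check like this, any extra secret carrying the same annotation fails every reconcile of the Machine until one of the secrets is removed.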
oc get serviceaccount -n openshift-windows-machine-config-operator windows-instance-config-daemon -oyaml
apiVersion: v1
imagePullSecrets:
- name: windows-instance-config-daemon-dockercfg-rtwcm
kind: ServiceAccount
metadata:
  creationTimestamp: "2024-02-08T09:26:25Z"
  labels:
    olm.managed: "true"
  name: windows-instance-config-daemon
  namespace: openshift-windows-machine-config-operator
  resourceVersion: "56050"
  uid: 5d561b3c-5623-49d9-b4a1-c4df6aac6568
secrets:
- name: windows-instance-config-daemon-dockercfg-rtwcm
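The ServiceAccount above lists a single entry under `secrets`, yet WMCO reported finding 2. Assuming the operator counts secrets by annotation rather than by the `secrets` field (an assumption this report does not confirm), a quick diagnostic is to compare the two sets, feeding in the ServiceAccount YAML above (parsed to a dict) and the `items` from `oc get secrets -n openshift-windows-machine-config-operator -o json`. `unlisted_sa_secrets` is a hypothetical helper.

```python
from __future__ import annotations

# Assumption: the annotation used to tie a secret to its ServiceAccount.
SA_NAME_ANNOTATION = "kubernetes.io/service-account.name"


def unlisted_sa_secrets(sa: dict, secrets: list[dict]) -> list[str]:
    """Names of secrets annotated for `sa` that are missing from sa["secrets"]."""
    sa_name = sa["metadata"]["name"]
    listed = {ref["name"] for ref in sa.get("secrets") or []}
    annotated = [
        s["metadata"]["name"]
        for s in secrets
        if ((s.get("metadata") or {}).get("annotations") or {}).get(SA_NAME_ANNOTATION) == sa_name
    ]
    # Sorted for stable, comparable output.
    return sorted(name for name in annotated if name not in listed)
```

A non-empty result would point at the extra secret that a strict exactly-one check trips over.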
Issue links:
- blocks, is cloned by: OCPBUGS-37481 vSphere machines are getting into provisioned status "expected 1 secret for SA 'windows-instance-config-daemon', found 2" (Closed)
- links to: RHBA-2024:132594 Red Hat OpenShift for Windows Containers 10.17.0 product release