Bug
Resolution: Done-Errata
Major
4.16.0
Description of problem:
After a Windows MachineSet on vSphere is scaled down and then back up, the Windows nodes do not come back. The new machines stay in the Provisioned state, never become ready, and no corresponding nodes appear.
Version-Release number of selected component (if applicable):
WMCO 10.15.0-ae56369, OCP 4.15.0-0.nightly-2024-02-07-062935
How reproducible:
most likely
Steps to Reproduce:
1. Install the latest WMCO 10.15.
2. Create a Windows MachineSet with 2 machines on vSphere (vCenter).
3. Wait for the nodes to be ready.
4. Scale the Windows MachineSet down to 0.
5. Scale it back up to 2 machines.
Actual results:
The nodes do not come back after scaling up. The machines are stuck in the Provisioned state, and workloads remain Pending.
Expected results:
After the scale-up, the Windows machines should become ready and their nodes should reappear in the cluster.
Additional info:
WMCO log:
{"level":"info","ts":"2024-02-08T13:07:28Z","logger":"controller.windowsmachine","msg":"processing","windowsmachine":{"name":"winworker-nwhr5","namespace":"openshift-machine-api"},"address":"192.168.221.139"}
{"level":"error","ts":"2024-02-08T13:07:33Z","msg":"Reconciler error","controller":"machine","controllerGroup":"machine.openshift.io","controllerKind":"Machine","Machine":{"name":"winworker-nwhr5","namespace":"openshift-machine-api"},"namespace":"openshift-machine-api","name":"winworker-nwhr5","reconcileID":"0a012274-2921-436a-9ef9-75761208bdc2","error":"unable to configure instance 423d2593-9f52-2506-2c38-e0decef2b967: expected 1 secret for SA 'windows-instance-config-daemon', found 2","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
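The error above reads like the result of a strict "exactly one" lookup of the token secret for the service account. The following is a minimal Python sketch of such a check, not WMCO's actual code: the `Secret` class and `find_sa_token_secret` function are hypothetical names, and the assumption that secrets are matched by the standard `kubernetes.io/service-account.name` annotation and the `kubernetes.io/service-account-token` type is mine, not something the log confirms.

```python
from dataclasses import dataclass, field

# Standard Kubernetes identifiers for service account token secrets.
SA_NAME_ANNOTATION = "kubernetes.io/service-account.name"
SA_TOKEN_TYPE = "kubernetes.io/service-account-token"


@dataclass
class Secret:
    """Hypothetical stand-in for a Kubernetes Secret (metadata only)."""
    name: str
    type: str
    annotations: dict = field(default_factory=dict)


def find_sa_token_secret(secrets, sa_name):
    """Return the single token secret for sa_name, or raise if the count is not 1."""
    matches = [
        s for s in secrets
        if s.type == SA_TOKEN_TYPE
        and s.annotations.get(SA_NAME_ANNOTATION) == sa_name
    ]
    if len(matches) != 1:
        # Same wording as the error in the WMCO log above.
        raise RuntimeError(
            f"expected 1 secret for SA '{sa_name}', found {len(matches)}"
        )
    return matches[0]
```

Under these assumptions, a second matching secret in the namespace, however it came to exist, would make every reconcile fail with the logged message, which would be consistent with the machines never leaving Provisioned.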
Pod description:
oc describe pod/win-webserver-7c66c4b657-8v487
Name:             win-webserver-7c66c4b657-8v487
Namespace:        winc-test
Priority:         0
Node:             <none>
Labels:           app=win-webserver
                  pod-template-hash=7c66c4b657
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/win-webserver-7c66c4b657
Containers:
  win-webserver:
    Image:      mcr.microsoft.com/powershell:lts-nanoserver-ltsc2022
    Port:       <none>
    Host Port:  <none>
    Command:
      pwsh.exe
      -command
      $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Windows Container Web Server</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t6sd6 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-t6sd6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=windows
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                             os=Windows
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  176m                 default-scheduler  0/7 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) were unschedulable, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  171m                 default-scheduler  0/5 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  15m (x31 over 165m)  default-scheduler  0/5 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
oc get serviceaccount -n openshift-windows-machine-config-operator windows-instance-config-daemon -oyaml
apiVersion: v1
imagePullSecrets:
- name: windows-instance-config-daemon-dockercfg-rtwcm
kind: ServiceAccount
metadata:
  creationTimestamp: "2024-02-08T09:26:25Z"
  labels:
    olm.managed: "true"
  name: windows-instance-config-daemon
  namespace: openshift-windows-machine-config-operator
  resourceVersion: "56050"
  uid: 5d561b3c-5623-49d9-b4a1-c4df6aac6568
secrets:
- name: windows-instance-config-daemon-dockercfg-rtwcm
blocks: OCPBUGS-37481 vSphere machines are getting into provisioned status "expected 1 secret for SA 'windows-instance-config-daemon', found 2" (Closed)
is cloned by: OCPBUGS-37481 vSphere machines are getting into provisioned status "expected 1 secret for SA 'windows-instance-config-daemon', found 2" (Closed)
links to: RHBA-2024:132594 Red Hat OpenShift for Windows Containers 10.17.0 product release