OCPBUGS-38592 (OpenShift Bugs)

vSphere machines are getting stuck in Provisioned status with "expected 1 secret for SA 'windows-instance-config-daemon', found 2"


    • Moderate
    • Yes
    • 0
    • WINC - Sprint 258, WINC - Sprint 259
    • 2
    • False
      Fixes an issue where multiple service account token secrets for service accounts in the WMCO namespace would cause WMCO to error out. This issue was fixed by having WMCO only use the secret it creates, ignoring any other service account token secrets in the WMCO namespace.
    • Bug Fix
    • In Progress
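
The release note above describes the fix only in prose. A minimal Python sketch of the idea follows; it is not WMCO's actual code (WMCO is written in Go). The secret names `wicd-token-owned` and `wicd-token-stray` and all function names are hypothetical, chosen for illustration. The secret type `kubernetes.io/service-account-token` and the annotation `kubernetes.io/service-account.name` are the standard Kubernetes conventions for service account token secrets.

```python
TOKEN_TYPE = "kubernetes.io/service-account-token"
SA_ANNOTATION = "kubernetes.io/service-account.name"
SA_NAME = "windows-instance-config-daemon"


def token_secrets_for(secrets, sa_name):
    """Return every service account token secret that belongs to sa_name."""
    return [
        s for s in secrets
        if s.get("type") == TOKEN_TYPE
        and (s.get("metadata", {}).get("annotations") or {}).get(SA_ANNOTATION) == sa_name
    ]


def pick_strict(secrets, sa_name):
    """Behaviour implied by the error message: demand exactly one match."""
    found = token_secrets_for(secrets, sa_name)
    if len(found) != 1:
        raise ValueError(f"expected 1 secret for SA '{sa_name}', found {len(found)}")
    return found[0]


def pick_owned(secrets, sa_name, owned_name):
    """Behaviour described in the release note: use only the operator's own secret."""
    for s in token_secrets_for(secrets, sa_name):
        if s["metadata"]["name"] == owned_name:
            return s
    return None


# Hypothetical namespace contents: two token secrets for the same SA plus
# the dockercfg secret that appears in the ServiceAccount output.
EXAMPLE_SECRETS = [
    {"type": TOKEN_TYPE,
     "metadata": {"name": "wicd-token-owned", "annotations": {SA_ANNOTATION: SA_NAME}}},
    {"type": TOKEN_TYPE,
     "metadata": {"name": "wicd-token-stray", "annotations": {SA_ANNOTATION: SA_NAME}}},
    {"type": "kubernetes.io/dockercfg",
     "metadata": {"name": "windows-instance-config-daemon-dockercfg-rtwcm"}},
]
```

With `EXAMPLE_SECRETS`, `pick_strict` raises the same message as the one reported in this bug, while `pick_owned` returns the secret whose name matches and ignores the stray one.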

      This is a clone of issue OCPBUGS-38485. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-37481. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-29253. The following is the description of the original issue:

      Description of problem:

          Windows nodes on vSphere disappear after some Windows machines are scaled down and back up. The machines stay in the Provisioned state and never become ready, so the nodes do not reappear.

      Version-Release number of selected component (if applicable):

          10.15.0-ae56369
          4.15.0-0.nightly-2024-02-07-062935

      How reproducible:

      most likely    

      Steps to Reproduce:

          1. Install the latest WMCO 10.15.
          2. Create 2 Windows machineset nodes on vSphere vCenter.
          3. Wait for the nodes to become ready.
          4. Scale the Windows machineset down to 0.
          5. Scale it back up to 2 machines.
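
The scale-down and scale-up steps above could be scripted roughly as follows. This is a sketch that only builds and prints the `oc` command lines; it does not run them and does not wait for machines or nodes between steps. The machineset name `winworker` is an assumption inferred from the machine name in the WMCO log, and the helper names are hypothetical. The namespace `openshift-machine-api` is the one that appears in the WMCO log in this report.

```python
import shlex

NAMESPACE = "openshift-machine-api"  # namespace seen in the WMCO log


def scale_cmd(machineset, replicas, namespace=NAMESPACE):
    """Build the `oc scale` invocation for a MachineSet."""
    return ["oc", "scale", "machineset", machineset,
            f"--replicas={replicas}", "-n", namespace]


def repro_commands(machineset):
    """Commands for steps 4 and 5, each followed by a status check."""
    return [
        scale_cmd(machineset, 0),
        ["oc", "get", "machines", "-n", NAMESPACE],
        scale_cmd(machineset, 2),
        ["oc", "get", "nodes", "-l", "kubernetes.io/os=windows"],
    ]


if __name__ == "__main__":
    # "winworker" is a guessed machineset name; replace it with the real one.
    for cmd in repro_commands("winworker"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; each line can be pasted into a shell once the previous step has settled.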
          

      Actual results:

          After scaling up, the nodes do not come back: the machines are stuck in the Provisioned state and workloads remain in Pending state.

      Expected results:

          Windows nodes should not disappear; the machines should become ready again after scaling up.

      Additional info:

          WMCO log:
      {"level":"info","ts":"2024-02-08T13:07:28Z","logger":"controller.windowsmachine","msg":"processing","windowsmachine":{"name":"winworker-nwhr5","namespace":"openshift-machine-api"},"address":"192.168.221.139"}
      {"level":"error","ts":"2024-02-08T13:07:33Z","msg":"Reconciler error","controller":"machine","controllerGroup":"machine.openshift.io","controllerKind":"Machine","Machine":{"name":"winworker-nwhr5","namespace":"openshift-machine-api"},"namespace":"openshift-machine-api","name":"winworker-nwhr5","reconcileID":"0a012274-2921-436a-9ef9-75761208bdc2","error":"unable to configure instance 423d2593-9f52-2506-2c38-e0decef2b967: expected 1 secret for SA 'windows-instance-config-daemon', found 2","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
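
Because the WMCO log is one JSON object per line, the failing machines can be picked out mechanically. The sketch below assumes only the fields visible in the entries above (`level`, `msg`, `Machine.name`, `error`); the function name is hypothetical, and the sample entries are abbreviated copies of the two log lines, not full records.

```python
import json


def reconciler_errors(lines):
    """Yield (machine name, error text) for each 'Reconciler error' entry."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict):
            continue
        if entry.get("level") == "error" and entry.get("msg") == "Reconciler error":
            machine = entry.get("Machine", {}).get("name", "<unknown>")
            yield machine, entry.get("error", "")


# Abbreviated versions of the two log entries shown in this report.
SAMPLE = [
    json.dumps({"level": "info", "msg": "processing",
                "windowsmachine": {"name": "winworker-nwhr5"}}),
    json.dumps({"level": "error", "msg": "Reconciler error",
                "Machine": {"name": "winworker-nwhr5"},
                "error": "unable to configure instance "
                         "423d2593-9f52-2506-2c38-e0decef2b967: expected 1 secret "
                         "for SA 'windows-instance-config-daemon', found 2"}),
]

if __name__ == "__main__":
    for machine, error in reconciler_errors(SAMPLE):
        print(f"{machine}: {error}")
```

In practice the lines would come from the operator pod's log output instead of `SAMPLE`.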
      
      Pod description:
      
      oc describe pod/win-webserver-7c66c4b657-8v487
      Name:           win-webserver-7c66c4b657-8v487
      Namespace:      winc-test
      Priority:       0
      Node:           <none>
      Labels:         app=win-webserver
                      pod-template-hash=7c66c4b657
      Annotations:    <none>
      Status:         Pending
      IP:
      IPs:            <none>
      Controlled By:  ReplicaSet/win-webserver-7c66c4b657
      Containers:
        win-webserver:
          Image:      mcr.microsoft.com/powershell:lts-nanoserver-ltsc2022
          Port:       <none>
          Host Port:  <none>
          Command:
            pwsh.exe
            -command
            $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Windows Container Web Server</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
          Environment:  <none>
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t6sd6 (ro)
      Conditions:
        Type           Status
        PodScheduled   False
      Volumes:
        kube-api-access-t6sd6:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   BestEffort
      Node-Selectors:              kubernetes.io/os=windows
      Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                                   os=Windows
      Events:
        Type     Reason            Age                  From               Message
        ----     ------            ----                 ----               -------
        Warning  FailedScheduling  176m                 default-scheduler  0/7 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) were unschedulable, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..
        Warning  FailedScheduling  171m                 default-scheduler  0/5 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
        Warning  FailedScheduling  15m (x31 over 165m)  default-scheduler  0/5 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
      
      oc get serviceaccount -n openshift-windows-machine-config-operator windows-instance-config-daemon -oyaml
      apiVersion: v1
      imagePullSecrets:
      - name: windows-instance-config-daemon-dockercfg-rtwcm
      kind: ServiceAccount
      metadata:
        creationTimestamp: "2024-02-08T09:26:25Z"
        labels:
          olm.managed: "true"
        name: windows-instance-config-daemon
        namespace: openshift-windows-machine-config-operator
        resourceVersion: "56050"
        uid: 5d561b3c-5623-49d9-b4a1-c4df6aac6568
      secrets:
      - name: windows-instance-config-daemon-dockercfg-rtwcm
      
      
      

            rh-ee-ssoto Sebastian Soto
            openshift-crt-jira-prow OpenShift Prow Bot
            Weinan Liu Weinan Liu