Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: 4.14.z
Affects Version/s: 4.16.0
Component/s: Windows Containers
Labels:
- regression

Severity:
Moderate
Regression:
Yes
Story Points:
0
Sprint:
WINC - Sprint 258, WINC - Sprint 259
sprint_count:
2
Blocked:
False
Blocked Reason:

None
Release Note Text:

Fixes an issue where multiple service account token secrets for service accounts in the WMCO namespace would cause WMCO to error out. This issue was fixed by having WMCO only use the secret it creates, ignoring any other service account token secrets in the WMCO namespace.
Release Note Type:
Bug Fix
Release Note Status:
In Progress
Target Version:

4.14.z
Target Backport Versions:

4.16.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

This is a clone of issue ~~OCPBUGS-38485~~. The following is the description of the original issue:
—
This is a clone of issue ~~OCPBUGS-37481~~. The following is the description of the original issue:
—
This is a clone of issue ~~OCPBUGS-29253~~. The following is the description of the original issue:
—
Description of problem:

    On vSphere, Windows nodes disappear after a Windows MachineSet is scaled down and back up. The new machines stay in the Provisioned state and never become Ready, and no nodes appear for them.

Version-Release number of selected component (if applicable):

    10.15.0-ae56369
    4.15.0-0.nightly-2024-02-07-062935

How reproducible:

most likely

Steps to Reproduce:

    1. Install the latest WMCO 10.15 build
    2. Create a Windows MachineSet with 2 machines on vSphere vCenter
    3. Wait for the nodes to be Ready
    4. Scale the Windows MachineSet down to 0
    5. Scale it back up to 2 machines
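
Steps 4 and 5 can be scripted. The following is a minimal sketch, assuming the `oc` CLI is logged in to the cluster and that the MachineSet lives in `openshift-machine-api` (the namespace seen in the WMCO log). The helper names `scale_command` and `reproduce` are made up for illustration.

```python
import subprocess

# Namespace taken from the WMCO log in this report.
MACHINE_API_NAMESPACE = "openshift-machine-api"


def scale_command(machineset: str, replicas: int) -> list[str]:
    """Build the `oc scale` invocation for a MachineSet."""
    return [
        "oc", "scale", "machineset", machineset,
        f"--replicas={replicas}",
        "-n", MACHINE_API_NAMESPACE,
    ]


def reproduce(machineset: str, run=subprocess.run) -> None:
    """Run steps 4 and 5; `run` is injectable so the calls can be dry-run."""
    # Step 4: scale the Windows MachineSet down to 0.
    run(scale_command(machineset, 0), check=True)
    # Step 5: scale back up to 2 machines.
    run(scale_command(machineset, 2), check=True)
```

Calling `reproduce("winworker")` would issue both commands; `winworker` is only a guess at the MachineSet name based on the machine name in the log. The steps above do not say how long to wait between scaling down and scaling up, so this sketch does not wait either.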

Actual results:

    The nodes do not come back after scaling up. The machines are stuck in the Provisioned state and workloads stay Pending.

Expected results:

    Windows nodes should not disappear; after scaling up, the new machines should become Ready nodes.

Additional info:

    WMCO log:
{"level":"info","ts":"2024-02-08T13:07:28Z","logger":"controller.windowsmachine","msg":"processing","windowsmachine":{"name":"winworker-nwhr5","namespace":"openshift-machine-api"},"address":"192.168.221.139"}
{"level":"error","ts":"2024-02-08T13:07:33Z","msg":"Reconciler error","controller":"machine","controllerGroup":"machine.openshift.io","controllerKind":"Machine","Machine":{"name":"winworker-nwhr5","namespace":"openshift-machine-api"},"namespace":"openshift-machine-api","name":"winworker-nwhr5","reconcileID":"0a012274-2921-436a-9ef9-75761208bdc2","error":"unable to configure instance 423d2593-9f52-2506-2c38-e0decef2b967: expected 1 secret for SA 'windows-instance-config-daemon', found 2","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
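
When triaging a larger WMCO log, the secret-count error can be pulled out of the structured lines. This is a minimal sketch that assumes the log is one JSON object per line with the `level`, `error`, and `Machine` fields seen above; the function name is hypothetical.

```python
import json
import re

# Matches the error text seen in the log line above.
SECRET_COUNT_RE = re.compile(
    r"expected (?P<expected>\d+) secret for SA '(?P<sa>[^']+)', found (?P<found>\d+)"
)


def find_secret_count_error(log_line: str):
    """Return (machine, service_account, expected, found), or None if absent."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # not a structured log line
    if not isinstance(entry, dict) or entry.get("level") != "error":
        return None
    match = SECRET_COUNT_RE.search(entry.get("error", ""))
    if match is None:
        return None
    machine = entry.get("Machine", {}).get("name")
    return machine, match["sa"], int(match["expected"]), int(match["found"])
```

Applied to each line of the log, this returns a tuple only for reconciler errors of this specific kind and `None` for everything else.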

Pod description:

oc describe pod/win-webserver-7c66c4b657-8v487
Name:           win-webserver-7c66c4b657-8v487
Namespace:      winc-test
Priority:       0
Node:           <none>
Labels:         app=win-webserver
                pod-template-hash=7c66c4b657
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/win-webserver-7c66c4b657
Containers:
  win-webserver:
    Image:      mcr.microsoft.com/powershell:lts-nanoserver-ltsc2022
    Port:       <none>
    Host Port:  <none>
    Command:
      pwsh.exe
      -command
      $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Windows Container Web Server</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-t6sd6 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-t6sd6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=windows
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                             os=Windows
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  176m                 default-scheduler  0/7 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 2 node(s) were unschedulable, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  171m                 default-scheduler  0/5 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  15m (x31 over 165m)  default-scheduler  0/5 nodes are available: 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..

oc get serviceaccount -n openshift-windows-machine-config-operator windows-instance-config-daemon -oyaml
apiVersion: v1
imagePullSecrets:
- name: windows-instance-config-daemon-dockercfg-rtwcm
kind: ServiceAccount
metadata:
  creationTimestamp: "2024-02-08T09:26:25Z"
  labels:
    olm.managed: "true"
  name: windows-instance-config-daemon
  namespace: openshift-windows-machine-config-operator
  resourceVersion: "56050"
  uid: 5d561b3c-5623-49d9-b4a1-c4df6aac6568
secrets:
- name: windows-instance-config-daemon-dockercfg-rtwcm
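
The release note describes the fix as WMCO using only the secret it creates and ignoring other service account token secrets in its namespace. The sketch below contrasts the two lookup styles on plain dictionaries shaped like Secret manifests (for example, the `items` of `oc get secrets -o json`). It is an illustration only, not WMCO's actual code: the function names are hypothetical, and only the secret type and annotation key are standard Kubernetes values.

```python
# Standard Kubernetes secret type and annotation for service account tokens.
TOKEN_TYPE = "kubernetes.io/service-account-token"
SA_ANNOTATION = "kubernetes.io/service-account.name"


def token_secrets_for(secrets, sa_name):
    """All token secrets in the list that are bound to sa_name."""
    return [
        s for s in secrets
        if s.get("type") == TOKEN_TYPE
        and s.get("metadata", {}).get("annotations", {}).get(SA_ANNOTATION) == sa_name
    ]


def pick_strict(secrets, sa_name):
    """Strict lookup: fail unless exactly one token secret exists for the SA."""
    found = token_secrets_for(secrets, sa_name)
    if len(found) != 1:
        raise LookupError(f"expected 1 secret for SA '{sa_name}', found {len(found)}")
    return found[0]


def pick_by_name(secrets, secret_name):
    """Name-based lookup: use only the known secret, ignoring any others."""
    for s in secrets:
        if s.get("metadata", {}).get("name") == secret_name:
            return s
    raise LookupError(f"secret '{secret_name}' not found")
```

With two token secrets bound to `windows-instance-config-daemon`, `pick_strict` raises an error of the same shape as the one in the WMCO log, while `pick_by_name` still returns the one secret whose name is known in advance.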

clones

OCPBUGS-38485 vSphere machines are getting into provisioned status "expected 1 secret for SA 'windows-instance-config-daemon', found 2"

Closed

is blocked by

OCPBUGS-38485 vSphere machines are getting into provisioned status "expected 1 secret for SA 'windows-instance-config-daemon', found 2"

Closed

links to

openshift/windows-machine-config-operator#2381: [release-4.14] OCPBUGS-38592: Always use created WICD SA token

RHBA-2024:132229 Red Hat OpenShift for Windows Containers 9.0.3 product release

mentioned on

Merge request - Updated US source to: 1c750ab Merge pull request #2381 from openshift-cherrypick-robot/cherry-pick-2376-to-release-4.14

Assignee: Sebastian Soto

Reporter: OpenShift Prow Bot

QA Contact: Weinan Liu

Votes: 0

Watchers: 6

Created: 2024/08/16 1:03 PM

Updated: 2024/09/09 12:29 AM

Resolved: 2024/09/09 12:29 AM
