OCPBUGS-9356

Windows nodes' kubelet does not run with --cloud-provider=external after migrating to CCM


    • Important
    • None
    • 3
    • Rejected
    • Unspecified

      Description of problem:
      Install a fresh cluster, add a Windows worker, then enable CCM. The kubelet on the Windows nodes does not run with --cloud-provider=external.
      However, if a fresh cluster is installed with CCM already enabled and a Windows worker is added afterwards, the kubelet on the Windows nodes runs with --cloud-provider=external as expected.

      Version-Release number of selected component (if applicable):
      4.11.0-0.nightly-2022-06-30-005428

      How reproducible:
      Always

      Steps to Reproduce:
      1. Install a fresh cluster and add Windows workers.
      liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.11.0-0.nightly-2022-06-30-005428 True False 41m Cluster version is 4.11.0-0.nightly-2022-06-30-005428
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME STATUS ROLES AGE VERSION
      huliu-azure71a-4wmh2-master-0 Ready master 71m v1.24.0+9ddc8b1
      huliu-azure71a-4wmh2-master-1 Ready master 71m v1.24.0+9ddc8b1
      huliu-azure71a-4wmh2-master-2 Ready master 71m v1.24.0+9ddc8b1
      huliu-azure71a-4wmh2-worker-southcentralus1-9d6lb Ready worker 57m v1.24.0+9ddc8b1
      huliu-azure71a-4wmh2-worker-southcentralus2-mqpqq Ready worker 54m v1.24.0+9ddc8b1
      huliu-azure71a-4wmh2-worker-southcentralus3-md7pc Ready worker 57m v1.24.0+9ddc8b1
      windows-bgpxw Ready worker 27m v1.24.0-2323+01aa0f3f6052c9
      windows-dz85l Ready worker 21m v1.24.0-2323+01aa0f3f6052c9

      2. Enable CCM by editing the featuregate.
      liuhuali@Lius-MacBook-Pro huali-test % oc edit featuregate
      featuregate.config.openshift.io/cluster edited
      liuhuali@Lius-MacBook-Pro huali-test % oc get deploy -n openshift-cloud-controller-manager
      NAME READY UP-TO-DATE AVAILABLE AGE
      azure-cloud-controller-manager 2/2 2 2 10m
      liuhuali@Lius-MacBook-Pro huali-test % oc get pod -n openshift-cloud-controller-manager
      NAME READY STATUS RESTARTS AGE
      azure-cloud-controller-manager-5946ff4bb9-6hc5k 1/1 Running 0 10m
      azure-cloud-controller-manager-5946ff4bb9-qdsb7 1/1 Running 0 10m
      azure-cloud-node-manager-62srs 1/1 Running 0 9m44s
      azure-cloud-node-manager-72gjm 1/1 Running 0 10m
      azure-cloud-node-manager-k4bdb 1/1 Running 0 6m50s
      azure-cloud-node-manager-tlk4c 1/1 Running 0 10m
      azure-cloud-node-manager-tpwhc 1/1 Running 0 10m
      azure-cloud-node-manager-vpgw6 1/1 Running 0 10m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME STATUS ROLES AGE VERSION
      huliu-azure71a-4wmh2-master-0 Ready master 99m v1.24.0+9ddc8b1
      huliu-azure71a-4wmh2-master-1 Ready master 98m v1.24.0+9ddc8b1
      huliu-azure71a-4wmh2-master-2 Ready master 99m v1.24.0+9ddc8b1
      huliu-azure71a-4wmh2-worker-southcentralus1-9d6lb Ready worker 84m v1.24.0+9ddc8b1
      huliu-azure71a-4wmh2-worker-southcentralus2-mqpqq Ready worker 81m v1.24.0+9ddc8b1
      huliu-azure71a-4wmh2-worker-southcentralus3-md7pc Ready worker 84m v1.24.0+9ddc8b1
      windows-bgpxw Ready worker 54m v1.24.0-2323+01aa0f3f6052c9
      windows-dz85l Ready worker 48m v1.24.0-2323+01aa0f3f6052c9

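      3. SSH to a Windows node (through a debug pod on a master node) and inspect the kubelet service and the cloud-node-manager service.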
      liuhuali@Lius-MacBook-Pro huali-test % oc debug node/huliu-azure71a-4wmh2-master-0
      W0701 11:40:06.697694 61245 warnings.go:70] would violate PodSecurity "restricted:v1.24": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
      Starting pod/huliu-azure71a-4wmh2-master-0-debug ...
      To use host binaries, run `chroot /host`
      Pod IP: 10.0.0.7
      If you don't see a command prompt, try pressing enter.
      sh-4.4# chroot /host
      sh-4.4# cd ~
      sh-4.4# ssh -i /tmp/openshift-qe.pem capi@10.0.128.7 powershell
      Windows PowerShell
      Copyright (C) Microsoft Corporation. All rights reserved.

      PS C:\Users\capi> Get-Item -path HKLM:HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kubelet
      Get-Item -path HKLM:HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kubelet

      Hive: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services

      Name Property
      ---- --------
      kubelet Type : 16
      Start : 2
      ErrorControl : 1
      ImagePath : c:\k\kubelet.exe --config=c:\k\kubelet.conf
      --bootstrap-kubeconfig=c:\k\bootstrap-kubeconfig
      --kubeconfig=c:\k\kubeconfig --cert-dir=c:\var\lib\kubelet\pki\
      --windows-service
      --logtostderr=false --log-file=C:\var\log\kubelet\kubelet.log
      --register-with-taints=os=Windows:NoSchedule
      --node-labels=node.openshift.io/os_id=Windows
      --container-runtime=remote
      --container-runtime-endpoint=npipe://./pipe/containerd-containerd
      --resolv-conf= --cloud-provider=azure --v=3
      --cloud-config=c:\k\cloud.conf
      DependOnService : {containerd}
      ObjectName : LocalSystem
      Description : OpenShift managed kubelet
      FailureActions : {88, 2, 0, 0...}


      PS C:\Users\capi> Get-Service cloud-node-manager
      Get-Service cloud-node-manager
      Get-Service : Cannot find any service with service name 'cloud-node-manager'.
      At line:1 char:1
      + Get-Service cloud-node-manager
      + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      + CategoryInfo : ObjectNotFound: (cloud-node-manager:String) [Get-Service], ServiceCommandException
      + FullyQualifiedErrorId : NoServiceFoundForGivenName,Microsoft.PowerShell.Commands.GetServiceCommand

      PS C:\Users\capi>


      Actual results:
      The kubelet runs with --cloud-provider=azure, and there is no cloud-node-manager service.

      Expected results:
      The kubelet runs with --cloud-provider=external, and the cloud-node-manager service exists.
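
The check behind the actual and expected results can be sketched in a few lines of Python. This is only an illustration, not part of any OpenShift tooling: the function names `parse_kubelet_flags` and `uses_external_cloud_provider` are hypothetical, and the sample string is the kubelet ImagePath copied from the registry output in step 3.

```python
# Sketch: extract kubelet flags from a Windows service ImagePath string.
# Function names are hypothetical; the sample value is taken from this report.


def parse_kubelet_flags(image_path):
    """Map each --flag in the command line to its value ('' when no value is given)."""
    flags = {}
    for token in image_path.split():
        if token.startswith("--"):
            name, _, value = token[2:].partition("=")
            flags[name] = value
    return flags


def uses_external_cloud_provider(image_path):
    """Return True only when the kubelet was started with --cloud-provider=external."""
    return parse_kubelet_flags(image_path).get("cloud-provider") == "external"


# ImagePath observed on a Windows node after CCM was enabled on an existing cluster.
MIGRATED_IMAGE_PATH = (
    r"c:\k\kubelet.exe --config=c:\k\kubelet.conf "
    r"--bootstrap-kubeconfig=c:\k\bootstrap-kubeconfig "
    r"--kubeconfig=c:\k\kubeconfig --cert-dir=c:\var\lib\kubelet\pki\ --windows-service "
    r"--logtostderr=false --log-file=C:\var\log\kubelet\kubelet.log "
    r"--register-with-taints=os=Windows:NoSchedule "
    r"--node-labels=node.openshift.io/os_id=Windows "
    r"--container-runtime=remote "
    r"--container-runtime-endpoint=npipe://./pipe/containerd-containerd "
    r"--resolv-conf= --cloud-provider=azure --v=3 "
    r"--cloud-config=c:\k\cloud.conf"
)

if __name__ == "__main__":
    flags = parse_kubelet_flags(MIGRATED_IMAGE_PATH)
    print("cloud-provider:", flags.get("cloud-provider"))
    print("external:", uses_external_cloud_provider(MIGRATED_IMAGE_PATH))
```

Splitting on whitespace is enough here because none of the flag values in the reported ImagePath contain spaces; a command line with quoted paths would need a real tokenizer.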

      Additional info:
      Checked on AWS (4.11.0-0.nightly-2022-06-30-005428), vSphere (4.11.0-0.nightly-2022-06-30-005428), and Azure (4.10.0-fc.0, 4.10.0-0.nightly-2022-06-08-150219, 4.11.0-0.nightly-2022-06-30-005428); the issue reproduces on all of them.

      Also checked on AWS (4.11.0-0.nightly-2022-06-30-005428), vSphere (4.11.0-0.nightly-2022-06-30-005428), and Azure (4.11.0-0.nightly-2022-06-30-005428) by installing a fresh cluster with CCM and then adding a Windows worker: the kubelet on the Windows nodes runs with --cloud-provider=external as expected, and the cloud-node-manager service is running.

      PS C:\Users\capi> Get-Service cloud-node-manager
      Get-Service cloud-node-manager

      Status Name DisplayName
      ------ ---- -----------
      Running cloud-node-manager cloud-node-manager


      PS C:\Users\capi> Get-Item -path HKLM:HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kubelet
      Get-Item -path HKLM:HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kubelet


      Hive: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services


      Name Property
      ---- --------
      kubelet Type : 16
      Start : 2
      ErrorControl : 1
      ImagePath : c:\k\kubelet.exe --config=c:\k\kubelet.conf
      --bootstrap-kubeconfig=c:\k\bootstrap-kubeconfig
      --kubeconfig=c:\k\kubeconfig --cert-dir=c:\var\lib\kubelet\pki\
      --windows-service
      --logtostderr=false --log-file=C:\var\log\kubelet\kubelet.log
      --register-with-taints=os=Windows:NoSchedule
      --node-labels=node.openshift.io/os_id=Windows
      --container-runtime=remote
      --container-runtime-endpoint=npipe://./pipe/containerd-containerd
      --resolv-conf= --cloud-provider=external --v=3
      DependOnService : {containerd}
      ObjectName : LocalSystem
      Description : OpenShift managed kubelet
      FailureActions : {88, 2, 0, 0...}

      PS C:\Users\capi>
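
To make the difference between the two kubelet registrations explicit, the two ImagePath values shown in this report can be compared flag by flag. The sketch below is illustrative only: `flag_diff` and the constant names are hypothetical, and the shared prefix is assumed to be identical in both outputs, which matches the transcripts above.

```python
# Sketch: diff the kubelet flags of the migrated cluster against the fresh CCM cluster.
# Names are hypothetical; the command lines are copied from the two registry outputs.


def flag_diff(first, second):
    """Return {flag: (first_value, second_value)} for flags that differ; None means absent."""

    def parse(command_line):
        flags = {}
        for token in command_line.split():
            if token.startswith("--"):
                name, _, value = token[2:].partition("=")
                flags[name] = value
        return flags

    a, b = parse(first), parse(second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Flags that appear unchanged in both registry outputs.
COMMON_PREFIX = (
    r"c:\k\kubelet.exe --config=c:\k\kubelet.conf "
    r"--bootstrap-kubeconfig=c:\k\bootstrap-kubeconfig "
    r"--kubeconfig=c:\k\kubeconfig --cert-dir=c:\var\lib\kubelet\pki\ --windows-service "
    r"--logtostderr=false --log-file=C:\var\log\kubelet\kubelet.log "
    r"--register-with-taints=os=Windows:NoSchedule "
    r"--node-labels=node.openshift.io/os_id=Windows "
    r"--container-runtime=remote "
    r"--container-runtime-endpoint=npipe://./pipe/containerd-containerd "
    r"--resolv-conf="
)

# Cluster installed first, CCM enabled afterwards (the failing case).
MIGRATED = COMMON_PREFIX + r" --cloud-provider=azure --v=3 --cloud-config=c:\k\cloud.conf"
# Cluster installed with CCM from the start (the working case).
FRESH_CCM = COMMON_PREFIX + r" --cloud-provider=external --v=3"

if __name__ == "__main__":
    for name, (migrated, fresh) in flag_diff(MIGRATED, FRESH_CCM).items():
        print(f"--{name}: migrated={migrated!r} fresh_ccm={fresh!r}")
```

With the values from this report, only two flags differ: `--cloud-provider` (azure versus external) and `--cloud-config`, which is present only on the migrated cluster.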

      Must-gather:
      azure(install a fresh cluster, add windows worker, then enable ccm) - https://drive.google.com/file/d/1N2InQFe_mDIqayfUCqMyP-U8OE-2wss2/view?usp=sharing
      azure(install a fresh cluster with ccm, then add windows worker) - https://drive.google.com/file/d/1iHR9LzQuCmwtRxsVz6oBMGCCJrJFwTYC/view?usp=sharing

            team-winc Team WinC
            huliu@redhat.com Huali Liu
            Aharon Rasouli Aharon Rasouli
            Red Hat Employee