Bug
Resolution: Won't Do
Normal
None
4.11
Important
None
3
Rejected
Unspecified
Description of problem:
When a fresh cluster is installed, Windows workers are added, and CCM is enabled afterwards, the kubelet on the Windows nodes does not run with --cloud-provider=external.
When a fresh cluster is installed with CCM already enabled and Windows workers are added afterwards, the kubelet on the Windows nodes runs with --cloud-provider=external as expected.
Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-06-30-005428
How reproducible:
Always
Steps to Reproduce:
1. Install a fresh cluster and add Windows workers
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-06-30-005428 True False 41m Cluster version is 4.11.0-0.nightly-2022-06-30-005428
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME STATUS ROLES AGE VERSION
huliu-azure71a-4wmh2-master-0 Ready master 71m v1.24.0+9ddc8b1
huliu-azure71a-4wmh2-master-1 Ready master 71m v1.24.0+9ddc8b1
huliu-azure71a-4wmh2-master-2 Ready master 71m v1.24.0+9ddc8b1
huliu-azure71a-4wmh2-worker-southcentralus1-9d6lb Ready worker 57m v1.24.0+9ddc8b1
huliu-azure71a-4wmh2-worker-southcentralus2-mqpqq Ready worker 54m v1.24.0+9ddc8b1
huliu-azure71a-4wmh2-worker-southcentralus3-md7pc Ready worker 57m v1.24.0+9ddc8b1
windows-bgpxw Ready worker 27m v1.24.0-2323+01aa0f3f6052c9
windows-dz85l Ready worker 21m v1.24.0-2323+01aa0f3f6052c9
2. Enable CCM
liuhuali@Lius-MacBook-Pro huali-test % oc edit featuregate
featuregate.config.openshift.io/cluster edited
liuhuali@Lius-MacBook-Pro huali-test % oc get deploy -n openshift-cloud-controller-manager
NAME READY UP-TO-DATE AVAILABLE AGE
azure-cloud-controller-manager 2/2 2 2 10m
liuhuali@Lius-MacBook-Pro huali-test % oc get pod -n openshift-cloud-controller-manager
NAME READY STATUS RESTARTS AGE
azure-cloud-controller-manager-5946ff4bb9-6hc5k 1/1 Running 0 10m
azure-cloud-controller-manager-5946ff4bb9-qdsb7 1/1 Running 0 10m
azure-cloud-node-manager-62srs 1/1 Running 0 9m44s
azure-cloud-node-manager-72gjm 1/1 Running 0 10m
azure-cloud-node-manager-k4bdb 1/1 Running 0 6m50s
azure-cloud-node-manager-tlk4c 1/1 Running 0 10m
azure-cloud-node-manager-tpwhc 1/1 Running 0 10m
azure-cloud-node-manager-vpgw6 1/1 Running 0 10m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME STATUS ROLES AGE VERSION
huliu-azure71a-4wmh2-master-0 Ready master 99m v1.24.0+9ddc8b1
huliu-azure71a-4wmh2-master-1 Ready master 98m v1.24.0+9ddc8b1
huliu-azure71a-4wmh2-master-2 Ready master 99m v1.24.0+9ddc8b1
huliu-azure71a-4wmh2-worker-southcentralus1-9d6lb Ready worker 84m v1.24.0+9ddc8b1
huliu-azure71a-4wmh2-worker-southcentralus2-mqpqq Ready worker 81m v1.24.0+9ddc8b1
huliu-azure71a-4wmh2-worker-southcentralus3-md7pc Ready worker 84m v1.24.0+9ddc8b1
windows-bgpxw Ready worker 54m v1.24.0-2323+01aa0f3f6052c9
windows-dz85l Ready worker 48m v1.24.0-2323+01aa0f3f6052c9
3. SSH to a Windows node and check the kubelet service and the cloud-node-manager service
liuhuali@Lius-MacBook-Pro huali-test % oc debug node/huliu-azure71a-4wmh2-master-0
W0701 11:40:06.697694 61245 warnings.go:70] would violate PodSecurity "restricted:v1.24": host namespaces (hostNetwork=true, hostPID=true), privileged (container "container-00" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "container-00" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "container-00" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "host" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or container "container-00" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "container-00" must not set runAsUser=0), seccompProfile (pod or container "container-00" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Starting pod/huliu-azure71a-4wmh2-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.7
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# cd ~
sh-4.4# ssh -i /tmp/openshift-qe.pem capi@10.0.128.7 powershell
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.
PS C:\Users\capi> Get-Item -path HKLM:HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kubelet
Get-Item -path HKLM:HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kubelet
Hive: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services
Name Property
---- --------
kubelet Type : 16
Start : 2
ErrorControl : 1
ImagePath : c:\k\kubelet.exe --config=c:\k\kubelet.conf
--bootstrap-kubeconfig=c:\k\bootstrap-kubeconfig
--kubeconfig=c:\k\kubeconfig --cert-dir=c:\var\lib\kubelet\pki\
--windows-service
--logtostderr=false --log-file=C:\var\log\kubelet\kubelet.log
--register-with-taints=os=Windows:NoSchedule
--node-labels=node.openshift.io/os_id=Windows
--container-runtime=remote
--container-runtime-endpoint=npipe://./pipe/containerd-containerd
--resolv-conf= --cloud-provider=azure --v=3
--cloud-config=c:\k\cloud.conf
DependOnService :
ObjectName : LocalSystem
Description : OpenShift managed kubelet
FailureActions : {88, 2, 0, 0...}
PS C:\Users\capi> Get-Service cloud-node-manager
Get-Service cloud-node-manager
Get-Service : Cannot find any service with service name 'cloud-node-manager'.
At line:1 char:1
+ Get-Service cloud-node-manager
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (cloud-node-manager:String) [Get-Service], ServiceCommandException
+ FullyQualifiedErrorId : NoServiceFoundForGivenName,Microsoft.PowerShell.Commands.GetServiceCommand
PS C:\Users\capi>
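Step 2 does not record what was changed in the FeatureGate resource. As a rough sketch only: one known way to turn on external cloud providers on 4.10/4.11 clusters is the ExternalCloudProvider feature gate, set through the CustomNoUpgrade feature set. The fragment below is an assumption about the edit, not a copy of what was applied in this report; the TechPreviewNoUpgrade feature set is another possibility.

```yaml
# Assumed content of the FeatureGate edit in step 2 (not taken from the report).
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: CustomNoUpgrade
  customNoUpgrade:
    enabled:
    - ExternalCloudProvider
```

Either feature set is irreversible on a cluster and blocks upgrades, so this is only suitable for test clusters such as the one used here.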
Actual results:
The kubelet runs with --cloud-provider=azure, and there is no cloud-node-manager service.
Expected results:
The kubelet runs with --cloud-provider=external, and the cloud-node-manager service exists.
Additional info:
Reproduced on AWS (4.11.0-0.nightly-2022-06-30-005428), vSphere (4.11.0-0.nightly-2022-06-30-005428), and Azure (4.10.0-fc.0, 4.10.0-0.nightly-2022-06-08-150219, 4.11.0-0.nightly-2022-06-30-005428).
Also checked on AWS (4.11.0-0.nightly-2022-06-30-005428), vSphere (4.11.0-0.nightly-2022-06-30-005428), and Azure (4.11.0-0.nightly-2022-06-30-005428): when a fresh cluster is installed with CCM and Windows workers are added afterwards, the kubelet on the Windows nodes runs with --cloud-provider=external as expected:
PS C:\Users\capi> Get-Service cloud-node-manager
Get-Service cloud-node-manager
Status Name DisplayName
------ ---- -----------
Running cloud-node-manager cloud-node-manager
PS C:\Users\capi> Get-Item -path HKLM:HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kubelet
Get-Item -path HKLM:HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kubelet
Hive: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services
Name Property
---- --------
kubelet Type : 16
Start : 2
ErrorControl : 1
ImagePath : c:\k\kubelet.exe --config=c:\k\kubelet.conf
--bootstrap-kubeconfig=c:\k\bootstrap-kubeconfig
--kubeconfig=c:\k\kubeconfig --cert-dir=c:\var\lib\kubelet\pki\
--windows-service
--logtostderr=false --log-file=C:\var\log\kubelet\kubelet.log
--register-with-taints=os=Windows:NoSchedule
--node-labels=node.openshift.io/os_id=Windows
--container-runtime=remote
--container-runtime-endpoint=npipe://./pipe/containerd-containerd
--resolv-conf= --cloud-provider=external --v=3
DependOnService : {containerd}
ObjectName : LocalSystem
Description : OpenShift managed kubelet
FailureActions :
PS C:\Users\capi>
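The two registry dumps in this report differ in the value of --cloud-provider (and in whether --cloud-config is present). A minimal sketch for pulling that value out of a captured ImagePath string is shown below. The function name cloud_provider_of and the variable names are hypothetical, and the two sample strings are shortened copies of the ImagePath values above, joined onto one line.

```shell
# Print the value of --cloud-provider from a kubelet command line.
# Prints nothing if the flag is absent. (Hypothetical helper, for illustration.)
cloud_provider_of() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^--cloud-provider=//p'
}

# Shortened ImagePath values taken from the two registry dumps in this report.
late_ccm='c:\k\kubelet.exe --config=c:\k\kubelet.conf --resolv-conf= --cloud-provider=azure --v=3 --cloud-config=c:\k\cloud.conf'
fresh_ccm='c:\k\kubelet.exe --config=c:\k\kubelet.conf --resolv-conf= --cloud-provider=external --v=3'

cloud_provider_of "$late_ccm"    # CCM enabled after the Windows workers joined: prints azure
cloud_provider_of "$fresh_ccm"   # cluster installed with CCM: prints external
```

Run against the full ImagePath of each Windows node, this gives a quick way to tell which nodes still use the in-tree provider.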
Must-gather:
azure(install a fresh cluster, add windows worker, then enable ccm) - https://drive.google.com/file/d/1N2InQFe_mDIqayfUCqMyP-U8OE-2wss2/view?usp=sharing
azure(install a fresh cluster with ccm, then add windows worker) - https://drive.google.com/file/d/1iHR9LzQuCmwtRxsVz6oBMGCCJrJFwTYC/view?usp=sharing