Bug
Resolution: Won't Do
Priority: Major
Severity: Critical
Status: Rejected
Affects Version: 4.13
Impact: Quality / Stability / Reliability
Description of problem:
Configure the shutdownGracePeriod via KubeletConfig, then trigger the sriov daemon to drain the node. Before the sriov daemon finishes draining the node, delete its pod from another terminal. According to its log, the sriov daemon did not handle the SIGTERM as expected, and the kubelet log shows the sriov daemon pod was killed with the pod's default gracePeriod of 30s rather than waiting for the shutdownGracePeriod.
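For reference, "handle the SIGTERM" here means the daemon traps SIGTERM and defers exit while an update is in flight (the "Got SIGTERM, but actively updating" message under Expected results below). A minimal Go sketch of that pattern; the names (updating, handleSignals) are illustrative, not the operator's actual code:

package main

import (
	"log"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// updating is set while a drain or config update is in flight.
var updating atomic.Bool

// handleSignals closes done on SIGTERM, unless an update is in
// progress, in which case it only logs and keeps the daemon alive.
func handleSignals(done chan<- struct{}) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	for range sigs {
		if updating.Load() {
			log.Println("Got SIGTERM, but actively updating")
			continue
		}
		close(done)
		return
	}
}

func main() {
	done := make(chan struct{})
	go handleSignals(done)

	updating.Store(true) // pretend a drain has started
	// ... drain / apply-config work would happen here ...
	updating.Store(false)

	<-done // exit only once a SIGTERM arrives outside an update
}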
Version-Release number of selected component (if applicable):
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-03-28-014156   True        False         98m     Cluster version is 4.13.0-0.nightly-2023-03-28-014156
How reproducible:
Always
Steps to Reproduce:
1. Create a KubeletConfig to set the shutdown grace period, and check it:
% oc get kubeletconfig set-shutdown-graceperiod -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  annotations:
    machineconfiguration.openshift.io/mc-name-suffix: ""
  creationTimestamp: "2023-03-29T03:08:37Z"
  finalizers:
  - 99-sriov-generated-kubelet
  generation: 1
  name: set-shutdown-graceperiod
  resourceVersion: "116502"
  uid: 0f542300-5b12-4d41-af1a-1db0693a3555
spec:
  kubeletConfig:
    shutdownGracePeriod: 10m
    shutdownGracePeriodCriticalPods: 180s
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/sriov: ""
status:
  conditions:
  - lastTransitionTime: "2023-03-29T03:08:37Z"
    message: Success
    status: "True"
    type: Success
2. Wait for the MCP rollout to succeed and check that the setting has taken effect on the node (see the note on value formatting after the output):
sh-4.4# chroot /host
sh-5.1# cat /etc/kubernetes/kubelet.conf
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "staticPodPath": "/etc/kubernetes/manifests",
  "syncFrequency": "0s",
  "fileCheckFrequency": "0s",
  "httpCheckFrequency": "0s",
  "tlsCipherSuites": [...
  "shutdownGracePeriod": "10m0s",
  "shutdownGracePeriodCriticalPods": "3m0s"
}
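Note: the difference in spelling between the KubeletConfig values (10m, 180s) and what lands in kubelet.conf (10m0s, 3m0s) is just duration normalization; the values round-trip through Go's time.Duration, and this small Go check reproduces both strings:

package main

import (
	"fmt"
	"time"
)

func main() {
	for _, s := range []string{"10m", "180s"} {
		d, err := time.ParseDuration(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(s, "->", d.String()) // 10m -> 10m0s, 180s -> 3m0s
	}
}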
3. Change the numVfs of one interface (from 22 to 23) to trigger the drain (a sketch of the underlying check follows the policy below):
% oc edit SriovNetworkNodePolicy sriov-nic-027-bak -n openshift-sriov-network-operator
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  creationTimestamp: "2023-03-29T10:56:35Z"
  generation: 4
  name: sriov-nic-027-bak
  namespace: openshift-sriov-network-operator
  resourceVersion: "328180"
  uid: a7894319-4dec-4ff0-b00d-ad336627ad99
spec:
  deviceType: netdevice
  isRdma: false
  linkType: eth
  nicSelector:
    pfNames:
    - ens5f0
  nodeSelector:
    sriovNo: "027"
  numVfs: 23
  priority: 99
  resourceName: sriov_nic_027_bak
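The drain trigger itself is a plain desired-vs-current comparison, as the NeedUpdate() lines in the log in step 4 show. A simplified, hypothetical stand-in for that check (the real logic lives in the operator's utils.go and generic_plugin.go):

package main

import "fmt"

// needDrain is a simplified stand-in for the operator's
// NeedUpdate()/needDrainNode() checks: a numVfs change on a PF
// requires reconfiguring it, which requires draining the node first.
func needDrain(desiredNumVfs, currentNumVfs int) bool {
	return desiredNumVfs != currentNumVfs
}

func main() {
	fmt.Println(needDrain(23, 22)) // true: "need drain, PF ... request update"
}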
4. Monitor the sriov daemon log, and delete the pod after it has started draining but before the drain finishes (a sketch of the drain lock seen in this log follows the output):
% oc logs -f sriov-network-config-daemon-fsndz -n openshift-sriov-network-operator -c sriov-network-config-daemon
...
I0329 12:25:07.797295 69883 generic_plugin.go:197] generic-plugin needDrainNode(): no need drain, expect NumVfs 12, current NumVfs 12
I0329 12:25:07.797301 69883 utils.go:249] NeedUpdate(): NumVfs needs update desired=23, current=22
I0329 12:25:07.797305 69883 generic_plugin.go:193] generic-plugin needDrainNode(): need drain, PF 0000:86:00.0 request update
I0329 12:25:07.797328 69883 daemon.go:478] nodeStateSyncHandler(): plugin generic_plugin: reqDrain true, reqReboot false
I0329 12:25:07.797337 69883 daemon.go:482] nodeStateSyncHandler(): reqDrain true, reqReboot false disableDrain false
I0329 12:25:07.797343 69883 utils.go:778] RunCommand(): cat [/host/sys/kernel/security/lockdown]
I0329 12:25:07.798826 69883 utils.go:786] RunCommand(): out:([none] integrity confidentiality
), err:(<nil>)
I0329 12:25:07.798835 69883 utils.go:769] IsKernelLockdownMode(): [none] integrity confidentiality
, <nil>
I0329 12:25:07.798841 69883 mellanox_plugin.go:181] mellanox-plugin Apply()
I0329 12:25:07.798846 69883 mellanox_plugin.go:186] mellanox-plugin configFW()
I0329 12:25:07.798851 69883 intel_plugin.go:44] intel-plugin Apply()
I0329 12:25:07.816415 69883 daemon.go:504] nodeStateSyncHandler(): get drain lock for sriov daemon
I0329 12:25:07.816449 69883 leaderelection.go:248] attempting to acquire leader lease openshift-sriov-network-operator/config-daemon-draining-lock...
I0329 12:25:07.823173 69883 leaderelection.go:258] successfully acquired lease openshift-sriov-network-operator/config-daemon-draining-lock
I0329 12:25:07.823218 69883 daemon.go:773] getDrainLock(): started leading
I0329 12:25:10.823988 69883 daemon.go:782] getDrainLock(): no other node is draining
I0329 12:25:10.824026 69883 daemon.go:690] annotateNode(): Annotate node openshift-qe-027.lab.eng.rdu2.redhat.com with: Draining
I0329 12:25:10.837267 69883 daemon.go:511] nodeStateSyncHandler(): pause MCP
I0329 12:25:10.837280 69883 daemon.go:802] pauseMCP(): pausing MCP
I0329 12:25:10.848392 69883 daemon.go:828] pauseMCP(): MCP sriov is ready
I0329 12:25:10.848409 69883 daemon.go:838] pauseMCP(): pause MCP sriov
I0329 12:25:10.860448 69883 daemon.go:690] annotateNode(): Annotate node openshift-qe-027.lab.eng.rdu2.redhat.com with: Draining_MCP_Paused
I0329 12:25:10.889359 69883 daemon.go:828] pauseMCP(): MCP sriov is ready
I0329 12:25:10.889375 69883 daemon.go:830] pauseMCP(): stop MCP informer
I0329 12:25:10.889396 69883 daemon.go:518] nodeStateSyncHandler(): drain node
I0329 12:25:10.889401 69883 daemon.go:895] drainNode(): Update prepared
I0329 12:25:10.889405 69883 daemon.go:905] drainNode(): Start draining
E0329 12:25:11.768914 69883 daemon.go:128] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-jmr6f, openshift-dns/dns-default-pvqmg, openshift-dns/node-resolver-42kf2, openshift-image-registry/node-ca-7qjdr, openshift-ingress-canary/ingress-canary-szgj8, openshift-machine-config-operator/machine-config-daemon-vlngr, openshift-monitoring/node-exporter-4dfh4, openshift-multus/multus-additional-cni-plugins-fj9dq, openshift-multus/multus-dqxhf, openshift-multus/network-metrics-daemon-2t4mt, openshift-network-diagnostics/network-check-target-4dbsk, openshift-ovn-kubernetes/ovnkube-node-cvh8d, openshift-sriov-network-operator/sriov-device-plugin-mfzdp, openshift-sriov-network-operator/sriov-network-config-daemon-fsndz; deleting Pods that declare no controller: openshift-marketplace/qe-app-registry-s8zd7
I0329 12:25:11.770765 69883 daemon.go:128] evicting pod default/hello-openshift-backup-847sb
I0329 12:25:11.770781 69883 daemon.go:128] evicting pod default/hello-openshift-backup-4tkwc
I0329 12:25:11.770789 69883 daemon.go:128] evicting pod default/hello-openshift-backup-jnr9k
I0329 12:25:11.770799 69883 daemon.go:128] evicting pod default/hello-openshift-backup-ls74h
I0329 12:25:11.770824 69883 daemon.go:128] evicting pod default/hello-openshift-backup-jc2ps
I0329 12:25:11.770830 69883 daemon.go:128] evicting pod default/hello-openshift-backup-rhxpl
I0329 12:25:11.770790 69883 daemon.go:128] evicting pod default/hello-openshift-backup-lsp5h
I0329 12:25:11.770840 69883 daemon.go:128] evicting pod default/hello-openshift-backup-khvhn
I0329 12:25:11.770852 69883 daemon.go:128] evicting pod default/hello-openshift-backup-xrsmt
I0329 12:25:11.770862 69883 daemon.go:128] evicting pod default/hello-openshift-backup-zqvgh
I0329 12:25:11.770876 69883 daemon.go:128] evicting pod default/hello-openshift-w9tmg
I0329 12:25:11.770882 69883 daemon.go:128] evicting pod default/hello-openshift-backup-tz4df
I0329 12:25:11.770904 69883 daemon.go:128] evicting pod default/hello-openshift-backup-q2stq
I0329 12:25:11.770835 69883 daemon.go:128] evicting pod openshift-marketplace/qe-app-registry-s8zd7
I0329 12:25:11.770770 69883 daemon.go:128] evicting pod default/hello-openshift-backup-2nl9x
I0329 12:25:11.770857 69883 daemon.go:128] evicting pod default/hello-openshift-vcfhm
I0329 12:25:12.771634 69883 request.go:682] Waited for 1.000462605s due to client-side throttling, not priority and fairness, request: POST:https://api-int.sriov.openshift-qe.sdn.com:6443/api/v1/namespaces/default/pods/hello-openshift-backup-ls74h/eviction
I0329 12:25:13.427770 69883 daemon.go:166] Evicted pod from Node default/hello-openshift-backup-khvhn // the log aborts here because we deleted the pod from the other terminal
lyman@lymans-MacBook-Pro env %
lyman@lymans-MacBook-Pro env %
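For context on the drain lock in the log above (leaderelection.go acquiring openshift-sriov-network-operator/config-daemon-draining-lock): the daemon serializes drains across nodes with client-go's Lease-based leader election. A minimal sketch of that pattern, assuming in-cluster config; the lease timings and identity below are illustrative, not the operator's actual values:

package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	hostname, _ := os.Hostname()

	// Lease name and namespace taken from the log above.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "config-daemon-draining-lock",
			Namespace: "openshift-sriov-network-operator",
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: hostname},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second, // illustrative timings
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only one node holds the lease at a time; this is
				// where the daemon annotates the node, pauses the
				// MCP, and drains, per the log above.
				log.Println("got drain lock; draining")
			},
			OnStoppedLeading: func() {
				log.Println("drain lock released")
			},
		},
	})
}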
5. Check the kubelet log; it shows the sriov daemon pod is killed with the default gracePeriod of 30s rather than waiting for the shutdownGracePeriod. The deletion time confirms the SIGTERM was sent before the sriov daemon finished draining (see the note after this output):
sh-4.4# chroot /host
sh-5.1# journalctl -u kubelet --since="10 minutes ago" | less
...
Mar 29 12:25:13 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:13.498041 5713 kuberuntime_container.go:709] "Killing container with a grace period" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" podUID=af07f950-a282-4fee-adfd-9b86e5a67a3d containerName="sriov-cni" containerID="cri-o://a84448c56490d0b42d5fc6de8dfa4827bdd572071ac5a5dd6805ac208745d3e7" gracePeriod=30
Mar 29 12:25:13 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:13.498138 5713 kuberuntime_container.go:709] "Killing container with a grace period" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" podUID=af07f950-a282-4fee-adfd-9b86e5a67a3d containerName="sriov-infiniband-cni" containerID="cri-o://76e3886e253a2dc68a82924021b6347cf98571d54b9249125d082c7d1c6b6764" gracePeriod=30
...
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.069539 5713 kubelet.go:2251] "SyncLoop (PLEG): event for pod" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" event=&{ID:af07f950-a282-4fee-adfd-9b86e5a67a3d Type:ContainerDied Data:76e3886e253a2dc68a82924021b6347cf98571db9249125d082c7d1c6b6764}
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.069553 5713 kubelet.go:2251] "SyncLoop (PLEG): event for pod" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" event=&{ID:af07f950-a282-4fee-adfd-9b86e5a67a3d Type:ContainerDied Data:a84448c56490d0b42d5fc6de8dfa4827bdd572071ac5a5dd6805ac208745d3e7}...
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.111874 5713 kubelet.go:2235] "SyncLoop DELETE" source="api" pods="[openshift-sriov-network-operator/sriov-network-config-daemon-fsndz]"
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.116914 5713 kubelet.go:2229] "SyncLoop REMOVE" source="api" pods="[openshift-sriov-network-operator/sriov-network-config-daemon-fsndz]"
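A note on gracePeriod=30 in the log above: an API-initiated pod deletion uses the pod's own terminationGracePeriodSeconds (30s by default), or an explicit override on the delete call; the kubelet's shutdownGracePeriod is documented to apply to node shutdown, not to pod deletion. A hedged client-go sketch of where the grace period on a delete comes from (namespace and pod name taken from this report; the 60s override is purely illustrative):

package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// With GracePeriodSeconds unset, the API server falls back to the
	// pod's terminationGracePeriodSeconds (default 30s) -- the
	// gracePeriod=30 the kubelet logged above.
	grace := int64(60) // illustrative explicit override
	err = client.CoreV1().Pods("openshift-sriov-network-operator").Delete(
		context.Background(),
		"sriov-network-config-daemon-fsndz",
		metav1.DeleteOptions{GracePeriodSeconds: &grace},
	)
	if err != nil {
		log.Fatal(err)
	}
}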
Actual results:
4. The sriov daemon did not react to the SIGTERM that was sent before it finished draining the node.
Expected results:
4. The sriov daemon should print a log line like the following when it receives SIGTERM: ...I1001 22:19:34.032186 3462439 daemon.go:834] Got SIGTERM, but actively updating...
Additional info:
% oc get sriovnetworknodestates.sriovnetwork.openshift.io openshift-qe-027.lab.eng.rdu2.redhat.com -n openshift-sriov-network-operator -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2023-03-29T01:38:09Z"
  generation: 6
  name: openshift-qe-027.lab.eng.rdu2.redhat.com
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
    uid: e7104876-a5ab-4ab1-b1ab-5a6c3783e0b1
  resourceVersion: "237784"
  uid: 8ce1dc37-9fa6-40fa-83ad-d5bf71dd44b1
spec:
  dpConfigVersion: "168304"
  interfaces:
  - linkType: eth
    name: ens1f1np1
    numVfs: 11
    pciAddress: 0000:3b:00.1
    vfGroups:
    - deviceType: netdevice
      policyName: sriov-nic-027
      resourceName: sriov_nic_027
      vfRange: 0-10
status:
  interfaces:
  - deviceID: "1017"
    driver: mlx5_core
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: 04:3f:72:e4:15:aa
    mtu: 1500
    name: ens1f0np0
    pciAddress: 0000:3b:00.0
    totalvfs: 11
    vendor: 15b3
  - Vfs:
    - deviceID: "1018"
      driver: mlx5_core
      mac: 06:92:59:21:51:20
      mtu: 1500
      name: ens1f1v0
      pciAddress: 0000:3b:01.5
      vendor: 15b3
      vfID: 0
    - deviceID: "1018"
      driver: mlx5_core
      mac: 02:32:24:c3:88:3f
      mtu: 1500
      name: ens1f1v1
      pciAddress: 0000:3b:01.6
      vendor: 15b3
      vfID: 1
    ...
    - deviceID: "1018"
      driver: mlx5_core
      mac: de:6c:a3:8a:7f:9c
      mtu: 1500
      name: ens1f1v9
      pciAddress: 0000:3b:02.6
      vendor: 15b3
      vfID: 9
    deviceID: "1017"
    driver: mlx5_core
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: 04:3f:72:e4:15:ab
    mtu: 1500
    name: ens1f1np1
    numVfs: 11
    pciAddress: 0000:3b:00.1
    totalvfs: 11
    vendor: 15b3
  - deviceID: 159b
    driver: ice
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: b4:96:91:a5:c7:f4
    mtu: 1500
    name: ens2f0
    pciAddress: 0000:5e:00.0
    totalvfs: 128
    vendor: "8086"
  - deviceID: 159b
    driver: ice
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: b4:96:91:a5:c7:f5
    mtu: 1500
    name: ens2f1
    pciAddress: 0000:5e:00.1
    totalvfs: 128
    vendor: "8086"
  - deviceID: "1593"
    driver: ice
    eSwitchMode: legacy
    linkSpeed: -1 Mb/s
    linkType: ETH
    mac: b4:96:91:dc:74:68
    mtu: 1500
    name: ens5f0
    pciAddress: 0000:86:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: "1593"
    driver: ice
    eSwitchMode: legacy
    linkSpeed: -1 Mb/s
    linkType: ETH
    mac: b4:96:91:dc:74:69
    mtu: 1500
    name: ens5f1
    pciAddress: 0000:86:00.1
    totalvfs: 64
    vendor: "8086"
  ...
relates to: OCPEDGE-19 Use graceful node shutdown to facilitate protecting uninterruptible workloads (Closed)