Issue Type: Bug
Resolution: Won't Do
Priority: Major
Affects Version: 4.13
Category: Quality / Stability / Reliability
Severity: Critical
Status: Rejected
Description of problem:
Configure shutdownGracePeriod via a KubeletConfig, then trigger the sriov daemon to drain the node. Before the sriov daemon finishes draining the node, delete its pod from another terminal. Judging from its log, the sriov daemon does not handle the SIGTERM as expected, and the kubelet log shows the sriov daemon pod is deleted with the pod's default gracePeriod of 30s instead of waiting for the shutdownGracePeriod.
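For context, the behavior being tested is the usual Go pattern of catching SIGTERM and deferring shutdown while work is in flight. Below is a minimal, self-contained sketch of that pattern, not the operator's actual code; the log message is borrowed from the expected results further down, and the sleep stands in for the drain/apply work.

package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)

	// Stand-in for the drain/apply work the daemon performs.
	done := make(chan struct{})
	go func() {
		time.Sleep(30 * time.Second)
		close(done)
	}()

	select {
	case <-sigCh:
		// Expected behavior per this report: acknowledge the signal but
		// keep running until the in-flight update completes.
		log.Println("Got SIGTERM, but actively updating")
		<-done
	case <-done:
	}
	log.Println("update finished, exiting")
}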
Version-Release number of selected component (if applicable):
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-03-28-014156   True        False         98m     Cluster version is 4.13.0-0.nightly-2023-03-28-014156
How reproducible:
always
Steps to Reproduce:
1. Create a KubeletConfig to set the shutdown grace period, and check it:

% oc get kubeletconfig set-shutdown-graceperiod -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  annotations:
    machineconfiguration.openshift.io/mc-name-suffix: ""
  creationTimestamp: "2023-03-29T03:08:37Z"
  finalizers:
  - 99-sriov-generated-kubelet
  generation: 1
  name: set-shutdown-graceperiod
  resourceVersion: "116502"
  uid: 0f542300-5b12-4d41-af1a-1db0693a3555
spec:
  kubeletConfig:
    shutdownGracePeriod: 10m
    shutdownGracePeriodCriticalPods: 180s
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/sriov: ""
status:
  conditions:
  - lastTransitionTime: "2023-03-29T03:08:37Z"
    message: Success
    status: "True"
    type: Success

2. Wait for the MCP rollout to succeed and check that the setting took effect on the node:

sh-4.4# chroot /host
sh-5.1# cat /etc/kubernetes/kubelet.conf
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "staticPodPath": "/etc/kubernetes/manifests",
  "syncFrequency": "0s",
  "fileCheckFrequency": "0s",
  "httpCheckFrequency": "0s",
  "tlsCipherSuites": [...
  "shutdownGracePeriod": "10m0s",
  "shutdownGracePeriodCriticalPods": "3m0s"
}

3. Change numVfs (from 22 to 23) on one interface to trigger the drain:

% oc edit SriovNetworkNodePolicy sriov-nic-027-bak -n openshift-sriov-network-operator
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  creationTimestamp: "2023-03-29T10:56:35Z"
  generation: 4
  name: sriov-nic-027-bak
  namespace: openshift-sriov-network-operator
  resourceVersion: "328180"
  uid: a7894319-4dec-4ff0-b00d-ad336627ad99
spec:
  deviceType: netdevice
  isRdma: false
  linkType: eth
  nicSelector:
    pfNames:
    - ens5f0
  nodeSelector:
    sriovNo: "027"
  numVfs: 23
  priority: 99
  resourceName: sriov_nic_027_bak

4. Monitor the sriov daemon log, and delete the pod once it has started but not yet finished draining:

% oc logs -f sriov-network-config-daemon-fsndz -n openshift-sriov-network-operator -c sriov-network-config-daemon
...
I0329 12:25:07.797295 69883 generic_plugin.go:197] generic-plugin needDrainNode(): no need drain, expect NumVfs 12, current NumVfs 12
I0329 12:25:07.797301 69883 utils.go:249] NeedUpdate(): NumVfs needs update desired=23, current=22
I0329 12:25:07.797305 69883 generic_plugin.go:193] generic-plugin needDrainNode(): need drain, PF 0000:86:00.0 request update
I0329 12:25:07.797328 69883 daemon.go:478] nodeStateSyncHandler(): plugin generic_plugin: reqDrain true, reqReboot false
I0329 12:25:07.797337 69883 daemon.go:482] nodeStateSyncHandler(): reqDrain true, reqReboot false disableDrain false
I0329 12:25:07.797343 69883 utils.go:778] RunCommand(): cat [/host/sys/kernel/security/lockdown]
I0329 12:25:07.798826 69883 utils.go:786] RunCommand(): out:([none] integrity confidentiality ), err:(<nil>)
I0329 12:25:07.798835 69883 utils.go:769] IsKernelLockdownMode(): [none] integrity confidentiality , <nil>
I0329 12:25:07.798841 69883 mellanox_plugin.go:181] mellanox-plugin Apply()
I0329 12:25:07.798846 69883 mellanox_plugin.go:186] mellanox-plugin configFW()
I0329 12:25:07.798851 69883 intel_plugin.go:44] intel-plugin Apply()
I0329 12:25:07.816415 69883 daemon.go:504] nodeStateSyncHandler(): get drain lock for sriov daemon
I0329 12:25:07.816449 69883 leaderelection.go:248] attempting to acquire leader lease openshift-sriov-network-operator/config-daemon-draining-lock...
I0329 12:25:07.823173 69883 leaderelection.go:258] successfully acquired lease openshift-sriov-network-operator/config-daemon-draining-lock
I0329 12:25:07.823218 69883 daemon.go:773] getDrainLock(): started leading
I0329 12:25:10.823988 69883 daemon.go:782] getDrainLock(): no other node is draining
I0329 12:25:10.824026 69883 daemon.go:690] annotateNode(): Annotate node openshift-qe-027.lab.eng.rdu2.redhat.com with: Draining
I0329 12:25:10.837267 69883 daemon.go:511] nodeStateSyncHandler(): pause MCP
I0329 12:25:10.837280 69883 daemon.go:802] pauseMCP(): pausing MCP
I0329 12:25:10.848392 69883 daemon.go:828] pauseMCP(): MCP sriov is ready
I0329 12:25:10.848409 69883 daemon.go:838] pauseMCP(): pause MCP sriov
I0329 12:25:10.860448 69883 daemon.go:690] annotateNode(): Annotate node openshift-qe-027.lab.eng.rdu2.redhat.com with: Draining_MCP_Paused
I0329 12:25:10.889359 69883 daemon.go:828] pauseMCP(): MCP sriov is ready
I0329 12:25:10.889375 69883 daemon.go:830] pauseMCP(): stop MCP informer
I0329 12:25:10.889396 69883 daemon.go:518] nodeStateSyncHandler(): drain node
I0329 12:25:10.889401 69883 daemon.go:895] drainNode(): Update prepared
I0329 12:25:10.889405 69883 daemon.go:905] drainNode(): Start draining
E0329 12:25:11.768914 69883 daemon.go:128] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-jmr6f, openshift-dns/dns-default-pvqmg, openshift-dns/node-resolver-42kf2, openshift-image-registry/node-ca-7qjdr, openshift-ingress-canary/ingress-canary-szgj8, openshift-machine-config-operator/machine-config-daemon-vlngr, openshift-monitoring/node-exporter-4dfh4, openshift-multus/multus-additional-cni-plugins-fj9dq, openshift-multus/multus-dqxhf, openshift-multus/network-metrics-daemon-2t4mt, openshift-network-diagnostics/network-check-target-4dbsk, openshift-ovn-kubernetes/ovnkube-node-cvh8d, openshift-sriov-network-operator/sriov-device-plugin-mfzdp, openshift-sriov-network-operator/sriov-network-config-daemon-fsndz; deleting Pods that declare no controller: openshift-marketplace/qe-app-registry-s8zd7
I0329 12:25:11.770765 69883 daemon.go:128] evicting pod default/hello-openshift-backup-847sb
I0329 12:25:11.770781 69883 daemon.go:128] evicting pod default/hello-openshift-backup-4tkwc
I0329 12:25:11.770789 69883 daemon.go:128] evicting pod default/hello-openshift-backup-jnr9k
I0329 12:25:11.770799 69883 daemon.go:128] evicting pod default/hello-openshift-backup-ls74h
I0329 12:25:11.770824 69883 daemon.go:128] evicting pod default/hello-openshift-backup-jc2ps
I0329 12:25:11.770830 69883 daemon.go:128] evicting pod default/hello-openshift-backup-rhxpl
I0329 12:25:11.770790 69883 daemon.go:128] evicting pod default/hello-openshift-backup-lsp5h
I0329 12:25:11.770840 69883 daemon.go:128] evicting pod default/hello-openshift-backup-khvhn
I0329 12:25:11.770852 69883 daemon.go:128] evicting pod default/hello-openshift-backup-xrsmt
I0329 12:25:11.770862 69883 daemon.go:128] evicting pod default/hello-openshift-backup-zqvgh
I0329 12:25:11.770876 69883 daemon.go:128] evicting pod default/hello-openshift-w9tmg
I0329 12:25:11.770882 69883 daemon.go:128] evicting pod default/hello-openshift-backup-tz4df
I0329 12:25:11.770904 69883 daemon.go:128] evicting pod default/hello-openshift-backup-q2stq
I0329 12:25:11.770835 69883 daemon.go:128] evicting pod openshift-marketplace/qe-app-registry-s8zd7
I0329 12:25:11.770770 69883 daemon.go:128] evicting pod default/hello-openshift-backup-2nl9x
I0329 12:25:11.770857 69883 daemon.go:128] evicting pod default/hello-openshift-vcfhm
I0329 12:25:12.771634 69883 request.go:682] Waited for 1.000462605s due to client-side throttling, not priority and fairness, request: POST:https://api-int.sriov.openshift-qe.sdn.com:6443/api/v1/namespaces/default/pods/hello-openshift-backup-ls74h/eviction
I0329 12:25:13.427770 69883 daemon.go:166] Evicted pod from Node default/hello-openshift-backup-khvhn
// the log aborts here because we deleted the pod from the other terminal
lyman@lymans-MacBook-Pro env %

5. Check the kubelet log. It shows the sriov daemon pod is deleted with the default gracePeriod=30s instead of waiting for the shutdownGracePeriod, and the deletion time confirms the SIGTERM was sent before the sriov daemon finished draining (see the note after these steps):

sh-4.4# chroot /host
sh-5.1# journalctl -u kubelet --since="10 minutes ago" | less
...
Mar 29 12:25:13 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:13.498041 5713 kuberuntime_container.go:709] "Killing container with a grace period" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" podUID=af07f950-a282-4fee-adfd-9b86e5a67a3d containerName="sriov-cni" containerID="cri-o://a84448c56490d0b42d5fc6de8dfa4827bdd572071ac5a5dd6805ac208745d3e7" gracePeriod=30
Mar 29 12:25:13 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:13.498138 5713 kuberuntime_container.go:709] "Killing container with a grace period" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" podUID=af07f950-a282-4fee-adfd-9b86e5a67a3d containerName="sriov-infiniband-cni" containerID="cri-o://76e3886e253a2dc68a82924021b6347cf98571d54b9249125d082c7d1c6b6764" gracePeriod=30
...
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.069539 5713 kubelet.go:2251] "SyncLoop (PLEG): event for pod" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" event=&{ID:af07f950-a282-4fee-adfd-9b86e5a67a3d Type:ContainerDied Data:76e3886e253a2dc68a82924021b6347cf98571db9249125d082c7d1c6b6764}
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.069553 5713 kubelet.go:2251] "SyncLoop (PLEG): event for pod" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" event=&{ID:af07f950-a282-4fee-adfd-9b86e5a67a3d Type:ContainerDied Data:a84448c56490d0b42d5fc6de8dfa4827bdd572071ac5a5dd6805ac208745d3e7}
...
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.111874 5713 kubelet.go:2235] "SyncLoop DELETE" source="api" pods="[openshift-sriov-network-operator/sriov-network-config-daemon-fsndz]"
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.116914 5713 kubelet.go:2229] "SyncLoop REMOVE" source="api" pods="[openshift-sriov-network-operator/sriov-network-config-daemon-fsndz]"
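Note (referenced from step 5): the gracePeriod=30 in the kubelet log matches the pod spec's terminationGracePeriodSeconds, which defaults to 30s and is what the kubelet applies on an API-initiated deletion; shutdownGracePeriod, by contrast, is enforced through the kubelet's systemd inhibitor lock and only comes into play on an actual node shutdown. That description is based on how Kubernetes graceful node shutdown works in general, not on anything verified on this cluster. Two ways to cross-check (pod name as in step 4, before deleting it):

% oc get pod sriov-network-config-daemon-fsndz -n openshift-sriov-network-operator -o jsonpath='{.spec.terminationGracePeriodSeconds}'
sh-5.1# systemd-inhibit --list    # the kubelet's delay lock should appear here when graceful node shutdown is configured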
Actual results:
4. The sriov daemon did not handle the SIGTERM it received before it had finished draining the node.
Expected results:
4. The sriov daemon should print a log line like the following when it receives SIGTERM:
...
I1001 22:19:34.032186 3462439 daemon.go:834] Got SIGTERM, but actively updating
...
Additional info:
% oc get sriovnetworknodestates.sriovnetwork.openshift.io openshift-qe-027.lab.eng.rdu2.redhat.com -n openshift-sriov-network-operator -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2023-03-29T01:38:09Z"
  generation: 6
  name: openshift-qe-027.lab.eng.rdu2.redhat.com
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
    uid: e7104876-a5ab-4ab1-b1ab-5a6c3783e0b1
  resourceVersion: "237784"
  uid: 8ce1dc37-9fa6-40fa-83ad-d5bf71dd44b1
spec:
  dpConfigVersion: "168304"
  interfaces:
  - linkType: eth
    name: ens1f1np1
    numVfs: 11
    pciAddress: 0000:3b:00.1
    vfGroups:
    - deviceType: netdevice
      policyName: sriov-nic-027
      resourceName: sriov_nic_027
      vfRange: 0-10
status:
  interfaces:
  - deviceID: "1017"
    driver: mlx5_core
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: 04:3f:72:e4:15:aa
    mtu: 1500
    name: ens1f0np0
    pciAddress: 0000:3b:00.0
    totalvfs: 11
    vendor: 15b3
  - Vfs:
    - deviceID: "1018"
      driver: mlx5_core
      mac: 06:92:59:21:51:20
      mtu: 1500
      name: ens1f1v0
      pciAddress: 0000:3b:01.5
      vendor: 15b3
      vfID: 0
    - deviceID: "1018"
      driver: mlx5_core
      mac: 02:32:24:c3:88:3f
      mtu: 1500
      name: ens1f1v1
      pciAddress: 0000:3b:01.6
      vendor: 15b3
      vfID: 1
    ...
    - deviceID: "1018"
      driver: mlx5_core
      mac: de:6c:a3:8a:7f:9c
      mtu: 1500
      name: ens1f1v9
      pciAddress: 0000:3b:02.6
      vendor: 15b3
      vfID: 9
    deviceID: "1017"
    driver: mlx5_core
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: 04:3f:72:e4:15:ab
    mtu: 1500
    name: ens1f1np1
    numVfs: 11
    pciAddress: 0000:3b:00.1
    totalvfs: 11
    vendor: 15b3
  - deviceID: 159b
    driver: ice
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: b4:96:91:a5:c7:f4
    mtu: 1500
    name: ens2f0
    pciAddress: 0000:5e:00.0
    totalvfs: 128
    vendor: "8086"
  - deviceID: 159b
    driver: ice
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: b4:96:91:a5:c7:f5
    mtu: 1500
    name: ens2f1
    pciAddress: 0000:5e:00.1
    totalvfs: 128
    vendor: "8086"
  - deviceID: "1593"
    driver: ice
    eSwitchMode: legacy
    linkSpeed: -1 Mb/s
    linkType: ETH
    mac: b4:96:91:dc:74:68
    mtu: 1500
    name: ens5f0
    pciAddress: 0000:86:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: "1593"
    driver: ice
    eSwitchMode: legacy
    linkSpeed: -1 Mb/s
    linkType: ETH
    mac: b4:96:91:dc:74:69
    mtu: 1500
    name: ens5f1
    pciAddress: 0000:86:00.1
    totalvfs: 64
    vendor: "8086"
...
Relates to: OCPEDGE-19 Use graceful node shutdown to facilitate protecting uninterruptible workloads (Closed)