Bug
Resolution: Won't Do
Priority: Major
Severity: Critical
Status: Rejected
Affects Version: 4.13
Impact: Quality / Stability / Reliability
Description of problem:
Configure the shutdownGracePeriod via KubeletConfig, then trigger the sriov daemon to drain the node. Before the sriov daemon finishes draining the node, delete its pod from another terminal. According to its log, the sriov daemon did not handle the SIGTERM as expected, and the kubelet log shows the sriov daemon pod was killed with the pod's default gracePeriod of 30s rather than waiting for the shutdownGracePeriod.
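For reference, "handle the SIGTERM" here means the daemon traps SIGTERM and defers exit while an update is in flight (the "Got SIGTERM, but actively updating" message under Expected results below). A minimal Go sketch of that pattern; the names (updating, handleSignals) are illustrative, not the operator's actual code:

package main

import (
	"log"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// updating is set while a drain or config update is in flight.
var updating atomic.Bool

// handleSignals closes done on SIGTERM, unless an update is in
// progress, in which case it only logs and keeps the daemon alive.
func handleSignals(done chan<- struct{}) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	for range sigs {
		if updating.Load() {
			log.Println("Got SIGTERM, but actively updating")
			continue
		}
		close(done)
		return
	}
}

func main() {
	done := make(chan struct{})
	go handleSignals(done)

	updating.Store(true) // pretend a drain has started
	// ... drain / apply-config work would happen here ...
	updating.Store(false)

	<-done // exit only once a SIGTERM arrives outside an update
}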
Version-Release number of selected component (if applicable):
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-03-28-014156   True        False         98m     Cluster version is 4.13.0-0.nightly-2023-03-28-014156
How reproducible:
Always
Steps to Reproduce:
1. Create a KubeletConfig to set the shutdown grace period, and check it:
% oc get kubeletconfig set-shutdown-graceperiod -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  annotations:
    machineconfiguration.openshift.io/mc-name-suffix: ""
  creationTimestamp: "2023-03-29T03:08:37Z"
  finalizers:
  - 99-sriov-generated-kubelet
  generation: 1
  name: set-shutdown-graceperiod
  resourceVersion: "116502"
  uid: 0f542300-5b12-4d41-af1a-1db0693a3555
spec:
  kubeletConfig:
    shutdownGracePeriod: 10m
    shutdownGracePeriodCriticalPods: 180s
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/sriov: ""
status:
  conditions:
  - lastTransitionTime: "2023-03-29T03:08:37Z"
    message: Success
    status: "True"
    type: Success
2. Wait for the MCP rollout to succeed and check that the setting has taken effect on the node (see the note on value formatting after the output):
sh-4.4# chroot /host
sh-5.1# cat /etc/kubernetes/kubelet.conf
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "staticPodPath": "/etc/kubernetes/manifests",
  "syncFrequency": "0s",
  "fileCheckFrequency": "0s",
  "httpCheckFrequency": "0s",
  "tlsCipherSuites": [...
  "shutdownGracePeriod": "10m0s",
  "shutdownGracePeriodCriticalPods": "3m0s"
}
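Note: the difference in spelling between the KubeletConfig values (10m, 180s) and what lands in kubelet.conf (10m0s, 3m0s) is just duration normalization; the values round-trip through Go's time.Duration, and this small Go check reproduces both strings:

package main

import (
	"fmt"
	"time"
)

func main() {
	for _, s := range []string{"10m", "180s"} {
		d, err := time.ParseDuration(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(s, "->", d.String()) // 10m -> 10m0s, 180s -> 3m0s
	}
}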
3. Change the numVfs of one interface (from 22 to 23) to trigger the drain (a sketch of the underlying check follows the policy below):
% oc edit SriovNetworkNodePolicy sriov-nic-027-bak -n openshift-sriov-network-operator
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  creationTimestamp: "2023-03-29T10:56:35Z"
  generation: 4
  name: sriov-nic-027-bak
  namespace: openshift-sriov-network-operator
  resourceVersion: "328180"
  uid: a7894319-4dec-4ff0-b00d-ad336627ad99
spec:
  deviceType: netdevice
  isRdma: false
  linkType: eth
  nicSelector:
    pfNames:
    - ens5f0
  nodeSelector:
    sriovNo: "027"
  numVfs: 23
  priority: 99
  resourceName: sriov_nic_027_bak
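The drain trigger itself is a plain desired-vs-current comparison, as the NeedUpdate() lines in the log in step 4 show. A simplified, hypothetical stand-in for that check (the real logic lives in the operator's utils.go and generic_plugin.go):

package main

import "fmt"

// needDrain is a simplified stand-in for the operator's
// NeedUpdate()/needDrainNode() checks: a numVfs change on a PF
// requires reconfiguring it, which requires draining the node first.
func needDrain(desiredNumVfs, currentNumVfs int) bool {
	return desiredNumVfs != currentNumVfs
}

func main() {
	fmt.Println(needDrain(23, 22)) // true: "need drain, PF ... request update"
}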
4. Monitor the sriov daemon log, and delete the pod after it has started draining but before the drain finishes (a sketch of the drain lock seen in this log follows the output):
% oc logs -f sriov-network-config-daemon-fsndz -n openshift-sriov-network-operator -c sriov-network-config-daemon
...
I0329 12:25:07.797295 69883 generic_plugin.go:197] generic-plugin needDrainNode(): no need drain, expect NumVfs 12, current NumVfs 12
I0329 12:25:07.797301 69883 utils.go:249] NeedUpdate(): NumVfs needs update desired=23, current=22
I0329 12:25:07.797305 69883 generic_plugin.go:193] generic-plugin needDrainNode(): need drain, PF 0000:86:00.0 request update
I0329 12:25:07.797328 69883 daemon.go:478] nodeStateSyncHandler(): plugin generic_plugin: reqDrain true, reqReboot false
I0329 12:25:07.797337 69883 daemon.go:482] nodeStateSyncHandler(): reqDrain true, reqReboot false disableDrain false
I0329 12:25:07.797343 69883 utils.go:778] RunCommand(): cat [/host/sys/kernel/security/lockdown]
I0329 12:25:07.798826 69883 utils.go:786] RunCommand(): out:([none] integrity confidentiality
), err:(<nil>)
I0329 12:25:07.798835 69883 utils.go:769] IsKernelLockdownMode(): [none] integrity confidentiality
, <nil>
I0329 12:25:07.798841 69883 mellanox_plugin.go:181] mellanox-plugin Apply()
I0329 12:25:07.798846 69883 mellanox_plugin.go:186] mellanox-plugin configFW()
I0329 12:25:07.798851 69883 intel_plugin.go:44] intel-plugin Apply()
I0329 12:25:07.816415 69883 daemon.go:504] nodeStateSyncHandler(): get drain lock for sriov daemon
I0329 12:25:07.816449 69883 leaderelection.go:248] attempting to acquire leader lease openshift-sriov-network-operator/config-daemon-draining-lock...
I0329 12:25:07.823173 69883 leaderelection.go:258] successfully acquired lease openshift-sriov-network-operator/config-daemon-draining-lock
I0329 12:25:07.823218 69883 daemon.go:773] getDrainLock(): started leading
I0329 12:25:10.823988 69883 daemon.go:782] getDrainLock(): no other node is draining
I0329 12:25:10.824026 69883 daemon.go:690] annotateNode(): Annotate node openshift-qe-027.lab.eng.rdu2.redhat.com with: Draining
I0329 12:25:10.837267 69883 daemon.go:511] nodeStateSyncHandler(): pause MCP
I0329 12:25:10.837280 69883 daemon.go:802] pauseMCP(): pausing MCP
I0329 12:25:10.848392 69883 daemon.go:828] pauseMCP(): MCP sriov is ready
I0329 12:25:10.848409 69883 daemon.go:838] pauseMCP(): pause MCP sriov
I0329 12:25:10.860448 69883 daemon.go:690] annotateNode(): Annotate node openshift-qe-027.lab.eng.rdu2.redhat.com with: Draining_MCP_Paused
I0329 12:25:10.889359 69883 daemon.go:828] pauseMCP(): MCP sriov is ready
I0329 12:25:10.889375 69883 daemon.go:830] pauseMCP(): stop MCP informer
I0329 12:25:10.889396 69883 daemon.go:518] nodeStateSyncHandler(): drain node
I0329 12:25:10.889401 69883 daemon.go:895] drainNode(): Update prepared
I0329 12:25:10.889405 69883 daemon.go:905] drainNode(): Start draining
E0329 12:25:11.768914 69883 daemon.go:128] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-jmr6f, openshift-dns/dns-default-pvqmg, openshift-dns/node-resolver-42kf2, openshift-image-registry/node-ca-7qjdr, openshift-ingress-canary/ingress-canary-szgj8, openshift-machine-config-operator/machine-config-daemon-vlngr, openshift-monitoring/node-exporter-4dfh4, openshift-multus/multus-additional-cni-plugins-fj9dq, openshift-multus/multus-dqxhf, openshift-multus/network-metrics-daemon-2t4mt, openshift-network-diagnostics/network-check-target-4dbsk, openshift-ovn-kubernetes/ovnkube-node-cvh8d, openshift-sriov-network-operator/sriov-device-plugin-mfzdp, openshift-sriov-network-operator/sriov-network-config-daemon-fsndz; deleting Pods that declare no controller: openshift-marketplace/qe-app-registry-s8zd7
I0329 12:25:11.770765 69883 daemon.go:128] evicting pod default/hello-openshift-backup-847sb
I0329 12:25:11.770781 69883 daemon.go:128] evicting pod default/hello-openshift-backup-4tkwc
I0329 12:25:11.770789 69883 daemon.go:128] evicting pod default/hello-openshift-backup-jnr9k
I0329 12:25:11.770799 69883 daemon.go:128] evicting pod default/hello-openshift-backup-ls74h
I0329 12:25:11.770824 69883 daemon.go:128] evicting pod default/hello-openshift-backup-jc2ps
I0329 12:25:11.770830 69883 daemon.go:128] evicting pod default/hello-openshift-backup-rhxpl
I0329 12:25:11.770790 69883 daemon.go:128] evicting pod default/hello-openshift-backup-lsp5h
I0329 12:25:11.770840 69883 daemon.go:128] evicting pod default/hello-openshift-backup-khvhn
I0329 12:25:11.770852 69883 daemon.go:128] evicting pod default/hello-openshift-backup-xrsmt
I0329 12:25:11.770862 69883 daemon.go:128] evicting pod default/hello-openshift-backup-zqvgh
I0329 12:25:11.770876 69883 daemon.go:128] evicting pod default/hello-openshift-w9tmg
I0329 12:25:11.770882 69883 daemon.go:128] evicting pod default/hello-openshift-backup-tz4df
I0329 12:25:11.770904 69883 daemon.go:128] evicting pod default/hello-openshift-backup-q2stq
I0329 12:25:11.770835 69883 daemon.go:128] evicting pod openshift-marketplace/qe-app-registry-s8zd7
I0329 12:25:11.770770 69883 daemon.go:128] evicting pod default/hello-openshift-backup-2nl9x
I0329 12:25:11.770857 69883 daemon.go:128] evicting pod default/hello-openshift-vcfhm
I0329 12:25:12.771634 69883 request.go:682] Waited for 1.000462605s due to client-side throttling, not priority and fairness, request: POST:https://api-int.sriov.openshift-qe.sdn.com:6443/api/v1/namespaces/default/pods/hello-openshift-backup-ls74h/eviction
I0329 12:25:13.427770 69883 daemon.go:166] Evicted pod from Node default/hello-openshift-backup-khvhn // the log aborts here because we deleted the pod from the other terminal
lyman@lymans-MacBook-Pro env %
lyman@lymans-MacBook-Pro env %
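For context on the drain lock in the log above (leaderelection.go acquiring openshift-sriov-network-operator/config-daemon-draining-lock): the daemon serializes drains across nodes with client-go's Lease-based leader election. A minimal sketch of that pattern, assuming in-cluster config; the lease timings and identity below are illustrative, not the operator's actual values:

package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	hostname, _ := os.Hostname()

	// Lease name and namespace taken from the log above.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "config-daemon-draining-lock",
			Namespace: "openshift-sriov-network-operator",
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: hostname},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second, // illustrative timings
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only one node holds the lease at a time; this is
				// where the daemon annotates the node, pauses the
				// MCP, and drains, per the log above.
				log.Println("got drain lock; draining")
			},
			OnStoppedLeading: func() {
				log.Println("drain lock released")
			},
		},
	})
}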
5. Check the kubelet log; it shows the sriov daemon pod is killed with the default gracePeriod of 30s rather than waiting for the shutdownGracePeriod. The deletion time confirms the SIGTERM was sent before the sriov daemon finished draining (see the note after this output):
sh-4.4# chroot /host
sh-5.1# journalctl -u kubelet --since="10 minutes ago" | less
...
Mar 29 12:25:13 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:13.498041 5713 kuberuntime_container.go:709] "Killing container with a grace period" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" podUID=af07f950-a282-4fee-adfd-9b86e5a67a3d containerName="sriov-cni" containerID="cri-o://a84448c56490d0b42d5fc6de8dfa4827bdd572071ac5a5dd6805ac208745d3e7" gracePeriod=30
Mar 29 12:25:13 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:13.498138 5713 kuberuntime_container.go:709] "Killing container with a grace period" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" podUID=af07f950-a282-4fee-adfd-9b86e5a67a3d containerName="sriov-infiniband-cni" containerID="cri-o://76e3886e253a2dc68a82924021b6347cf98571d54b9249125d082c7d1c6b6764" gracePeriod=30
...
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.069539 5713 kubelet.go:2251] "SyncLoop (PLEG): event for pod" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" event=&{ID:af07f950-a282-4fee-adfd-9b86e5a67a3d Type:ContainerDied Data:76e3886e253a2dc68a82924021b6347cf98571db9249125d082c7d1c6b6764}
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.069553 5713 kubelet.go:2251] "SyncLoop (PLEG): event for pod" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" event=&{ID:af07f950-a282-4fee-adfd-9b86e5a67a3d Type:ContainerDied Data:a84448c56490d0b42d5fc6de8dfa4827bdd572071ac5a5dd6805ac208745d3e7}...
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.111874 5713 kubelet.go:2235] "SyncLoop DELETE" source="api" pods="[openshift-sriov-network-operator/sriov-network-config-daemon-fsndz]"
Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.116914 5713 kubelet.go:2229] "SyncLoop REMOVE" source="api" pods="[openshift-sriov-network-operator/sriov-network-config-daemon-fsndz]"
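A note on gracePeriod=30 in the log above: an API-initiated pod deletion uses the pod's own terminationGracePeriodSeconds (30s by default), or an explicit override on the delete call; the kubelet's shutdownGracePeriod is documented to apply to node shutdown, not to pod deletion. A hedged client-go sketch of where the grace period on a delete comes from (namespace and pod name taken from this report; the 60s override is purely illustrative):

package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// With GracePeriodSeconds unset, the API server falls back to the
	// pod's terminationGracePeriodSeconds (default 30s) -- the
	// gracePeriod=30 the kubelet logged above.
	grace := int64(60) // illustrative explicit override
	err = client.CoreV1().Pods("openshift-sriov-network-operator").Delete(
		context.Background(),
		"sriov-network-config-daemon-fsndz",
		metav1.DeleteOptions{GracePeriodSeconds: &grace},
	)
	if err != nil {
		log.Fatal(err)
	}
}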
Actual results:
4. The sriov daemon did not react to the SIGTERM that was sent before it finished draining the node.
Expected results:
4. The sriov daemon should print a log line like the following when it receives SIGTERM: ...I1001 22:19:34.032186 3462439 daemon.go:834] Got SIGTERM, but actively updating...
Additional info:
% oc get sriovnetworknodestates.sriovnetwork.openshift.io openshift-qe-027.lab.eng.rdu2.redhat.com -n openshift-sriov-network-operator -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  creationTimestamp: "2023-03-29T01:38:09Z"
  generation: 6
  name: openshift-qe-027.lab.eng.rdu2.redhat.com
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
    uid: e7104876-a5ab-4ab1-b1ab-5a6c3783e0b1
  resourceVersion: "237784"
  uid: 8ce1dc37-9fa6-40fa-83ad-d5bf71dd44b1
spec:
  dpConfigVersion: "168304"
  interfaces:
  - linkType: eth
    name: ens1f1np1
    numVfs: 11
    pciAddress: 0000:3b:00.1
    vfGroups:
    - deviceType: netdevice
      policyName: sriov-nic-027
      resourceName: sriov_nic_027
      vfRange: 0-10
status:
  interfaces:
  - deviceID: "1017"
    driver: mlx5_core
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: 04:3f:72:e4:15:aa
    mtu: 1500
    name: ens1f0np0
    pciAddress: 0000:3b:00.0
    totalvfs: 11
    vendor: 15b3
  - Vfs:
    - deviceID: "1018"
      driver: mlx5_core
      mac: 06:92:59:21:51:20
      mtu: 1500
      name: ens1f1v0
      pciAddress: 0000:3b:01.5
      vendor: 15b3
      vfID: 0
    - deviceID: "1018"
      driver: mlx5_core
      mac: 02:32:24:c3:88:3f
      mtu: 1500
      name: ens1f1v1
      pciAddress: 0000:3b:01.6
      vendor: 15b3
      vfID: 1
    ...
    - deviceID: "1018"
      driver: mlx5_core
      mac: de:6c:a3:8a:7f:9c
      mtu: 1500
      name: ens1f1v9
      pciAddress: 0000:3b:02.6
      vendor: 15b3
      vfID: 9
    deviceID: "1017"
    driver: mlx5_core
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: 04:3f:72:e4:15:ab
    mtu: 1500
    name: ens1f1np1
    numVfs: 11
    pciAddress: 0000:3b:00.1
    totalvfs: 11
    vendor: 15b3
  - deviceID: 159b
    driver: ice
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: b4:96:91:a5:c7:f4
    mtu: 1500
    name: ens2f0
    pciAddress: 0000:5e:00.0
    totalvfs: 128
    vendor: "8086"
  - deviceID: 159b
    driver: ice
    eSwitchMode: legacy
    linkSpeed: 25000 Mb/s
    linkType: ETH
    mac: b4:96:91:a5:c7:f5
    mtu: 1500
    name: ens2f1
    pciAddress: 0000:5e:00.1
    totalvfs: 128
    vendor: "8086"
  - deviceID: "1593"
    driver: ice
    eSwitchMode: legacy
    linkSpeed: -1 Mb/s
    linkType: ETH
    mac: b4:96:91:dc:74:68
    mtu: 1500
    name: ens5f0
    pciAddress: 0000:86:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: "1593"
    driver: ice
    eSwitchMode: legacy
    linkSpeed: -1 Mb/s
    linkType: ETH
    mac: b4:96:91:dc:74:69
    mtu: 1500
    name: ens5f1
    pciAddress: 0000:86:00.1
    totalvfs: 64
    vendor: "8086"
  ...
relates to: OCPEDGE-19 Use graceful node shutdown to facilitate protecting uninterruptible workloads (Closed)