OpenShift Bugs · OCPBUGS-11121

when configuring shutdownGracePeriod, the sriov daemon doesn't work as expected

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • Affects Version: 4.13
    • Component: Networking / SR-IOV
    • Quality / Stability / Reliability
    • Critical
    • Rejected

      Description of problem:

      Configure shutdownGracePeriod via a KubeletConfig, then trigger the sriov daemon to drain the node. Before the sriov daemon finishes draining the node, delete its pod from another terminal. From the log, the sriov daemon did not handle the SIGTERM as expected.

      The kubelet log also shows that the sriov daemon pod is deleted with the pod's default gracePeriod of 30s rather than waiting for the shutdownGracePeriod.

      Version-Release number of selected component (if applicable):

      % oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.13.0-0.nightly-2023-03-28-014156   True        False         98m     Cluster version is 4.13.0-0.nightly-2023-03-28-014156

      How reproducible:

      always

      Steps to Reproduce:

      1. Create a KubeletConfig to set the shutdown grace period, and check it (see the note after the YAML output):
      % oc get kubeletconfig set-shutdown-graceperiod -o yaml
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        annotations:
          machineconfiguration.openshift.io/mc-name-suffix: ""
        creationTimestamp: "2023-03-29T03:08:37Z"
        finalizers:
        - 99-sriov-generated-kubelet
        generation: 1
        name: set-shutdown-graceperiod
        resourceVersion: "116502"
        uid: 0f542300-5b12-4d41-af1a-1db0693a3555
      spec:
        kubeletConfig:
          shutdownGracePeriod: 10m
          shutdownGracePeriodCriticalPods: 180s
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/sriov: ""
      status:
        conditions:
        - lastTransitionTime: "2023-03-29T03:08:37Z"
          message: Success
          status: "True"
          type: Success
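
      Note: as documented for the Kubernetes graceful node shutdown feature (which consumes these kubelet settings), shutdownGracePeriod is the total delay applied to a node shutdown and shutdownGracePeriodCriticalPods is the portion of it reserved for critical pods, so with the values above ordinary pods get up to 10m - 3m = 7m and critical pods get the final 3m during a node shutdown.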
      
      2. Wait for the MCP rollout to succeed and check that the setting has taken effect on the node:
      sh-4.4# chroot /host
      sh-5.1# cat /etc/kubernetes/kubelet.conf
      {
        "kind": "KubeletConfiguration",
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "staticPodPath": "/etc/kubernetes/manifests",
        "syncFrequency": "0s",
        "fileCheckFrequency": "0s",
        "httpCheckFrequency": "0s",
        "tlsCipherSuites": [...
       "shutdownGracePeriod": "10m0s",
        "shutdownGracePeriodCriticalPods": "3m0s"
      }
      
      3. Change the numVfs of one interface (from 22 to 23) to trigger the drain:
      % oc edit SriovNetworkNodePolicy sriov-nic-027-bak -n openshift-sriov-network-operator
      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkNodePolicy
      metadata:
        creationTimestamp: "2023-03-29T10:56:35Z"
        generation: 4
        name: sriov-nic-027-bak
        namespace: openshift-sriov-network-operator
        resourceVersion: "328180"
        uid: a7894319-4dec-4ff0-b00d-ad336627ad99
      spec:
        deviceType: netdevice
        isRdma: false
        linkType: eth
        nicSelector:
          pfNames:
          - ens5f0
        nodeSelector:
          sriovNo: "027"
        numVfs: 23
        priority: 99
        resourceName: sriov_nic_027_bak
      
      4. Monitor the sriov daemon log, and delete the daemon pod after it has started draining but before the drain finishes (a simplified sketch of this drain flow follows the log below).
      
      % oc logs -f sriov-network-config-daemon-fsndz -n openshift-sriov-network-operator -c sriov-network-config-daemon
      ...
      I0329 12:25:07.797295   69883 generic_plugin.go:197] generic-plugin needDrainNode(): no need drain, expect NumVfs 12, current NumVfs 12
      I0329 12:25:07.797301   69883 utils.go:249] NeedUpdate(): NumVfs needs update desired=23, current=22
      I0329 12:25:07.797305   69883 generic_plugin.go:193] generic-plugin needDrainNode(): need drain, PF 0000:86:00.0 request update
      I0329 12:25:07.797328   69883 daemon.go:478] nodeStateSyncHandler(): plugin generic_plugin: reqDrain true, reqReboot false
      I0329 12:25:07.797337   69883 daemon.go:482] nodeStateSyncHandler(): reqDrain true, reqReboot false disableDrain false
      I0329 12:25:07.797343   69883 utils.go:778] RunCommand(): cat [/host/sys/kernel/security/lockdown]
      I0329 12:25:07.798826   69883 utils.go:786] RunCommand(): out:([none] integrity confidentiality
      ), err:(<nil>)
      I0329 12:25:07.798835   69883 utils.go:769] IsKernelLockdownMode(): [none] integrity confidentiality
      , <nil>
      I0329 12:25:07.798841   69883 mellanox_plugin.go:181] mellanox-plugin Apply()
      I0329 12:25:07.798846   69883 mellanox_plugin.go:186] mellanox-plugin configFW()
      I0329 12:25:07.798851   69883 intel_plugin.go:44] intel-plugin Apply()
      I0329 12:25:07.816415   69883 daemon.go:504] nodeStateSyncHandler(): get drain lock for sriov daemon
      I0329 12:25:07.816449   69883 leaderelection.go:248] attempting to acquire leader lease openshift-sriov-network-operator/config-daemon-draining-lock...
      I0329 12:25:07.823173   69883 leaderelection.go:258] successfully acquired lease openshift-sriov-network-operator/config-daemon-draining-lock
      I0329 12:25:07.823218   69883 daemon.go:773] getDrainLock(): started leading
      I0329 12:25:10.823988   69883 daemon.go:782] getDrainLock(): no other node is draining
      I0329 12:25:10.824026   69883 daemon.go:690] annotateNode(): Annotate node openshift-qe-027.lab.eng.rdu2.redhat.com with: Draining
      I0329 12:25:10.837267   69883 daemon.go:511] nodeStateSyncHandler(): pause MCP
      I0329 12:25:10.837280   69883 daemon.go:802] pauseMCP(): pausing MCP
      I0329 12:25:10.848392   69883 daemon.go:828] pauseMCP(): MCP sriov is ready
      I0329 12:25:10.848409   69883 daemon.go:838] pauseMCP(): pause MCP sriov
      I0329 12:25:10.860448   69883 daemon.go:690] annotateNode(): Annotate node openshift-qe-027.lab.eng.rdu2.redhat.com with: Draining_MCP_Paused
      I0329 12:25:10.889359   69883 daemon.go:828] pauseMCP(): MCP sriov is ready
      I0329 12:25:10.889375   69883 daemon.go:830] pauseMCP(): stop MCP informer
      I0329 12:25:10.889396   69883 daemon.go:518] nodeStateSyncHandler(): drain node
      I0329 12:25:10.889401   69883 daemon.go:895] drainNode(): Update prepared
      I0329 12:25:10.889405   69883 daemon.go:905] drainNode(): Start draining
      E0329 12:25:11.768914   69883 daemon.go:128] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-jmr6f, openshift-dns/dns-default-pvqmg, openshift-dns/node-resolver-42kf2, openshift-image-registry/node-ca-7qjdr, openshift-ingress-canary/ingress-canary-szgj8, openshift-machine-config-operator/machine-config-daemon-vlngr, openshift-monitoring/node-exporter-4dfh4, openshift-multus/multus-additional-cni-plugins-fj9dq, openshift-multus/multus-dqxhf, openshift-multus/network-metrics-daemon-2t4mt, openshift-network-diagnostics/network-check-target-4dbsk, openshift-ovn-kubernetes/ovnkube-node-cvh8d, openshift-sriov-network-operator/sriov-device-plugin-mfzdp, openshift-sriov-network-operator/sriov-network-config-daemon-fsndz; deleting Pods that declare no controller: openshift-marketplace/qe-app-registry-s8zd7
      I0329 12:25:11.770765   69883 daemon.go:128] evicting pod default/hello-openshift-backup-847sb
      I0329 12:25:11.770781   69883 daemon.go:128] evicting pod default/hello-openshift-backup-4tkwc
      I0329 12:25:11.770789   69883 daemon.go:128] evicting pod default/hello-openshift-backup-jnr9k
      I0329 12:25:11.770799   69883 daemon.go:128] evicting pod default/hello-openshift-backup-ls74h
      I0329 12:25:11.770824   69883 daemon.go:128] evicting pod default/hello-openshift-backup-jc2ps
      I0329 12:25:11.770830   69883 daemon.go:128] evicting pod default/hello-openshift-backup-rhxpl
      I0329 12:25:11.770790   69883 daemon.go:128] evicting pod default/hello-openshift-backup-lsp5h
      I0329 12:25:11.770840   69883 daemon.go:128] evicting pod default/hello-openshift-backup-khvhn
      I0329 12:25:11.770852   69883 daemon.go:128] evicting pod default/hello-openshift-backup-xrsmt
      I0329 12:25:11.770862   69883 daemon.go:128] evicting pod default/hello-openshift-backup-zqvgh
      I0329 12:25:11.770876   69883 daemon.go:128] evicting pod default/hello-openshift-w9tmg
      I0329 12:25:11.770882   69883 daemon.go:128] evicting pod default/hello-openshift-backup-tz4df
      I0329 12:25:11.770904   69883 daemon.go:128] evicting pod default/hello-openshift-backup-q2stq
      I0329 12:25:11.770835   69883 daemon.go:128] evicting pod openshift-marketplace/qe-app-registry-s8zd7
      I0329 12:25:11.770770   69883 daemon.go:128] evicting pod default/hello-openshift-backup-2nl9x
      I0329 12:25:11.770857   69883 daemon.go:128] evicting pod default/hello-openshift-vcfhm
      I0329 12:25:12.771634   69883 request.go:682] Waited for 1.000462605s due to client-side throttling, not priority and fairness, request: POST:https://api-int.sriov.openshift-qe.sdn.com:6443/api/v1/namespaces/default/pods/hello-openshift-backup-ls74h/eviction
      I0329 12:25:13.427770   69883 daemon.go:166] Evicted pod from Node default/hello-openshift-backup-khvhn // the log stream aborts here because the pod was deleted from the other terminal
      lyman@lymans-MacBook-Pro env %
      lyman@lymans-MacBook-Pro env %
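
      For reference, below is a simplified, hypothetical Go sketch of the drain flow visible in this log (take the config-daemon-draining-lock lease via client-go leader election, then drain the node while ignoring DaemonSet-managed pods). It is not the actual sriov-network-config-daemon source; the NODE_NAME environment variable and the timing values are illustrative assumptions.

      package main

      import (
          "context"
          "os"
          "time"

          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          "k8s.io/client-go/rest"
          "k8s.io/client-go/tools/leaderelection"
          "k8s.io/client-go/tools/leaderelection/resourcelock"
          "k8s.io/kubectl/pkg/drain"
      )

      func main() {
          nodeName := os.Getenv("NODE_NAME") // assumption: the daemon learns its node name from the environment

          cfg, err := rest.InClusterConfig()
          if err != nil {
              panic(err)
          }
          client := kubernetes.NewForConfigOrDie(cfg)

          ctx, cancel := context.WithCancel(context.Background())
          defer cancel()

          // One Lease shared by all config daemons, so only one node drains at a time.
          lock := &resourcelock.LeaseLock{
              LeaseMeta: metav1.ObjectMeta{
                  Name:      "config-daemon-draining-lock",
                  Namespace: "openshift-sriov-network-operator",
              },
              Client:     client.CoordinationV1(),
              LockConfig: resourcelock.ResourceLockConfig{Identity: nodeName},
          }

          leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
              Lock:            lock,
              ReleaseOnCancel: true,
              LeaseDuration:   15 * time.Second,
              RenewDeadline:   10 * time.Second,
              RetryPeriod:     2 * time.Second,
              Callbacks: leaderelection.LeaderCallbacks{
                  OnStartedLeading: func(ctx context.Context) {
                      helper := &drain.Helper{
                          Ctx:                 ctx,
                          Client:              client,
                          Force:               true,
                          IgnoreAllDaemonSets: true, // matches the "ignoring DaemonSet-managed Pods" warning above
                          DeleteEmptyDirData:  true,
                          GracePeriodSeconds:  -1, // use each pod's own terminationGracePeriodSeconds
                          Out:                 os.Stdout,
                          ErrOut:              os.Stderr,
                      }
                      node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
                      if err != nil {
                          panic(err)
                      }
                      // Cordon the node, then evict everything that is not DaemonSet-managed.
                      if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
                          panic(err)
                      }
                      if err := drain.RunNodeDrain(helper, nodeName); err != nil {
                          panic(err)
                      }
                      cancel() // drain finished; release the lease
                  },
                  OnStoppedLeading: func() {},
              },
          })
      }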
      
      5. Check the kubelet log: it shows the sriov daemon pod is deleted with the default gracePeriod=30s (the pod's default terminationGracePeriodSeconds), not waiting for the shutdownGracePeriod. The deletion timestamps also confirm that the SIGTERM was sent before the sriov daemon finished draining.
      
      sh-4.4# chroot /host  
      sh-5.1# journalctl -u kubelet --since="10 minutes ago" | less
      ...
      Mar 29 12:25:13 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:13.498041    5713 kuberuntime_container.go:709] "Killing container with a grace period" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" podUID=af07f950-a282-4fee-adfd-9b86e5a67a3d containerName="sriov-cni" containerID="cri-o://a84448c56490d0b42d5fc6de8dfa4827bdd572071ac5a5dd6805ac208745d3e7" gracePeriod=30
      Mar 29 12:25:13 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:13.498138    5713 kuberuntime_container.go:709] "Killing container with a grace period" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" podUID=af07f950-a282-4fee-adfd-9b86e5a67a3d containerName="sriov-infiniband-cni" containerID="cri-o://76e3886e253a2dc68a82924021b6347cf98571d54b9249125d082c7d1c6b6764" gracePeriod=30
      ...
      Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.069539    5713 kubelet.go:2251] "SyncLoop (PLEG): event for pod" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" event=&{ID:af07f950-a282-4fee-adfd-9b86e5a67a3d Type:ContainerDied Data:76e3886e253a2dc68a82924021b6347cf98571db9249125d082c7d1c6b6764}
      Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.069553    5713 kubelet.go:2251] "SyncLoop (PLEG): event for pod" pod="openshift-sriov-network-operator/sriov-network-config-daemon-fsndz" event=&{ID:af07f950-a282-4fee-adfd-9b86e5a67a3d Type:ContainerDied Data:a84448c56490d0b42d5fc6de8dfa4827bdd572071ac5a5dd6805ac208745d3e7}...
      Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.111874    5713 kubelet.go:2235] "SyncLoop DELETE" source="api" pods="[openshift-sriov-network-operator/sriov-network-config-daemon-fsndz]"
      Mar 29 12:25:14 openshift-qe-027 kubenswrapper[5713]: I0329 12:25:14.116914    5713 kubelet.go:2229] "SyncLoop REMOVE" source="api" pods="[openshift-sriov-network-operator/sriov-network-config-daemon-fsndz]"
      

      Actual results:

      4. The sriov daemon did not handle the SIGTERM before it finished draining the node.

      Expected results:

      4. The sriov daemon should print a log line like the one below when it receives SIGTERM (a minimal sketch of this pattern follows the log line):
       
      ...I1001 22:19:34.032186 3462439 daemon.go:834] Got SIGTERM, but actively updating...
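
      Below is a minimal, hypothetical Go sketch of this pattern (not the actual daemon.go code): the daemon notes the SIGTERM but defers exiting until the in-flight drain/update completes, which is when a "Got SIGTERM, but actively updating" style message would be logged.

      package main

      import (
          "log"
          "os"
          "os/signal"
          "sync/atomic"
          "syscall"
          "time"
      )

      func main() {
          var updating atomic.Bool // true while a drain/apply is in flight
          done := make(chan struct{})

          sigCh := make(chan os.Signal, 1)
          signal.Notify(sigCh, syscall.SIGTERM)

          go func() {
              <-sigCh
              if updating.Load() {
                  // Acknowledge the signal but keep working until the update finishes.
                  log.Println("Got SIGTERM, but actively updating")
                  <-done
              }
              os.Exit(0)
          }()

          // Placeholder for the real drain/apply work.
          updating.Store(true)
          time.Sleep(30 * time.Second)
          updating.Store(false)
          close(done)

          // In the report's scenario the pod is deleted before this point is reached,
          // so the expectation is to see the log line above rather than an immediate exit.
      }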

      Additional info:

      % oc get sriovnetworknodestates.sriovnetwork.openshift.io openshift-qe-027.lab.eng.rdu2.redhat.com  -n openshift-sriov-network-operator   -o yaml 
      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkNodeState
      metadata:
        creationTimestamp: "2023-03-29T01:38:09Z"
        generation: 6
        name: openshift-qe-027.lab.eng.rdu2.redhat.com
        namespace: openshift-sriov-network-operator
        ownerReferences:
        - apiVersion: sriovnetwork.openshift.io/v1
          blockOwnerDeletion: true
          controller: true
          kind: SriovNetworkNodePolicy
          name: default
          uid: e7104876-a5ab-4ab1-b1ab-5a6c3783e0b1
        resourceVersion: "237784"
        uid: 8ce1dc37-9fa6-40fa-83ad-d5bf71dd44b1
      spec:
        dpConfigVersion: "168304"
        interfaces:
        - linkType: eth
          name: ens1f1np1
          numVfs: 11
          pciAddress: 0000:3b:00.1
          vfGroups:
          - deviceType: netdevice
            policyName: sriov-nic-027
            resourceName: sriov_nic_027
            vfRange: 0-10
      status:
        interfaces:
        - deviceID: "1017"
          driver: mlx5_core
          eSwitchMode: legacy
          linkSpeed: 25000 Mb/s
          linkType: ETH
          mac: 04:3f:72:e4:15:aa
          mtu: 1500
          name: ens1f0np0
          pciAddress: 0000:3b:00.0
          totalvfs: 11
          vendor: 15b3
        - Vfs:
          - deviceID: "1018"
            driver: mlx5_core
            mac: 06:92:59:21:51:20
            mtu: 1500
            name: ens1f1v0
            pciAddress: 0000:3b:01.5
            vendor: 15b3
            vfID: 0
          - deviceID: "1018"
            driver: mlx5_core
            mac: 02:32:24:c3:88:3f
            mtu: 1500
            name: ens1f1v1
            pciAddress: 0000:3b:01.6
            vendor: 15b3
            vfID: 1
      
      ...
          - deviceID: "1018"
            driver: mlx5_core
            mac: de:6c:a3:8a:7f:9c
            mtu: 1500
            name: ens1f1v9
            pciAddress: 0000:3b:02.6
            vendor: 15b3
            vfID: 9
          deviceID: "1017"
          driver: mlx5_core
          eSwitchMode: legacy
          linkSpeed: 25000 Mb/s
          linkType: ETH
          mac: 04:3f:72:e4:15:ab
          mtu: 1500
          name: ens1f1np1
          numVfs: 11
          pciAddress: 0000:3b:00.1
          totalvfs: 11
          vendor: 15b3
        - deviceID: 159b
          driver: ice
          eSwitchMode: legacy
          linkSpeed: 25000 Mb/s
          linkType: ETH
          mac: b4:96:91:a5:c7:f4
          mtu: 1500
          name: ens2f0
          pciAddress: 0000:5e:00.0
          totalvfs: 128
          vendor: "8086"
        - deviceID: 159b
          driver: ice
          eSwitchMode: legacy
          linkSpeed: 25000 Mb/s
          linkType: ETH
          mac: b4:96:91:a5:c7:f5
          mtu: 1500
          name: ens2f1
          pciAddress: 0000:5e:00.1
          totalvfs: 128
          vendor: "8086"
        - deviceID: "1593"
          driver: ice
          eSwitchMode: legacy
          linkSpeed: -1 Mb/s
          linkType: ETH
          mac: b4:96:91:dc:74:68
          mtu: 1500
          name: ens5f0
          pciAddress: 0000:86:00.0
          totalvfs: 64
          vendor: "8086"
        - deviceID: "1593"
          driver: ice
          eSwitchMode: legacy
          linkSpeed: -1 Mb/s
          linkType: ETH
          mac: b4:96:91:dc:74:69
          mtu: 1500
          name: ens5f1
          pciAddress: 0000:86:00.1
          totalvfs: 64
          vendor: "8086"
      ...
