RHEL-57709

ovs-vswitchd process affinity doesn't get changed back to its original affinity when a deployment running guaranteed pods is deleted

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • ssg_core_kernel

      Description of problem:

      The CPU affinity of the ovs-vswitchd process doesn't get changed back to its original value when a deployment running guaranteed pods is deleted.
          

      Version-Release number of selected component (if applicable):

          4.17.0-0.nightly-2024-08-19-165854
      

      How reproducible:

          Every time

      Steps to Reproduce:
      1. Apply the performance profile

      apiVersion: performance.openshift.io/v2
      kind: PerformanceProfile
      metadata:
        creationTimestamp: "2024-08-20T21:51:00Z"
        finalizers:
        - foreground-deletion
        generation: 60
        name: performance
        resourceVersion: "432939"
        uid: ddf30617-e30c-4a5c-bf59-9c8fe306ea5d
      spec:
        cpu:
          isolated: 1,3-11,13,15-23
          reserved: 0,2,12,14
        hugepages:
          defaultHugepagesSize: 1G
          pages:
          - count: 1
            node: 0
            size: 1G
          - count: 128
            node: 1
            size: 2M
        machineConfigPoolSelector:
          machineconfiguration.openshift.io/role: worker-cnf
        net:
          userLevelNetworking: false
        nodeSelector:
          node-role.kubernetes.io/worker-cnf: ""
        numa:
          topologyPolicy: single-numa-node
        realTimeKernel:
          enabled: false
        workloadHints:
          highPowerConsumption: true
          perPodPowerManagement: false
          realTime: true
          

      2. Set the cgroup version to v1

      apiVersion: config.openshift.io/v1
      kind: Node
      metadata:
        annotations:
          include.release.openshift.io/ibm-cloud-managed: "true"
          include.release.openshift.io/self-managed-high-availability: "true"
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"config.openshift.io/v1","kind":"Node","metadata":{"annotations":{},"name":"cluster"},"spec":{"cgroupMode":"v1"}}
          release.openshift.io/create-only: "true"
        creationTimestamp: "2024-08-20T20:34:19Z"
        generation: 2
        name: cluster
        ownerReferences:
        - apiVersion: config.openshift.io/v1
          kind: ClusterVersion
          name: version
          uid: 993b68e0-7dfc-4be1-88f2-af7fc1b0567c
        resourceVersion: "43979"
        uid: 03b1845c-53fb-4f15-91c3-5a3ba71adddb
      spec:
        cgroupMode: v1
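
      To confirm on the node that the setting took effect, one option is to look at the filesystem type mounted at /sys/fs/cgroup, which is normally `tmpfs` under cgroup v1 and `cgroup2fs` under the unified v2 hierarchy. This is a minimal sketch and not part of the original report; the helper name `cgroup_version_from_fstype` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): map the filesystem type of
# /sys/fs/cgroup to a cgroup version label.
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;      # unified hierarchy
    tmpfs)     echo "v1" ;;      # legacy per-controller hierarchy
    *)         echo "unknown" ;;
  esac
}

# On the node (for example inside `oc debug node/<node>` after `chroot /host`):
#   cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup)"
```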
      

      3. Create a deployment as shown below:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: myapp-deployment
        labels:
          app: lbapp
          type: front-end
      spec:
        template:
          metadata:
            labels:
              app: lbapp
              type: front-end
          spec:
            containers:
            - name: testlb
              image: "quay.io/mniranja/busycpus"
              command:
              - sleep
              - inf
              resources:
                limits:
                  memory: "500Mi"
                  cpu: "2"
              imagePullPolicy: IfNotPresent
            runtimeClassName: performance-performance
            nodeSelector:
              kubernetes.io/hostname: ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com
        replicas: 2
        selector:
          matchLabels:
            type: front-end
      

      4. Check the affinity of ovs-vswitchd before the deployment is applied.

      sh-5.1# taskset -apc $(pidof ovs-vswitchd)
      pid 1497's current affinity list: 0-23
      pid 1502's current affinity list: 0-23
      pid 11476's current affinity list: 0-23
      pid 11477's current affinity list: 0-23
      pid 11478's current affinity list: 0-23
      pid 11479's current affinity list: 0-23
      pid 11480's current affinity list: 0-23
      pid 11481's current affinity list: 0-23
      pid 11482's current affinity list: 0-23
      pid 11483's current affinity list: 0-23
      pid 11484's current affinity list: 0-23
      pid 11485's current affinity list: 0-23
      pid 11486's current affinity list: 0-23
      pid 11487's current affinity list: 0-23
      pid 11488's current affinity list: 0-23
      pid 11489's current affinity list: 0-23
      pid 11490's current affinity list: 0-23
      pid 11491's current affinity list: 0-23
      pid 11492's current affinity list: 0-23
      pid 11493's current affinity list: 0-23
      pid 11494's current affinity list: 0-23
      pid 11495's current affinity list: 0-23
      pid 11496's current affinity list: 0-23
      pid 11497's current affinity list: 0-23
      pid 11498's current affinity list: 0-23
      pid 11499's current affinity list: 0-23
      pid 11500's current affinity list: 0-23
      pid 11501's current affinity list: 0-23
      pid 11502's current affinity list: 0-23
      pid 11503's current affinity list: 0-23
      pid 11504's current affinity list: 0-23
      pid 11505's current affinity list: 0-23
      pid 11506's current affinity list: 0-23
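
      Since every thread reports the same mask, the per-thread listing can be collapsed into a count per distinct affinity list. This is a sketch that assumes the `taskset -apc` output format shown above; the function name `summarize_affinity` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `taskset -apc` output on stdin and print
# "<thread count> <affinity list>" for each distinct affinity list.
summarize_affinity() {
  sed -n "s/^pid [0-9]*'s current affinity list: //p" \
    | sort \
    | uniq -c \
    | awk '{print $1, $2}'
}

# Usage on the node:
#   taskset -apc $(pidof ovs-vswitchd) | summarize_affinity
```

      A single output row means all threads share one mask; more than one row would indicate that the threads diverge.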
      

      5. Apply the deployment and get the CPUs used by the pods

      [root@ocp-edge89 ~]# oc exec -it pods/myapp-deployment-54757d6f58-fw5km -- bash -c "cat /sys/fs/cgroup/cpuset/cpuset.cpus"
      4,6
      [root@ocp-edge89 ~]# oc exec -it pods/myapp-deployment-54757d6f58-hrpjg -- bash -c "cat /sys/fs/cgroup/cpuset/cpuset.cpus"
      8,10
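
      The two pods were pinned to CPUs 4,6 and 8,10. The affinity that ovs-vswitchd reports while they run is consistent with the node's CPUs 0-23 minus those four CPUs. The sketch below reproduces that arithmetic; the function names `expand_cpulist` and `cpulist_subtract` are hypothetical helpers, not tools used in the report.

```shell
#!/usr/bin/env bash
# Expand a kernel-style CPU list such as "0-3,5" into one CPU id per line.
expand_cpulist() {
  local part lo hi
  local IFS=','
  for part in $1; do
    lo=${part%-*}   # text before the dash, or the whole item if no dash
    hi=${part#*-}   # text after the dash, or the whole item if no dash
    seq "$lo" "$hi"
  done
}

# Print the CPUs of list $1 that are not in list $2, one per line.
cpulist_subtract() {
  local cpu removed
  # Build " 4 6 8 10 " so each id can be matched with surrounding spaces.
  removed=" $(expand_cpulist "$2" | tr '\n' ' ')"
  for cpu in $(expand_cpulist "$1"); do
    case "$removed" in
      *" $cpu "*) ;;          # CPU is taken by a guaranteed pod
      *) echo "$cpu" ;;
    esac
  done
}

# CPUs left for ovs-vswitchd when 4,6,8,10 are exclusively assigned:
#   cpulist_subtract 0-23 4,6,8,10 | paste -sd, -
```

      The comma-joined result lists the same CPUs as the range form 0-3,5,7,9,11-23.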
      
      

      6. Verify affinity of ovs-vswitchd

      [root@ocp-edge89 ~]# oc debug node/ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com
      Starting pod/ocp4173003464-worker-0libvirtlabengtlv2redhatcom-debug-5ssjr ...
      To use host binaries, run `chroot /host`
      Pod IP: 192.168.122.106
      If you don't see a command prompt, try pressing enter.
      sh-5.1# chroot /host
      sh-5.1# taskset -apc $(pidof ovs-vswitchd)
      pid 1497's current affinity list: 0-3,5,7,9,11-23
      pid 1502's current affinity list: 0-3,5,7,9,11-23
      pid 13401's current affinity list: 0-3,5,7,9,11-23
      pid 13402's current affinity list: 0-3,5,7,9,11-23
      pid 13403's current affinity list: 0-3,5,7,9,11-23
      pid 13404's current affinity list: 0-3,5,7,9,11-23
      pid 13405's current affinity list: 0-3,5,7,9,11-23
      pid 13406's current affinity list: 0-3,5,7,9,11-23
      pid 13407's current affinity list: 0-3,5,7,9,11-23
      pid 13408's current affinity list: 0-3,5,7,9,11-23
      pid 13409's current affinity list: 0-3,5,7,9,11-23
      pid 13410's current affinity list: 0-3,5,7,9,11-23
      pid 13411's current affinity list: 0-3,5,7,9,11-23
      pid 13412's current affinity list: 0-3,5,7,9,11-23
      pid 13413's current affinity list: 0-3,5,7,9,11-23
      pid 13414's current affinity list: 0-3,5,7,9,11-23
      pid 13415's current affinity list: 0-3,5,7,9,11-23
      pid 13416's current affinity list: 0-3,5,7,9,11-23
      pid 13417's current affinity list: 0-3,5,7,9,11-23
      pid 13418's current affinity list: 0-3,5,7,9,11-23
      pid 13419's current affinity list: 0-3,5,7,9,11-23
      pid 13420's current affinity list: 0-3,5,7,9,11-23
      pid 13421's current affinity list: 0-3,5,7,9,11-23
      pid 13422's current affinity list: 0-3,5,7,9,11-23
      pid 13423's current affinity list: 0-3,5,7,9,11-23
      pid 13424's current affinity list: 0-3,5,7,9,11-23
      pid 13425's current affinity list: 0-3,5,7,9,11-23
      pid 13426's current affinity list: 0-3,5,7,9,11-23
      pid 13427's current affinity list: 0-3,5,7,9,11-23
      pid 13428's current affinity list: 0-3,5,7,9,11-23
      pid 13429's current affinity list: 0-3,5,7,9,11-23
      

      7. Delete the deployment

      [root@ocp-edge89 ~]# oc delete deployment/myapp-deployment
      deployment.apps "myapp-deployment" deleted
      

      8. Wait for the pods to be deleted

      [root@ocp-edge89 ~]# oc get pods
      NAME                                READY   STATUS        RESTARTS   AGE
      myapp-deployment-54757d6f58-fw5km   1/1     Terminating   0          15h
      myapp-deployment-54757d6f58-hrpjg   1/1     Terminating   0          15h
      [root@ocp-edge89 ~]# oc get pods
      [root@ocp-edge89 ~]# oc get pods
      No resources found in default namespace.
      

      9. Check the CPU affinity of the ovs-vswitchd process

      [root@ocp-edge89 ~]# oc debug node/ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com
      Starting pod/ocp4173003464-worker-0libvirtlabengtlv2redhatcom-debug-pljfx ...
      To use host binaries, run `chroot /host`
      Pod IP: 192.168.122.106
      If you don't see a command prompt, try pressing enter.
      sh-5.1# chroot /host
      
      sh-5.1# taskset -apc $(pidof ovs-vswitchd)
      pid 1497's current affinity list: 0-3,5,7,9,11-23
      pid 1502's current affinity list: 0-3,5,7,9,11-23
      pid 13401's current affinity list: 0-3,5,7,9,11-23
      pid 13402's current affinity list: 0-3,5,7,9,11-23
      pid 13403's current affinity list: 0-3,5,7,9,11-23
      pid 13404's current affinity list: 0-3,5,7,9,11-23
      pid 13405's current affinity list: 0-3,5,7,9,11-23
      pid 13406's current affinity list: 0-3,5,7,9,11-23
      pid 13407's current affinity list: 0-3,5,7,9,11-23
      pid 13408's current affinity list: 0-3,5,7,9,11-23
      pid 13409's current affinity list: 0-3,5,7,9,11-23
      pid 13410's current affinity list: 0-3,5,7,9,11-23
      pid 13411's current affinity list: 0-3,5,7,9,11-23
      pid 13412's current affinity list: 0-3,5,7,9,11-23
      pid 13413's current affinity list: 0-3,5,7,9,11-23
      pid 13414's current affinity list: 0-3,5,7,9,11-23
      pid 13415's current affinity list: 0-3,5,7,9,11-23
      pid 13416's current affinity list: 0-3,5,7,9,11-23
      pid 13417's current affinity list: 0-3,5,7,9,11-23
      pid 13418's current affinity list: 0-3,5,7,9,11-23
      pid 13419's current affinity list: 0-3,5,7,9,11-23
      pid 13420's current affinity list: 0-3,5,7,9,11-23
      pid 13421's current affinity list: 0-3,5,7,9,11-23
      pid 13422's current affinity list: 0-3,5,7,9,11-23
      pid 13423's current affinity list: 0-3,5,7,9,11-23
      pid 13424's current affinity list: 0-3,5,7,9,11-23
      pid 13425's current affinity list: 0-3,5,7,9,11-23
      pid 13426's current affinity list: 0-3,5,7,9,11-23
      pid 13427's current affinity list: 0-3,5,7,9,11-23
      pid 13428's current affinity list: 0-3,5,7,9,11-23
      pid 13429's current affinity list: 0-3,5,7,9,11-23
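
      The comparison against the baseline from step 4 can also be automated. This is a sketch that assumes the `taskset -apc` output format shown above; the function name `check_affinity` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `taskset -apc` output on stdin and compare each
# thread's affinity list with the expected list passed as $1.
# Prints every mismatching line; returns 1 if any thread differs, else 0.
check_affinity() {
  awk -v want="$1" '
    /current affinity list: / {
      # The affinity list is the last whitespace-separated field.
      if (($NF "") != (want "")) { print; bad = 1 }
    }
    END { exit bad + 0 }
  '
}

# Usage on the node, expecting the pre-deployment mask:
#   taskset -apc $(pidof ovs-vswitchd) | check_affinity 0-23
```

      With the output above, every thread would be printed as a mismatch, because each one still reports 0-3,5,7,9,11-23.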
      

      10. Kubelet logs:

      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:         rpc error: code = Unknown desc = updating resources for container "142a5a134eadd9967d26bb31c7a0ea3136eca5e88e3080d345533b1773562f57" failed: writing file `cpuset.cpus`: Permission denied
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:          : exit status 1
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:  > containerID="142a5a134eadd9967d26bb31c7a0ea3136eca5e88e3080d345533b1773562f57"
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: E0822 07:10:30.565531    9127 cpu_manager.go:482] "ReconcileState: failed to update container" err=<
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:         rpc error: code = Unknown desc = updating resources for container "142a5a134eadd9967d26bb31c7a0ea3136eca5e88e3080d345533b1773562f57" failed: writing file `cpuset.cpus`: Permission denied
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:          : exit status 1
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:  > pod="openshift-network-operator/iptables-alerter-gmtdn" containerName="iptables-alerter" containerID="142a5a134eadd9967d26bb31c7a0ea3136eca5e88e3080d345533b1773562f57" cpuSet="0-23"
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: E0822 07:10:30.569747    9127 remote_runtime.go:461] "UpdateContainerResources from runtime service failed" err=<
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:         rpc error: code = Unknown desc = updating resources for container "3b22f038f882fe3e994dcd481f88f69eb79f4381d4c944eda9f649bb8879b6cd" failed: writing file `cpuset.cpus`: Permission denied
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:          : exit status 1
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:  > containerID="3b22f038f882fe3e994dcd481f88f69eb79f4381d4c944eda9f649bb8879b6cd"
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: E0822 07:10:30.569774    9127 cpu_manager.go:482] "ReconcileState: failed to update container" err=<
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:         rpc error: code = Unknown desc = updating resources for container "3b22f038f882fe3e994dcd481f88f69eb79f4381d4c944eda9f649bb8879b6cd" failed: writing file `cpuset.cpus`: Permission denied
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:          : exit status 1
      Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]:  > pod="openshift-monitoring/metrics-server-96b8bc9bd-z6gnw" containerName="metrics-server" containerID="3b22f038f882fe3e994dcd481f88f69eb79f4381d4c944eda9f649bb8879b6cd" cpuSet="0-23"
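
      The excerpt shows the kubelet CPU manager failing to write `cpuset.cpus` while trying to set other containers to cpuSet="0-23". To list every container affected in a longer log, the detail lines that carry the pod and container names can be extracted. This is a sketch that assumes the field order seen in the excerpt (pod, containerName, containerID, cpuSet) and unwrapped log lines; the function name `failed_cpuset_updates` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kubelet log lines on stdin and print one
# "<pod> <container> <cpuSet>" row per distinct triple found in lines that
# carry pod=, containerName=, containerID= and cpuSet= fields in that order.
failed_cpuset_updates() {
  sed -n 's/.*pod="\([^"]*\)" containerName="\([^"]*\)" containerID="[^"]*" cpuSet="\([^"]*\)".*/\1 \2 \3/p' \
    | sort -u
}

# Usage on the node, assuming kubelet runs as the `kubelet` systemd unit:
#   journalctl -u kubelet --no-pager | failed_cpuset_updates
```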
      

      Actual results:

      After the guaranteed (gu) pods are deleted, the CPU affinity of ovs-vswitchd stays at 0-3,5,7,9,11-23 instead of returning to 0-23.
          

      Expected results:

      The CPU affinity of ovs-vswitchd should return to 0-23 once the guaranteed pods are deleted.
          

      Additional info:

      This issue occurs only when using crun; it does not happen when using runc.
      Rebooting the system also restores the ovs-vswitchd affinity to its original value from before the guaranteed pods were deployed.
          

        1. kubelet.logs
          1.04 MB
        2. crio.logs
          918 kB

              llong@redhat.com Waiman Long
              mniranja Mallapadi Niranjan
              Sunil Choudhary Sunil Choudhary