-
Bug
-
Resolution: Unresolved
-
Normal
-
ssg_core_kernel
Description of problem:
The ovs-vswitchd process affinity is not changed back to its original value when a deployment running guaranteed pods is deleted.
Version-Release number of selected component (if applicable):
4.17.0-0.nightly-2024-08-19-165854
How reproducible:
Every time
Steps to Reproduce:
1. Apply Performance profile
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  creationTimestamp: "2024-08-20T21:51:00Z"
  finalizers:
  - foreground-deletion
  generation: 60
  name: performance
  resourceVersion: "432939"
  uid: ddf30617-e30c-4a5c-bf59-9c8fe306ea5d
spec:
  cpu:
    isolated: 1,3-11,13,15-23
    reserved: 0,2,12,14
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 1
      node: 0
      size: 1G
    - count: 128
      node: 1
      size: 2M
  machineConfigPoolSelector:
    machineconfiguration.openshift.io/role: worker-cnf
  net:
    userLevelNetworking: false
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numa:
    topologyPolicy: single-numa-node
  realTimeKernel:
    enabled: false
  workloadHints:
    highPowerConsumption: true
    perPodPowerManagement: false
    realTime: true
2. Set cgroups version to v1
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"config.openshift.io/v1","kind":"Node","metadata":{"annotations":{},"name":"cluster"},"spec":{"cgroupMode":"v1"}}
    release.openshift.io/create-only: "true"
  creationTimestamp: "2024-08-20T20:34:19Z"
  generation: 2
  name: cluster
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 993b68e0-7dfc-4be1-88f2-af7fc1b0567c
  resourceVersion: "43979"
  uid: 03b1845c-53fb-4f15-91c3-5a3ba71adddb
spec:
  cgroupMode: v1
3. Create a deployment as shown below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  labels:
    app: lbapp
    type: front-end
spec:
  template:
    metadata:
      labels:
        app: lbapp
        type: front-end
    spec:
      containers:
      - name: testlb
        image: "quay.io/mniranja/busycpus"
        command:
        - sleep
        - inf
        resources:
          limits:
            memory: "500Mi"
            cpu: "2"
        imagePullPolicy: IfNotPresent
      runtimeClassName: performance-performance
      nodeSelector:
        kubernetes.io/hostname: ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com
  replicas: 2
  selector:
    matchLabels:
      type: front-end
4. Check the affinity of ovs-vswitchd before deployment is enabled.
sh-5.1# taskset -apc $(pidof ovs-vswitchd)
pid 1497's current affinity list: 0-23
pid 1502's current affinity list: 0-23
pid 11476's current affinity list: 0-23
pid 11477's current affinity list: 0-23
pid 11478's current affinity list: 0-23
pid 11479's current affinity list: 0-23
pid 11480's current affinity list: 0-23
pid 11481's current affinity list: 0-23
pid 11482's current affinity list: 0-23
pid 11483's current affinity list: 0-23
pid 11484's current affinity list: 0-23
pid 11485's current affinity list: 0-23
pid 11486's current affinity list: 0-23
pid 11487's current affinity list: 0-23
pid 11488's current affinity list: 0-23
pid 11489's current affinity list: 0-23
pid 11490's current affinity list: 0-23
pid 11491's current affinity list: 0-23
pid 11492's current affinity list: 0-23
pid 11493's current affinity list: 0-23
pid 11494's current affinity list: 0-23
pid 11495's current affinity list: 0-23
pid 11496's current affinity list: 0-23
pid 11497's current affinity list: 0-23
pid 11498's current affinity list: 0-23
pid 11499's current affinity list: 0-23
pid 11500's current affinity list: 0-23
pid 11501's current affinity list: 0-23
pid 11502's current affinity list: 0-23
pid 11503's current affinity list: 0-23
pid 11504's current affinity list: 0-23
pid 11505's current affinity list: 0-23
pid 11506's current affinity list: 0-23
5. Apply the deployment and get the cpus used by the pods
[root@ocp-edge89 ~]# oc exec -it pods/myapp-deployment-54757d6f58-fw5km -- bash -c "cat /sys/fs/cgroup/cpuset/cpuset.cpus"
4,6
[root@ocp-edge89 ~]# oc exec -it pods/myapp-deployment-54757d6f58-hrpjg -- bash -c "cat /sys/fs/cgroup/cpuset/cpuset.cpus"
8,10
6. Verify affinity of ovs-vswitchd
[root@ocp-edge89 ~]# oc debug node/ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com
Starting pod/ocp4173003464-worker-0libvirtlabengtlv2redhatcom-debug-5ssjr ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.122.106
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1# taskset -apc $(pidof ovs-vswitchd)
pid 1497's current affinity list: 0-3,5,7,9,11-23
pid 1502's current affinity list: 0-3,5,7,9,11-23
pid 13401's current affinity list: 0-3,5,7,9,11-23
pid 13402's current affinity list: 0-3,5,7,9,11-23
pid 13403's current affinity list: 0-3,5,7,9,11-23
pid 13404's current affinity list: 0-3,5,7,9,11-23
pid 13405's current affinity list: 0-3,5,7,9,11-23
pid 13406's current affinity list: 0-3,5,7,9,11-23
pid 13407's current affinity list: 0-3,5,7,9,11-23
pid 13408's current affinity list: 0-3,5,7,9,11-23
pid 13409's current affinity list: 0-3,5,7,9,11-23
pid 13410's current affinity list: 0-3,5,7,9,11-23
pid 13411's current affinity list: 0-3,5,7,9,11-23
pid 13412's current affinity list: 0-3,5,7,9,11-23
pid 13413's current affinity list: 0-3,5,7,9,11-23
pid 13414's current affinity list: 0-3,5,7,9,11-23
pid 13415's current affinity list: 0-3,5,7,9,11-23
pid 13416's current affinity list: 0-3,5,7,9,11-23
pid 13417's current affinity list: 0-3,5,7,9,11-23
pid 13418's current affinity list: 0-3,5,7,9,11-23
pid 13419's current affinity list: 0-3,5,7,9,11-23
pid 13420's current affinity list: 0-3,5,7,9,11-23
pid 13421's current affinity list: 0-3,5,7,9,11-23
pid 13422's current affinity list: 0-3,5,7,9,11-23
pid 13423's current affinity list: 0-3,5,7,9,11-23
pid 13424's current affinity list: 0-3,5,7,9,11-23
pid 13425's current affinity list: 0-3,5,7,9,11-23
pid 13426's current affinity list: 0-3,5,7,9,11-23
pid 13427's current affinity list: 0-3,5,7,9,11-23
pid 13428's current affinity list: 0-3,5,7,9,11-23
pid 13429's current affinity list: 0-3,5,7,9,11-23
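The affinity reported in step 6 matches what one would expect if the CPUs pinned to the two guaranteed pods (4,6 and 8,10) were removed from the original 0-23 list. A minimal sketch of that arithmetic follows; `parse_cpu_list` and `format_cpu_list` are hypothetical helper names written for this illustration, not part of any OpenShift or kubelet API.

```python
def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as "0-3,5" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_list(cpus):
    """Collapse a set of CPU ids back into compact range notation."""
    ordered = sorted(cpus)
    if not ordered:
        return ""
    ranges = []
    start = prev = ordered[0]
    for cpu in ordered[1:]:
        if cpu == prev + 1:
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# Values taken from steps 4 and 5 of this report.
all_cpus = parse_cpu_list("0-23")
pod_cpus = parse_cpu_list("4,6") | parse_cpu_list("8,10")
remaining = format_cpu_list(all_cpus - pod_cpus)
print(remaining)  # 0-3,5,7,9,11-23
```

The result equals the affinity list shown for every ovs-vswitchd thread while the deployment is running.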
7. Delete the deployment
[root@ocp-edge89 ~]# oc delete deployment/myapp-deployment
deployment.apps "myapp-deployment" deleted
8. Wait for pods to be deleted
[root@ocp-edge89 ~]# oc get pods
NAME                                READY   STATUS        RESTARTS   AGE
myapp-deployment-54757d6f58-fw5km   1/1     Terminating   0          15h
myapp-deployment-54757d6f58-hrpjg   1/1     Terminating   0          15h
[root@ocp-edge89 ~]# oc get pods
[root@ocp-edge89 ~]# oc get pods
No resources found in default namespace.
9. Check the cpu affinity of ovs-vswitchd process
[root@ocp-edge89 ~]# oc debug node/ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com
Starting pod/ocp4173003464-worker-0libvirtlabengtlv2redhatcom-debug-pljfx ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.122.106
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1# taskset -apc $(pidof ovs-vswitchd)
pid 1497's current affinity list: 0-3,5,7,9,11-23
pid 1502's current affinity list: 0-3,5,7,9,11-23
pid 13401's current affinity list: 0-3,5,7,9,11-23
pid 13402's current affinity list: 0-3,5,7,9,11-23
pid 13403's current affinity list: 0-3,5,7,9,11-23
pid 13404's current affinity list: 0-3,5,7,9,11-23
pid 13405's current affinity list: 0-3,5,7,9,11-23
pid 13406's current affinity list: 0-3,5,7,9,11-23
pid 13407's current affinity list: 0-3,5,7,9,11-23
pid 13408's current affinity list: 0-3,5,7,9,11-23
pid 13409's current affinity list: 0-3,5,7,9,11-23
pid 13410's current affinity list: 0-3,5,7,9,11-23
pid 13411's current affinity list: 0-3,5,7,9,11-23
pid 13412's current affinity list: 0-3,5,7,9,11-23
pid 13413's current affinity list: 0-3,5,7,9,11-23
pid 13414's current affinity list: 0-3,5,7,9,11-23
pid 13415's current affinity list: 0-3,5,7,9,11-23
pid 13416's current affinity list: 0-3,5,7,9,11-23
pid 13417's current affinity list: 0-3,5,7,9,11-23
pid 13418's current affinity list: 0-3,5,7,9,11-23
pid 13419's current affinity list: 0-3,5,7,9,11-23
pid 13420's current affinity list: 0-3,5,7,9,11-23
pid 13421's current affinity list: 0-3,5,7,9,11-23
pid 13422's current affinity list: 0-3,5,7,9,11-23
pid 13423's current affinity list: 0-3,5,7,9,11-23
pid 13424's current affinity list: 0-3,5,7,9,11-23
pid 13425's current affinity list: 0-3,5,7,9,11-23
pid 13426's current affinity list: 0-3,5,7,9,11-23
pid 13427's current affinity list: 0-3,5,7,9,11-23
pid 13428's current affinity list: 0-3,5,7,9,11-23
pid 13429's current affinity list: 0-3,5,7,9,11-23
10. Kubelet logs:
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: rpc error: code = Unknown desc = updating resources for container "142a5a134eadd9967d26bb31c7a0ea3136eca5e88e3080d345533b1773562f57" failed: writing file `cpuset.cpus`: Permission denied
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: : exit status 1
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: > containerID="142a5a134eadd9967d26bb31c7a0ea3136eca5e88e3080d345533b1773562f57"
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: E0822 07:10:30.565531 9127 cpu_manager.go:482] "ReconcileState: failed to update container" err=<
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: rpc error: code = Unknown desc = updating resources for container "142a5a134eadd9967d26bb31c7a0ea3136eca5e88e3080d345533b1773562f57" failed: writing file `cpuset.cpus`: Permission denied
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: : exit status 1
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: > pod="openshift-network-operator/iptables-alerter-gmtdn" containerName="iptables-alerter" containerID="142a5a134eadd9967d26bb31c7a0ea3136eca5e88e3080d345533b1773562f57" cpuSet="0-23"
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: E0822 07:10:30.569747 9127 remote_runtime.go:461] "UpdateContainerResources from runtime service failed" err=<
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: rpc error: code = Unknown desc = updating resources for container "3b22f038f882fe3e994dcd481f88f69eb79f4381d4c944eda9f649bb8879b6cd" failed: writing file `cpuset.cpus`: Permission denied
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: : exit status 1
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: > containerID="3b22f038f882fe3e994dcd481f88f69eb79f4381d4c944eda9f649bb8879b6cd"
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: E0822 07:10:30.569774 9127 cpu_manager.go:482] "ReconcileState: failed to update container" err=<
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: rpc error: code = Unknown desc = updating resources for container "3b22f038f882fe3e994dcd481f88f69eb79f4381d4c944eda9f649bb8879b6cd" failed: writing file `cpuset.cpus`: Permission denied
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: : exit status 1
Aug 22 07:10:30 ocp4173003464-worker-0.libvirt.lab.eng.tlv2.redhat.com kubenswrapper[9127]: > pod="openshift-monitoring/metrics-server-96b8bc9bd-z6gnw" containerName="metrics-server" containerID="3b22f038f882fe3e994dcd481f88f69eb79f4381d4c944eda9f649bb8879b6cd" cpuSet="0-23"
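For triage it may help to list which containers the kubelet failed to reset to cpuSet="0-23". The sketch below pulls the pod, container name, and target cpuset out of the summary lines of the log excerpt above. The sample text is copied from this report with line-wrapped words rejoined; the regular expression and the function name `failed_cpuset_updates` are assumptions made for this illustration and only match the line shape seen here.

```python
import re

# Matches the trailing summary line of a "ReconcileState: failed to update
# container" entry as it appears in this report's kubelet log excerpt.
FAILED_UPDATE = re.compile(
    r'pod="(?P<pod>[^"]+)"\s+containerName="(?P<container>[^"]+)"\s+'
    r'containerID="(?P<id>[0-9a-f]+)"\s+cpuSet="(?P<cpuset>[^"]+)"'
)


def failed_cpuset_updates(log_text):
    """Return (pod, container, cpuset) for each failed update summary line."""
    return [
        (m["pod"], m["container"], m["cpuset"])
        for m in FAILED_UPDATE.finditer(log_text)
    ]


sample = """
kubenswrapper[9127]: > pod="openshift-network-operator/iptables-alerter-gmtdn" containerName="iptables-alerter" containerID="142a5a134eadd9967d26bb31c7a0ea3136eca5e88e3080d345533b1773562f57" cpuSet="0-23"
kubenswrapper[9127]: > pod="openshift-monitoring/metrics-server-96b8bc9bd-z6gnw" containerName="metrics-server" containerID="3b22f038f882fe3e994dcd481f88f69eb79f4381d4c944eda9f649bb8879b6cd" cpuSet="0-23"
"""

failures = failed_cpuset_updates(sample)
for pod, container, cpuset in failures:
    print(f"{pod} ({container}): could not set cpuset.cpus to {cpuset}")
```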
Actual results:
After the guaranteed (gu) pods are deleted, the cpu affinity of ovs-vswitchd stays at 0-3,5,7,9,11-23 instead of returning to 0-23.
Expected results:
The cpu affinity of ovs-vswitchd should return to 0-23 once the gu pods are deleted.
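The expected result can be checked mechanically against saved `taskset -apc` output. The sketch below is one possible check, not an existing test from any suite: `threads_not_restored` is a hypothetical helper, and it assumes taskset prints each affinity list in the same compact form used throughout this report, so a plain string comparison against "0-23" is sufficient.

```python
import re

# Line shape printed by `taskset -apc <pid>` in this report.
AFFINITY_LINE = re.compile(r"pid (\d+)'s current affinity list: (\S+)")


def threads_not_restored(taskset_output, expected="0-23"):
    """Return {pid: affinity} for threads whose affinity differs from expected."""
    return {
        int(pid): affinity
        for pid, affinity in AFFINITY_LINE.findall(taskset_output)
        if affinity != expected
    }


# First three lines of the step 9 output, captured after the pods were deleted.
sample = """pid 1497's current affinity list: 0-3,5,7,9,11-23
pid 1502's current affinity list: 0-3,5,7,9,11-23
pid 13401's current affinity list: 0-3,5,7,9,11-23"""

stale = threads_not_restored(sample)
print(f"{len(stale)} thread(s) still restricted: {sorted(stale)}")
```

An empty result would mean every thread is back on 0-23; with the output captured in this report, every thread is flagged.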
Additional info:
This issue occurs only when using crun and does not happen when using runc. Also, if we reboot the system, the affinity of ovs-vswitchd goes back to what it was before the guaranteed pods were deployed.
- relates to
-
OCPBUGS-43995 The cpu affinity of burstable pods not get updated when deleting Guaranteed pod with runc
- ASSIGNED