OpenShift Bugs / OCPBUGS-57697

SNO 4.16.42 - Several platform pods placed on the isolated CPUs get `cpuset.cpus`: Permission denied


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • 4.16.z
    • Node / CPU manager
    • Incidents & Support
    • Important
    • Customer Escalated

      Description of problem:

          After upgrading the clusters from 4.16.38 to 4.16.42, the kubelet CPU manager fails to update `cpuset.cpus` for pods in the namespaces below, and the container runtime returns "Permission denied". This happens on several SNO clusters.
      
      - vran-acceleration-operators (SNO name: hp-master4)
      
      Jun 17 07:42:27 hp-master4 bash[11544]: E0617 07:42:27.616453   11544 remote_runtime.go:461] "UpdateContainerResources from runtime service failed" err=<
      Jun 17 07:42:27 hp-master4 bash[11544]:         rpc error: code = Unknown desc = updating resources for container "c6c080b2f8dc6994be943d3a11c294b326dafa285e8a7c2a87b4ce2187cde7d1" failed: writing file `cpuset.cpus`: Permission denied
      Jun 17 07:42:27 hp-master4 bash[11544]:          : exit status 1
      Jun 17 07:42:27 hp-master4 bash[11544]:  > containerID="c6c080b2f8dc6994be943d3a11c294b326dafa285e8a7c2a87b4ce2187cde7d1"
      Jun 17 07:42:27 hp-master4 bash[11544]: E0617 07:42:27.616503   11544 cpu_manager.go:496] "ReconcileState: failed to update container" err=<
      Jun 17 07:42:27 hp-master4 bash[11544]:         rpc error: code = Unknown desc = updating resources for container "c6c080b2f8dc6994be943d3a11c294b326dafa285e8a7c2a87b4ce2187cde7d1" failed: writing file `cpuset.cpus`: Permission denied
      Jun 17 07:42:27 hp-master4 bash[11544]:          : exit status 1
      Jun 17 07:42:27 hp-master4 bash[11544]:  > pod="vran-acceleration-operators/sriov-fec-daemonset-xd6lc" containerName="sriov-fec-daemon" containerID="c6c080b2f8dc6994be943d3a11c294b326dafa285e8a7c2a87b4ce2187cde7d1" cpuSet="2-31,34-63"
      Jun 17 07:42:27 hp-master4 bash[11544]: E0617 07:42:27.622272   11544 remote_runtime.go:461] "UpdateContainerResources from runtime service failed" err=<
      Jun 17 07:42:27 hp-master4 bash[11544]:         rpc error: code = Unknown desc = updating resources for container "a29f0fc0c5182f6c4414f01dc701f8d4373889589b0d0bf6ca857624e0ec1739" failed: writing file `cpuset.cpus`: Permission denied
      Jun 17 07:42:27 hp-master4 bash[11544]:          : exit status 1
      Jun 17 07:42:27 hp-master4 bash[11544]:  > containerID="a29f0fc0c5182f6c4414f01dc701f8d4373889589b0d0bf6ca857624e0ec1739"
      Jun 17 07:42:27 hp-master4 bash[11544]: E0617 07:42:27.622329   11544 cpu_manager.go:496] "ReconcileState: failed to update container" err=<
      Jun 17 07:42:27 hp-master4 bash[11544]:         rpc error: code = Unknown desc = updating resources for container "a29f0fc0c5182f6c4414f01dc701f8d4373889589b0d0bf6ca857624e0ec1739" failed: writing file `cpuset.cpus`: Permission denied
      Jun 17 07:42:27 hp-master4 bash[11544]:          : exit status 1
      Jun 17 07:42:27 hp-master4 bash[11544]:  > pod="vran-acceleration-operators/sriov-device-plugin-jnf5h" containerName="sriov-device-plugin" containerID="a29f0fc0c5182f6c4414f01dc701f8d4373889589b0d0bf6ca857624e0ec1739" cpuSet="2-31,34-63"
      Jun 17 07:42:27 hp-master4 bash[11544]: E0617 07:42:27.626537   11544 remote_runtime.go:461] "UpdateContainerResources from runtime service failed" err=<
      Jun 17 07:42:27 hp-master4 bash[11544]:         rpc error: code = Unknown desc = updating resources for container "686b52a3cdd1e8d44b0d61a512f137af142d73e8bceef24481630eafdd4fe9d3" failed: writing file `cpuset.cpus`: Permission denied
      Jun 17 07:42:27 hp-master4 bash[11544]:          : exit status 1
      Jun 17 07:42:27 hp-master4 bash[11544]:  > containerID="686b52a3cdd1e8d44b0d61a512f137af142d73e8bceef24481630eafdd4fe9d3"
      Jun 17 07:42:27 hp-master4 bash[11544]: E0617 07:42:27.626587   11544 cpu_manager.go:496] "ReconcileState: failed to update container" err=<
      Jun 17 07:42:27 hp-master4 bash[11544]:         rpc error: code = Unknown desc = updating resources for container "686b52a3cdd1e8d44b0d61a512f137af142d73e8bceef24481630eafdd4fe9d3" failed: writing file `cpuset.cpus`: Permission denied
      Jun 17 07:42:27 hp-master4 bash[11544]:          : exit status 1
      Jun 17 07:42:27 hp-master4 bash[11544]:  > pod="vran-acceleration-operators/accelerator-discovery-7zd5t" containerName="accelerator-discovery" containerID="686b52a3cdd1e8d44b0d61a512f137af142d73e8bceef24481630eafdd4fe9d3" cpuSet="2-31,34-63"
      Jun 17 07:42:27 hp-master4 bash[11544]: E0617 07:42:27.631297   11544 remote_runtime.go:461] "UpdateContainerResources from runtime service failed" err=<
      Jun 17 07:42:27 hp-master4 bash[11544]:         rpc error: code = Unknown desc = updating resources for container "8010cc8cb8dfdcde1e55a2c36ec991b47d29db3fd4a5ff8f7f1197e525518b14" failed: writing file `cpuset.cpus`: Permission denied
      Jun 17 07:42:27 hp-master4 bash[11544]:          : exit status 1
      Jun 17 07:42:27 hp-master4 bash[11544]:  > containerID="8010cc8cb8dfdcde1e55a2c36ec991b47d29db3fd4a5ff8f7f1197e525518b14"
      Jun 17 07:42:27 hp-master4 bash[11544]: E0617 07:42:27.631360   11544 cpu_manager.go:496] "ReconcileState: failed to update container" err=<
      Jun 17 07:42:27 hp-master4 bash[11544]:         rpc error: code = Unknown desc = updating resources for container "8010cc8cb8dfdcde1e55a2c36ec991b47d29db3fd4a5ff8f7f1197e525518b14" failed: writing file `cpuset.cpus`: Permission denied
      Jun 17 07:42:27 hp-master4 bash[11544]:          : exit status 1
      Jun 17 07:42:27 hp-master4 bash[11544]:  > pod="vran-acceleration-operators/sriov-fec-controller-manager-56f79b4dc4-r85mj" containerName="kube-rbac-proxy" containerID="8010cc8cb8dfdcde1e55a2c36ec991b47d29db3fd4a5ff8f7f1197e525518b14" cpuSet="2-31,34-63"
      Jun 17 07:42:27 hp-master4 bash[11544]: E0617 07:42:27.636170   11544 remote_runtime.go:461] "UpdateContainerResources from runtime service failed" err=<
      Jun 17 07:42:27 hp-master4 bash[11544]:         rpc error: code = Unknown desc = updating resources for container "7e140cedc60ff51021dafe31f062e62e2e4141e4dc6b08588500a32991133d5d" failed: writing file `cpuset.cpus`: Permission denied
      Jun 17 07:42:27 hp-master4 bash[11544]:          : exit status 1
      Jun 17 07:42:27 hp-master4 bash[11544]:  > containerID="7e140cedc60ff51021dafe31f062e62e2e4141e4dc6b08588500a32991133d5d"
      Jun 17 07:42:27 hp-master4 bash[11544]: E0617 07:42:27.636229   11544 cpu_manager.go:496] "ReconcileState: failed to update container" err=<
      Jun 17 07:42:27 hp-master4 bash[11544]:         rpc error: code = Unknown desc = updating resources for container "7e140cedc60ff51021dafe31f062e62e2e4141e4dc6b08588500a32991133d5d" failed: writing file `cpuset.cpus`: Permission denied
      Jun 17 07:42:27 hp-master4 bash[11544]:          : exit status 1
      Jun 17 07:42:27 hp-master4 bash[11544]:  > pod="vran-acceleration-operators/sriov-fec-controller-manager-56f79b4dc4-r85mj" containerName="manager" containerID="7e140cedc60ff51021dafe31f062e62e2e4141e4dc6b08588500a32991133d5d" cpuSet="2-31,34-63"
      
      
      - openshift-marketplace (SNO name: hermes)
      
      Jun 18 00:01:31 hermes.hermes.samsung.local bash[2379595]: E0618 00:01:31.928796 2379595 remote_runtime.go:461] "UpdateContainerResources from runtime service failed" err=<
      Jun 18 00:01:31 hermes.hermes.samsung.local bash[2379595]:         rpc error: code = Unknown desc = updating resources for container "c1a34bc9f87bcdbf3ee15f1dcdbd54578e5d9410b813004f79dbaa30b1256f8f" failed: writing file `cpuset.cpus`: Permission denied
      Jun 18 00:01:31 hermes.hermes.samsung.local bash[2379595]:          : exit status 1
      Jun 18 00:01:31 hermes.hermes.samsung.local bash[2379595]:  > containerID="c1a34bc9f87bcdbf3ee15f1dcdbd54578e5d9410b813004f79dbaa30b1256f8f"
      Jun 18 00:01:31 hermes.hermes.samsung.local bash[2379595]: E0618 00:01:31.928857 2379595 cpu_manager.go:496] "ReconcileState: failed to update container" err=<
      Jun 18 00:01:31 hermes.hermes.samsung.local bash[2379595]:         rpc error: code = Unknown desc = updating resources for container "c1a34bc9f87bcdbf3ee15f1dcdbd54578e5d9410b813004f79dbaa30b1256f8f" failed: writing file `cpuset.cpus`: Permission denied
      Jun 18 00:01:31 hermes.hermes.samsung.local bash[2379595]:          : exit status 1
      Jun 18 00:01:31 hermes.hermes.samsung.local bash[2379595]:  > pod="openshift-marketplace/redhat-operator-index-6vbpl" containerName="registry-server" containerID="c1a34bc9f87bcdbf3ee15f1dcdbd54578e5d9410b813004f79dbaa30b1256f8f" cpuSet="2-31,34-63"
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]: E0618 00:01:36.933924 2379595 remote_runtime.go:461] "UpdateContainerResources from runtime service failed" err=<
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:         rpc error: code = Unknown desc = updating resources for container "c1a34bc9f87bcdbf3ee15f1dcdbd54578e5d9410b813004f79dbaa30b1256f8f" failed: writing file `cpuset.cpus`: Permission denied
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:          : exit status 1
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:  > containerID="c1a34bc9f87bcdbf3ee15f1dcdbd54578e5d9410b813004f79dbaa30b1256f8f"
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]: E0618 00:01:36.933972 2379595 cpu_manager.go:496] "ReconcileState: failed to update container" err=<
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:         rpc error: code = Unknown desc = updating resources for container "c1a34bc9f87bcdbf3ee15f1dcdbd54578e5d9410b813004f79dbaa30b1256f8f" failed: writing file `cpuset.cpus`: Permission denied
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:          : exit status 1
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:  > pod="openshift-marketplace/redhat-operator-index-6vbpl" containerName="registry-server" containerID="c1a34bc9f87bcdbf3ee15f1dcdbd54578e5d9410b813004f79dbaa30b1256f8f" cpuSet="2-31,34-63"
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]: E0618 00:01:36.938353 2379595 remote_runtime.go:461] "UpdateContainerResources from runtime service failed" err=<
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:         rpc error: code = Unknown desc = updating resources for container "313c1e3bc8c42620bccfad1352795d01668f928d4699b402bf8450a316d2ea35" failed: writing file `cpuset.cpus`: Permission denied
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:          : exit status 1
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:  > containerID="313c1e3bc8c42620bccfad1352795d01668f928d4699b402bf8450a316d2ea35"
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]: E0618 00:01:36.938406 2379595 cpu_manager.go:496] "ReconcileState: failed to update container" err=<
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:         rpc error: code = Unknown desc = updating resources for container "313c1e3bc8c42620bccfad1352795d01668f928d4699b402bf8450a316d2ea35" failed: writing file `cpuset.cpus`: Permission denied
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:          : exit status 1
      Jun 18 00:01:36 hermes.hermes.samsung.local bash[2379595]:  > pod="openshift-marketplace/certified-operator-index-j4xwp" containerName="registry-server" containerID="313c1e3bc8c42620bccfad1352795d01668f928d4699b402bf8450a316d2ea35" cpuSet="2-31,34-63"
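To list every affected pod and container from a node's journal, the "ReconcileState: failed to update container" entries above can be parsed. The following is a minimal Python sketch, assuming the journal text (for example, saved from `oc adm node-logs <node> -u kubelet`) has the same multi-line layout as the excerpts above. `FailedUpdate` and `affected_containers` are hypothetical helpers written for this report, not part of any OpenShift tool.

```python
import re
from typing import Iterable, NamedTuple

# Matches the trailing ' > pod=... containerName=... containerID=... cpuSet=...' line
# that closes each "ReconcileState: failed to update container" entry. The
# "UpdateContainerResources" entries carry only containerID and are skipped.
FAILED_UPDATE_RE = re.compile(
    r'>\s+pod="(?P<pod>[^"]+)"\s+containerName="(?P<container>[^"]+)"'
    r'\s+containerID="(?P<cid>[0-9a-f]+)"\s+cpuSet="(?P<cpuset>[^"]*)"'
)


class FailedUpdate(NamedTuple):
    namespace: str
    pod: str
    container: str
    container_id: str
    cpuset: str


def affected_containers(lines: Iterable[str]) -> list[FailedUpdate]:
    """Return one entry per unique (pod, container) seen in ReconcileState failures."""
    seen: dict[tuple[str, str], FailedUpdate] = {}
    for line in lines:
        m = FAILED_UPDATE_RE.search(line)
        if m is None:
            continue
        namespace, _, pod = m["pod"].partition("/")
        key = (m["pod"], m["container"])
        # The CPU manager retries on every reconcile loop, so keep the first hit only.
        seen.setdefault(key, FailedUpdate(namespace, pod, m["container"], m["cid"], m["cpuset"]))
    return sorted(seen.values())
```

Called as `affected_containers(open(path))` on the hermes excerpt above, it should return two entries, both `registry-server` containers in `openshift-marketplace`, because the repeated failures for `redhat-operator-index-6vbpl` collapse into one.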

      Version-Release number of selected component (if applicable):

          4.16.42

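      How reproducible:

          Observed on several SNO clusters after the upgrade.

      Steps to Reproduce:

          1. Upgrade an SNO cluster from 4.16.38 to 4.16.42.
          2. Check the kubelet journal on the node for "ReconcileState: failed to update container" errors ending in `cpuset.cpus`: Permission denied.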

      Actual results:

      The CPU manager cannot update the cpuset of some platform pods placed on the isolated CPUs (cpuSet="2-31,34-63" in the logs above): writing `cpuset.cpus` fails with "Permission denied".
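The cpuSet value in these entries uses the Linux cpuset list notation (comma-separated CPU ids and inclusive ranges). The Python sketch below expands and re-compacts that notation, which helps when comparing the logged value against the CPUs the node actually has. It handles only plain ids and ranges. The 64-CPU node size is a hypothetical assumption inferred solely from the highest CPU id in the logs; the report does not state the real topology or the PerformanceProfile split.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Parse a cpuset list string such as "2-31,34-63" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"invalid range {part!r}")
            cpus.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_list(cpus: set[int]) -> str:
    """Format a set of CPU ids back into compact cpuset list notation."""
    ordered = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ordered):
        start = ordered[i]
        # Extend the run while the next id is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == ordered[i] + 1:
            i += 1
        end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


# Hypothetical node size: 64 logical CPUs (0-63).
NODE_CPUS = set(range(64))
target = parse_cpu_list("2-31,34-63")
print(len(target))                          # 60
print(format_cpu_list(NODE_CPUS - target))  # 0-1,32-33
```

Under that assumption, the CPUs left out of the logged cpuSet are 0-1,32-33; whether those are the reserved CPUs on these nodes would need to be confirmed from the cluster's PerformanceProfile.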

      Expected results:

      The CPU manager should update the cpusets of these pods to the isolated CPUs without errors.

      Additional info:

          After the affected pods were restarted, the error messages appear to have stopped.
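One way to check the observation above is to look for failures logged after the pods were restarted. The Python sketch below records the newest failure time per pod and lists pods that still failed after a cut-off. It assumes the cut-off is the restart time in the same node-local time zone as the journal (container `startedAt` values in the pod status are UTC and would need converting), and that the caller supplies the year, which the short journal format omits. Recreated pods get new names, so the cut-off is applied to every pod rather than to specific names. The function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable

# Journal short-format prefix, e.g. "Jun 18 00:01:36" (no year), and the pod field
# carried by the trailing line of each "ReconcileState: failed to update container" entry.
JOURNAL_TS_RE = re.compile(r"^(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) ")
POD_RE = re.compile(r'>\s+pod="(?P<pod>[^"]+)"')


def last_failure_per_pod(lines: Iterable[str], year: int) -> dict[str, datetime]:
    """Map "namespace/pod" to the newest journal timestamp of a ReconcileState failure."""
    last: dict[str, datetime] = {}
    for line in lines:
        ts_match = JOURNAL_TS_RE.match(line)
        pod_match = POD_RE.search(line)
        if ts_match is None or pod_match is None:
            continue  # not a failure line that names a pod
        # The short journal format omits the year, so the caller must supply it.
        ts = datetime.strptime(f"{year} {ts_match['ts']}", "%Y %b %d %H:%M:%S")
        pod = pod_match["pod"]
        if pod not in last or ts > last[pod]:
            last[pod] = ts
    return last


def still_failing_after(last: dict[str, datetime], cutoff: datetime) -> list[str]:
    """Pods (including any recreated under new names) that logged a failure after cutoff."""
    return sorted(pod for pod, ts in last.items() if ts > cutoff)
```

An empty result for a cut-off just after the restarts would support the observation; any pod listed would mean the errors resumed.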

              Francesco Romani (fromani@redhat.com)
              Sehee Oh (soh@redhat.com)
              Mallapadi Niranjan
              Votes: 1
              Watchers: 15
