OCPBUGS-10426 (OpenShift Bugs)

node-topology is not exported due to kubelet.sock: connect: permission denied


    • Critical
    • No
    • CNF Compute Sprint 233, CNF Compute Sprint 234, CNF Compute Sprint 235
    • 3
    • Rejected
    • False
      Rel Note for Telco: Not Required

      Telco blocker. Not an OCP blocker.

      20230327: workaround committed and testing unblocked. In general this is no longer a blocker, but narrowing down the RTE-specific policy is still very high priority.

      Working actively to find a workaround to unblock further testing.
      We need to review the policies delivered for 4.13 and fix them.
      The short-term fix is to adjust the policy.
      The mid/long-term fix is still under investigation.
      The contingency plan is to run the RTE pods as privileged.

      Description of problem:

      NROP node topology is not exported because connecting to kubelet.sock fails with "connect: permission denied". As a result, all tests are skipped or fail.
      
      
      

      Version-Release number of selected component (if applicable):

      oc get nodes -o wide
      NAME       STATUS   ROLES                  AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                            KERNEL-VERSION          CONTAINER-RUNTIME
      master-0   Ready    control-plane,master   23h   v1.26.2+bc894ae   10.46.82.20   <none>        CentOS Stream CoreOS 413.92.202303061740-0 (Plow)   5.14.0-282.el9.x86_64   cri-o://1.26.1-10.rhaos4.13.gitcb86088.el9
      master-1   Ready    control-plane,master   23h   v1.26.2+bc894ae   10.46.82.21   <none>        CentOS Stream CoreOS 413.92.202303061740-0 (Plow)   5.14.0-282.el9.x86_64   cri-o://1.26.1-10.rhaos4.13.gitcb86088.el9
      master-2   Ready    control-plane,master   23h   v1.26.2+bc894ae   10.46.82.22   <none>        CentOS Stream CoreOS 413.92.202303061740-0 (Plow)   5.14.0-282.el9.x86_64   cri-o://1.26.1-10.rhaos4.13.gitcb86088.el9
      worker-0   Ready    worker                 22h   v1.26.2+bc894ae   10.46.82.23   <none>        CentOS Stream CoreOS 413.92.202303061740-0 (Plow)   5.14.0-282.el9.x86_64   cri-o://1.26.1-10.rhaos4.13.gitcb86088.el9
      worker-1   Ready    worker                 22h   v1.26.2+bc894ae   10.46.82.24   <none>        CentOS Stream CoreOS 413.92.202303061740-0 (Plow)   5.14.0-282.el9.x86_64   cri-o://1.26.1-10.rhaos4.13.gitcb86088.el9
       
      oc version
      Client Version: 4.13.0-0.nightly-2023-03-13-182639
      Kustomize Version: v4.5.7
      Server Version: 4.13.0-0.nightly-2023-03-13-182639
      Kubernetes Version: v1.26.2+06e8c46
      

      How reproducible:

      Deploy NROP and configure the basic settings according to: https://docs.openshift.com/container-platform/4.11/scalability_and_performance/cnf-numa-aware-scheduling.html
      Check that the output of oc get numaresourcesoperators.nodetopology.openshift.io -o yaml is empty.
      
      
      

      Steps to Reproduce:

      1. Install the NROP operator
      2. Configure the basic settings
      3. Check numaresourcesoperators.nodetopology.openshift.io:
         oc get numaresourcesoperators.nodetopology.openshift.io -o yaml
         The output is empty.
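
The check in step 3 can be scripted. This is a minimal sketch, not part of the original report: it assumes that oc prints a List with a top-level "items: []" line when no objects of the requested kind exist, and the helper name is_empty_list is hypothetical.

```shell
# Hypothetical helper: reads YAML on stdin and succeeds (exit 0) only when it
# contains a top-level "items: []" line, i.e. the returned List has no objects.
# Assumption: this is how an empty "oc get ... -o yaml" result is rendered.
is_empty_list() {
    grep -Eq '^items:[[:space:]]*\[\][[:space:]]*$'
}

# Usage against a live cluster (requires a logged-in oc client):
#   oc get numaresourcesoperators.nodetopology.openshift.io -o yaml \
#       | is_empty_list && echo "resource list is empty"
```

The function only inspects text, so it can also be run against saved command output when the cluster is no longer available.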
      

      Actual results:

      oc get numaresourcesoperators.nodetopology.openshift.io -o yaml returns an empty result; the NROP topology is not shown.
      
      

      Expected results:

      NROP topology is exported.
      
      

      Additional info:

      Logs from numaresourcesoperator-worker-
      I0316 19:25:11.420139       1 round_trippers.go:553] GET https://172.30.0.1:443/api/v1/namespaces/openshift-numaresources/pods/numaresourcesoperator-worker-t5sh9 200 OK in 5 milliseconds
      I0316 19:25:21.417282       1 notification.go:94] timer update trigger
      W0316 19:25:21.417443       1 resourceobserver.go:72] failed to scan pod resources: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /host-podresources-socket/kubelet.sock: connect: permission denied"
      I0316 19:25:21.425772       1 round_trippers.go:553] GET https://172.30.0.1:443/api/v1/namespaces/openshift-numaresources/pods/numaresourcesoperator-worker-t5sh9 200 OK in 8 milliseconds
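
To confirm that a node is hitting this failure, the warning can be counted in the pod logs. This is a rough sketch: the helper name count_sock_denied is hypothetical, and the match string is taken verbatim from the warning quoted above.

```shell
# Hypothetical helper: reads log text on stdin and prints how many lines report
# "permission denied" while connecting to kubelet.sock.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the helper
# usable under "set -e"; the count itself is still printed.
count_sock_denied() {
    grep -c 'kubelet\.sock: connect: permission denied' || true
}

# Usage against a live cluster (requires a logged-in oc client; the namespace
# and pod name are the ones that appear in the log lines above, and a
# multi-container pod may additionally need a -c <container> argument):
#   oc logs -n openshift-numaresources numaresourcesoperator-worker-t5sh9 \
#       | count_sock_denied
```

A non-zero count means the pod could not reach the socket mounted at /host-podresources-socket, which matches the symptom reported here.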
      
      
      

            fromani@redhat.com Francesco Romani
            nkononov@redhat.com Nikita Kononov
            Mallapadi Niranjan Mallapadi Niranjan