OpenShift Bugs / OCPBUGS-53203

Due to a permissions issue connecting to kubelet.sock, the RTE pods fail to start


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • 4.13.z
    • Quality / Stability / Reliability

      Description of problem:

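      The resource-topology-exporter (RTE) container cannot connect to the kubelet podresources socket (/host-podresources/kubelet.sock) because the connection is refused with "permission denied". As a result, the RTE pods fail to start and end up in CrashLoopBackOff.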

      Version-Release number of selected component (if applicable):

          4.13.22

      How reproducible:

      Every time

      Steps to Reproduce:

          1. Deploy NROP (the NUMA Resources Operator).
          2. Deploy the RTE (resource-topology-exporter) pods.

      Actual results:

          [root@helix36 ~]# oc get pods
          NAME                                              READY   STATUS             RESTARTS         AGE
          numaresources-controller-manager-9cfc4f89-r4xgz   1/1     Running            0                6h42m
          numaresourcesoperator-worker-cnf-wchfz            1/2     CrashLoopBackOff   82 (3m35s ago)   6h38m
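
      When the namespace holds more pods than the two listed here, the failing ones can be picked out of the `oc get pods` text mechanically. The following is a minimal sketch in Python, not part of NROP or RTE; the function name `unready_pods` is hypothetical. It assumes the default column layout shown above (NAME, READY, STATUS first) and flags any pod whose READY count is below its total or whose STATUS is not Running, so pods in the Completed state would also be reported.

```python
from typing import List, Tuple


def unready_pods(listing: str) -> List[Tuple[str, str, str]]:
    """Return (name, ready, status) for pods that are not fully ready and Running.

    `listing` is the plain-text output of `oc get pods`. Only the first three
    columns are read, because RESTARTS may contain spaces ("82 (3m35s ago)").
    """
    flagged = []
    for line in listing.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 3 or parts[0] == "NAME":
            continue  # blank line, shell prompt fragment, or header row
        name, ready, status = parts[:3]
        ready_count, _, total = ready.partition("/")
        if not (ready_count.isdigit() and total.isdigit()):
            continue  # not a pod row (for example the shell prompt line)
        if int(ready_count) != int(total) or status != "Running":
            flagged.append((name, ready, status))
    return flagged


if __name__ == "__main__":
    sample = """\
NAME                                              READY   STATUS             RESTARTS         AGE
numaresources-controller-manager-9cfc4f89-r4xgz   1/1     Running            0                6h42m
numaresourcesoperator-worker-cnf-wchfz            1/2     CrashLoopBackOff   82 (3m35s ago)   6h38m
"""
    for row in unready_pods(sample):
        print(row)
```

      Applied to the listing above, only the `numaresourcesoperator-worker-cnf-wchfz` pod (1/2 ready, CrashLoopBackOff) is flagged.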
      

      Expected results:

      RTE pods should start    

      Additional info:

          [root@helix36 ~]# oc logs pods/numaresourcesoperator-worker-cnf-wchfz
      Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
      I0317 15:13:43.650231       1 main.go:65] starting resource-topology-exporter 4.13.8 3f8156393a74dd88b1c380a01d8d76f361fc4187 go1.19.13
      I0317 15:13:43.650554       1 main.go:294] using Topology Manager scope "container" from "conf" (conf=container) policy "single-numa-node" from "conf" (conf=single-numa-node)
      I0317 15:13:43.651258       1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
      I0317 15:13:43.651279       1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
      I0317 15:13:43.651693       1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
      I0317 15:13:43.651744       1 prometheus.go:113] prometheus endpoint disabled
      I0317 15:13:43.651756       1 podexclude.go:87] > POD excludes:
      I0317 15:13:43.651783       1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
      I0317 15:13:43.651886       1 notification.go:129] added interval every 10s
      I0317 15:13:43.651938       1 resourcemonitor.go:153] resource monitor for "ocp4132201838-worker-0.libvirt.lab.eng.tlv2.redhat.com" starting
      I0317 15:13:43.682265       1 resourcemonitor.go:175] tracking node resources
      F0317 15:13:43.683140       1 main.go:112] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"
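
      The fatal line shows that the failure happens at the dial step: connecting to the Unix socket itself is rejected with "permission denied", before any podresources request is made. To check that step in isolation (for example from a debug container that mounts the same path), a plain socket connect is enough. The sketch below is an illustration only, not RTE code; the helper name `probe_unix_socket` is hypothetical and the socket path is the one from the log. It does not speak the podresources gRPC protocol, and it cannot tell whether a denial comes from file permissions or from a security policy such as SELinux.

```python
import errno
import socket


def probe_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Attempt a plain connect() to a Unix stream socket and classify the result."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except PermissionError:
        # EACCES/EPERM: the same class of failure as in the RTE log.
        return "permission-denied"
    except FileNotFoundError:
        # The socket path does not exist (for example, the host path is not mounted).
        return "missing"
    except ConnectionRefusedError:
        # The socket file exists but nothing is listening on it.
        return "refused"
    except OSError as exc:
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Path taken from the log above; adjust it if the socket is mounted elsewhere.
    print(probe_unix_socket("/host-podresources/kubelet.sock"))
```

      A result of "permission-denied" reproduces the condition in the log, while "missing" or "refused" would point to a different problem (mount or kubelet side) than the one reported here.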
      

              aos-node@redhat.com Node Team Bot Account
              mniranja Mallapadi Niranjan