OpenShift Bugs / OCPBUGS-62225

numaresourcesoperator pods fail to start - Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied

    • Quality / Stability / Reliability
    • Important
    • x86_64

      Description of problem:

          numaresourcesoperator pods are in CrashLoopBackOff state with the following error message:
      Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
      I0925 07:28:32.448592       1 main.go:66] starting resource-topology-exporter 4.16.6 9692c3190f9868ef04d188a42ad8d0bac954e756 go1.22.12 (Red Hat 1.22.12-2.el9_5) X:strictfipsruntime
      I0925 07:28:32.448868       1 main.go:307] using Topology Manager scope "container" from "default" (conf=) policy "single-numa-node" from "conf" (conf=single-numa-node)
      I0925 07:28:32.449382       1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
      I0925 07:28:32.449402       1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
      I0925 07:28:32.449790       1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
      I0925 07:28:32.449812       1 setup.go:90] metrics endpoint disabled
      I0925 07:28:32.449822       1 podexclude.go:99] > POD excludes:
      I0925 07:28:32.449835       1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
      I0925 07:28:32.449887       1 notification.go:123] added interval every 10s
      I0925 07:28:32.449912       1 resourcemonitor.go:153] resource monitor for "ocp-node-8.spoke00.core.telcoqe.eng.rdu2.dc.redhat.com" starting
      I0925 07:28:32.615362       1 resourcemonitor.go:175] tracking node resources
      F0925 07:28:32.615893       1 main.go:118] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"
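
The fatal line above is a failed connect() on the kubelet podresources socket. As a diagnostic aid, the sketch below tries the same kind of connection and classifies the result, so that "permission denied" can be told apart from a missing socket or a socket with no listener. The socket path comes from the log; the function name probe_unix_socket and the returned labels are hypothetical and not part of resource-topology-exporter. Per connect(2), EACCES on a Unix socket path means write permission on the socket file or search permission on a parent directory was denied, and an SELinux denial can surface the same way. This report does not establish which of these applies here.

```python
import errno
import socket


def probe_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Try to connect to a Unix domain socket and classify the outcome.

    Hypothetical helper for diagnosis only; the labels are illustrative.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except PermissionError:
        # EACCES/EPERM: the failure mode seen in the log above.
        return "permission-denied"
    except FileNotFoundError:
        # ENOENT: the socket path does not exist in this mount namespace.
        return "missing"
    except ConnectionRefusedError:
        # ECONNREFUSED: the path exists but nothing is listening on it.
        return "refused"
    except OSError as exc:
        # Anything else (including timeouts, where errno may be None).
        return "error:" + errno.errorcode.get(exc.errno, "unknown")
    finally:
        sock.close()
    return "ok"
```

Run from inside the failing container (if a Python interpreter is available there, which is an assumption), probe_unix_socket("/host-podresources/kubelet.sock") would be expected to return "permission-denied" while the problem persists.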
      

      Version-Release number of selected component (if applicable):

      OCP: 4.16.48
      numaresources-operator.v4.16.6

      How reproducible:

          Observed once so far

      Steps to Reproduce:

          1. Deploy a bare-metal dual-stack cluster
          2. Install and configure NROP (NUMA Resources Operator)
          3. Check the status of the numaresourcesoperator pod(s)
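
Step 3 can be scripted. The sketch below lists containers whose waiting reason is CrashLoopBackOff from the JSON that `oc get pods -o json` returns. The function names are hypothetical, and the namespace in the usage note is an assumption, since this report does not state where the operator was installed.

```python
import json
import subprocess


def crashlooping_containers(pod_list: dict) -> list:
    """Return (pod, container) pairs whose waiting reason is CrashLoopBackOff."""
    found = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "")
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # "waiting" is absent (or null) for running/terminated containers.
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                found.append((pod_name, cs.get("name", "")))
    return found


def fetch_pods(namespace: str) -> dict:
    """Run `oc get pods -o json`; needs a logged-in `oc` binary on PATH."""
    result = subprocess.run(
        ["oc", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, crashlooping_containers(fetch_pods("openshift-numaresources")) would list the affected containers, assuming the operator runs in that namespace.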
          

      Actual results:

          Pods are in CrashLoopBackOff state

      Expected results:

          Pods are running without any errors

      Additional info:

          

              rhn-support-shajmakh Shereen Haj
              yprokule@redhat.com Yurii Prokulevych
              Mallapadi Niranjan Mallapadi Niranjan