OpenShift Bugs / OCPBUGS-73896

[4.21 regression] PSI metrics exist in cAdvisor before enabling PSI

    • Bug
    • Resolution: Won't Do
    • 4.21
    • Node / Kubelet
    • Important
    • Rejected

      Description of problem:

      On 4.21, the PSI metrics can be queried from the /metrics/cadvisor endpoint even before PSI is enabled, but the values of the metrics are all 0.

      Version-Release number of selected component (if applicable):

      OCP: 4.21.0-0.nightly-2026-01-13-111112
      Linux: 5.14.0-570.78.1.el9_6.x86_64

      How reproducible:

      I cannot reproduce this on 4.20.0-0.nightly-2026-01-13-225320.

      Steps to Reproduce:

          1. Install an OCP 4.21 cluster. Do NOT enable PSI.
          2. Query the /metrics/cadvisor endpoint and grep for the PSI metrics.

      Actual results:

      I deployed about 600 test pods, so the counts are large, but this is not a necessary step to reproduce the bug.
      Query all PSI metrics on the nodes:
      
      # for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do echo "=== Node: $node ==="; for metric in cpu_waiting cpu_stalled memory_waiting memory_stalled io_waiting io_stalled; do echo -n "container_pressure_${metric}_seconds_total: "; kubectl get --raw "/api/v1/nodes/$node/proxy/metrics/cadvisor" | grep "container_pressure_${metric}_seconds_total" | wc -l; done; done
      === Node: ip-10-0-11-217.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:      267
      container_pressure_cpu_stalled_seconds_total:      267
      container_pressure_memory_waiting_seconds_total:      267
      container_pressure_memory_stalled_seconds_total:      267
      container_pressure_io_waiting_seconds_total:      267
      container_pressure_io_stalled_seconds_total:      267
      === Node: ip-10-0-13-137.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:      905
      container_pressure_cpu_stalled_seconds_total:      905
      container_pressure_memory_waiting_seconds_total:      905
      container_pressure_memory_stalled_seconds_total:      905
      container_pressure_io_waiting_seconds_total:      905
      container_pressure_io_stalled_seconds_total:      905
      === Node: ip-10-0-33-56.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:      304
      container_pressure_cpu_stalled_seconds_total:      304
      container_pressure_memory_waiting_seconds_total:      304
      container_pressure_memory_stalled_seconds_total:      304
      container_pressure_io_waiting_seconds_total:      304
      container_pressure_io_stalled_seconds_total:      304
      === Node: ip-10-0-48-54.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:      891
      container_pressure_cpu_stalled_seconds_total:      891
      container_pressure_memory_waiting_seconds_total:      891
      container_pressure_memory_stalled_seconds_total:      891
      container_pressure_io_waiting_seconds_total:      891
      container_pressure_io_stalled_seconds_total:      891
      === Node: ip-10-0-65-182.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:      869
      container_pressure_cpu_stalled_seconds_total:      869
      container_pressure_memory_waiting_seconds_total:      869
      container_pressure_memory_stalled_seconds_total:      869
      container_pressure_io_waiting_seconds_total:      869
      container_pressure_io_stalled_seconds_total:      869
      === Node: ip-10-0-71-46.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:      399
      container_pressure_cpu_stalled_seconds_total:      399
      container_pressure_memory_waiting_seconds_total:      399
      container_pressure_memory_stalled_seconds_total:      399
      container_pressure_io_waiting_seconds_total:      399
      container_pressure_io_stalled_seconds_total:      399
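Note that the `grep ... | wc -l` counts above also include the `# HELP` and `# TYPE` comment lines of each metric, because those lines contain the metric name too. A minimal sketch of a stricter count, assuming the endpoint returns the Prometheus text exposition format shown in this report; `count_series` is a hypothetical helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the sample lines of one metric in Prometheus
# text exposition format read from stdin. "# HELP" and "# TYPE" comment
# lines are skipped because they do not start with the metric name.
count_series() {
  local metric="$1"
  # A sample line starts with the metric name followed by "{" or a space.
  # grep -c exits non-zero when nothing matches, so mask that status.
  grep -cE "^${metric}(\{| )" || true
}
```

It could be used in place of the `grep | wc -l` pair, for example: `kubectl get --raw "/api/v1/nodes/$node/proxy/metrics/cadvisor" | count_series container_pressure_cpu_waiting_seconds_total`.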
      The PSI metric values are all 0:
      
      # kubectl get --raw "/api/v1/nodes/ip-10-0-13-137.us-east-2.compute.internal/proxy/metrics/cadvisor" | grep container_pressure  | head -n 30
      # HELP container_pressure_cpu_stalled_seconds_total Total time duration no tasks in the container could make progress due to CPU congestion.
      # TYPE container_pressure_cpu_stalled_seconds_total counter
      container_pressure_cpu_stalled_seconds_total{container="",id="/",image="",name="",namespace="",pod=""} 0 1768545538199
      container_pressure_cpu_stalled_seconds_total{container="",id="/kubepods.slice",image="",name="",namespace="",pod=""} 0 1768545538366
      container_pressure_cpu_stalled_seconds_total{container="",id="/kubepods.slice/kubepods-besteffort.slice",image="",name="",namespace="",pod=""} 0 1768545520212
      container_pressure_cpu_stalled_seconds_total{container="",id="/kubepods.slice/kubepods-burstable.slice",image="",name="",namespace="",pod=""} 0 1768545530055
      container_pressure_cpu_stalled_seconds_total{container="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod043ba9f7_c8fb_4904_ad7f_3998eaf99f0e.slice",image="",name="",namespace="node-density-heavy-0",pod="postgres-1-143-6b4b94b49c-w92mf"} 0 1768545533551
      container_pressure_cpu_stalled_seconds_total{container="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod043ba9f7_c8fb_4904_ad7f_3998eaf99f0e.slice/crio-conmon-b4571fa14d33283cae4176fb1c6e1aa1462ca0f3ddfb6ad0b3ac62c5b141f9fd.scope",image="",name="",namespace="",pod=""} 0 1768545530913
      container_pressure_cpu_stalled_seconds_total{container="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod04e9b0eb_5b33_47b2_99a7_5426ce509a92.slice",image="",name="",namespace="openshift-multus",pod="multus-p8k5c"} 0 1768545539719 
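The `head -n 30` output only shows the first few samples. To check every sample rather than a prefix, something like the following sketch could be used; `count_nonzero_psi` is a hypothetical helper name, and it assumes each sample line has the form `name{labels} value [timestamp]` as in the output above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read cAdvisor metrics on stdin and print how many
# container_pressure_* samples carry a non-zero value.
count_nonzero_psi() {
  awk '
    /^container_pressure_/ {
      line = $0
      # Drop the metric name and the optional {label="..."} set.
      sub(/^[^{ ]+([{].*[}])? */, "", line)
      # What remains is "value [timestamp]".
      split(line, f, " ")
      if (f[1] + 0 != 0) nonzero++
    }
    END { print nonzero + 0 }
  '
}
```

For example: `kubectl get --raw "/api/v1/nodes/$node/proxy/metrics/cadvisor" | count_nonzero_psi`. A result of 0 on a node where the metrics exist would match the behavior described in this report.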

      Prometheus can query the PSI metrics, but the values are 0 (attachment: psi-not-enabled-prometheus-4.21.png).

      On a cluster with PSI enabled, the PSI metric values are not 0 (attachment: psi-enabled-prometheus-4.21.png).

      Expected results:

      The PSI metrics should not be exposed on the /metrics/cadvisor endpoint before PSI is enabled, and Prometheus should not be able to query them.

      Additional info:

      Although the PSI metrics can be queried, their values are all 0.
      If I enable PSI on the cluster, the metrics report real values instead of 0.
      
      Confirming that PSI is not enabled on the 4.21 cluster:
      
      # oc debug node/ip-10-0-13-137.us-east-2.compute.internal
      ....
      chroot /host
      sh-5.1# ls /proc/pressure
      ls: cannot access '/proc/pressure': No such file or directory
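The manual check above can be scripted. The sketch below is only an illustration: `psi_status` is a hypothetical helper, and it assumes that a readable `/proc/pressure/cpu` file means the kernel has PSI enabled, which is the same signal the `ls /proc/pressure` check relies on. The optional root argument exists so the function can be pointed at `/host` or at a test directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the kernel exposes PSI files under
# the given root directory (defaults to the real filesystem root).
psi_status() {
  local root="${1:-}"
  if [ -r "${root}/proc/pressure/cpu" ]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}
```

Inside an `oc debug node/...` session without `chroot`, it could be called as `psi_status /host`; after `chroot /host`, plain `psi_status` is enough.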
      
      4.20 does not have this issue:
      
      # for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do echo "=== Node: $node ==="; for metric in cpu_waiting cpu_stalled memory_waiting memory_stalled io_waiting io_stalled; do echo -n "container_pressure_${metric}_seconds_total: "; kubectl get --raw "/api/v1/nodes/$node/proxy/metrics/cadvisor" | grep "container_pressure_${metric}_seconds_total" | wc -l; done; done
      === Node: ip-10-0-19-27.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:        0
      container_pressure_cpu_stalled_seconds_total:        0
      container_pressure_memory_waiting_seconds_total:        0
      container_pressure_memory_stalled_seconds_total:        0
      container_pressure_io_waiting_seconds_total:        0
      container_pressure_io_stalled_seconds_total:        0
      === Node: ip-10-0-26-149.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:        0
      container_pressure_cpu_stalled_seconds_total:        0
      container_pressure_memory_waiting_seconds_total:        0
      container_pressure_memory_stalled_seconds_total:        0
      container_pressure_io_waiting_seconds_total:        0
      container_pressure_io_stalled_seconds_total:        0
      === Node: ip-10-0-58-84.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:        0
      container_pressure_cpu_stalled_seconds_total:        0
      container_pressure_memory_waiting_seconds_total:        0
      container_pressure_memory_stalled_seconds_total:        0
      container_pressure_io_waiting_seconds_total:        0
      container_pressure_io_stalled_seconds_total:        0
      === Node: ip-10-0-61-228.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:        0
      container_pressure_cpu_stalled_seconds_total:        0
      container_pressure_memory_waiting_seconds_total:        0
      container_pressure_memory_stalled_seconds_total:        0
      container_pressure_io_waiting_seconds_total:        0
      container_pressure_io_stalled_seconds_total:        0
      === Node: ip-10-0-86-191.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:        0
      container_pressure_cpu_stalled_seconds_total:        0
      container_pressure_memory_waiting_seconds_total:        0
      container_pressure_memory_stalled_seconds_total:        0
      container_pressure_io_waiting_seconds_total:        0
      container_pressure_io_stalled_seconds_total:        0
      === Node: ip-10-0-91-54.us-east-2.compute.internal ===
      container_pressure_cpu_waiting_seconds_total:        0
      container_pressure_cpu_stalled_seconds_total:        0
      container_pressure_memory_waiting_seconds_total:        0
      container_pressure_memory_stalled_seconds_total:        0
      container_pressure_io_waiting_seconds_total:        0
      container_pressure_io_stalled_seconds_total:        0
      # oc version
      Client Version: 4.21.0-rc.0
      Kustomize Version: v5.7.1
      Server Version: 4.20.0-0.nightly-2026-01-13-225320
      Kubernetes Version: v1.33.6

      PSI metrics

              rh-ee-ngopalak Neeraj Krishna Gopalakrishna
              rhn-support-qili Qiujie Li
              Aruna Naik, Neelesh Agrawal, Neeraj Krishna Gopalakrishna
              None
              Min Li Min Li
              None