  1. OpenShift Bugs
  2. OCPBUGS-77664

du-vpp container crashes due to incorrect CPU allocation from shared CPU pool instead of dedicated isolated cores

    • Important
    • aarch64
    • UAT

      Description of problem:

      In a deployment running on OpenShift Container Platform (OCP) version 4.18.11, the du-vpp container crashes because it is assigned CPUs from the shared CPU pool (CPUs 0–3, 71) instead of receiving its designated isolated CPU cores (CPUs 4–6).
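
      A quick way to confirm the symptom is to compare the CPUs a container is allowed to run on against the two sets named above. The following is a minimal sketch, not the procedure used in this report: the helper names are hypothetical, and it assumes the allowed CPUs are read from the Cpus_allowed_list field of /proc/self/status inside the container.

```python
def parse_cpu_list(text):
    """Expand a Linux CPU list such as "0-3,71" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


# CPU sets as stated in this report.
SHARED_POOL = parse_cpu_list("0-3,71")
ISOLATED = parse_cpu_list("4-6")


def allowed_list_from_status(status_text):
    """Extract the Cpus_allowed_list value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return line.split(":", 1)[1].strip()
    return ""


def classify_affinity(allowed_list):
    """Return "isolated", "shared", or "mixed" for a container's allowed CPU list."""
    allowed = parse_cpu_list(allowed_list)
    if allowed and allowed <= ISOLATED:
        return "isolated"
    if allowed and allowed <= SHARED_POOL:
        return "shared"
    return "mixed"
```

      Inside the container, classify_affinity(allowed_list_from_status(open("/proc/self/status").read())) would be expected to return "isolated" for a correctly pinned du-vpp and "shared" when the problem described here occurs.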

       

      Cluster Details:

      • Cluster Version: 4.18.11
      • Desired Version: 4.18.11
      • CNI Plugin: OVNKubernetes
      • Network Type: OVNKubernetes
      • httpProxy: None
      • httpsProxy: None

      The hardware configuration and node CPU isolation settings have been verified and confirmed to be correct.

      Root Cause:
      This behavior is caused by a known CPU Manager race condition in Kubernetes (Upstream Issue #107906). The race can occur when the init container (du-init) and the main container (du-vpp) both request the same exclusive integer CPU value (e.g., cpu: "4"). After the init container exits, the kubelet may incorrectly return those CPUs to the shared pool before the main container claims them.

      As a result, the main container is assigned CPUs from the shared pool rather than the isolated cores, leading to instability and container crashes.
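
      The triggering condition described above can be checked against a pod spec before deployment. The sketch below is illustrative only: the function names and the example spec are hypothetical, and as a simplification it inspects only resources.limits.cpu (exclusive CPUs require whole-number CPU values in a Guaranteed QoS pod).

```python
def parse_cpu_quantity(value):
    """Return a CPU quantity as whole cores, or None if it is fractional."""
    text = str(value)
    if text.endswith("m"):
        millis = int(text[:-1])
        return millis // 1000 if millis % 1000 == 0 else None
    number = float(text)
    return int(number) if number.is_integer() else None


def cpu_limit(container):
    """Whole-CPU limit of a container dict, or None if absent or fractional."""
    limits = container.get("resources", {}).get("limits", {})
    return parse_cpu_quantity(limits["cpu"]) if "cpu" in limits else None


def matching_exclusive_requests(pod_spec):
    """List (init_name, main_name, cpus) pairs that ask for the same whole-CPU count."""
    hits = []
    for init in pod_spec.get("initContainers", []):
        init_cpus = cpu_limit(init)
        if not init_cpus:
            continue
        for main in pod_spec.get("containers", []):
            if cpu_limit(main) == init_cpus:
                hits.append((init["name"], main["name"], init_cpus))
    return hits


# Hypothetical pod spec fragment mirroring the case in this report.
example_pod = {
    "initContainers": [
        {"name": "du-init", "resources": {"limits": {"cpu": "4"}}},
    ],
    "containers": [
        {"name": "du-vpp", "resources": {"limits": {"cpu": "4"}}},
    ],
}

print(matching_exclusive_requests(example_pod))
```

      A non-empty result flags init/main container pairs that match the pattern described in the root cause; it does not by itself prove that the race occurred.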

       

              pehunt@redhat.com Peter Hunt
              rhn-support-abbadak Abhishek Badak
              Niranjan Mallapadi Raghavendra Rao