OpenShift Bugs / OCPBUGS-44740

Labels are being automatically injected by the kubelet resource


      Description of problem:

      The topology.cinder.csi.openstack.org/zone labels are automatically injected on the nodes by the kubelet (via CSI node registration), so they cannot be changed manually to resolve a volume node affinity conflict.
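
      For context, a quick way to see the zone label that is injected on each node at CSI node registration (the -L flag just adds a label column):
      ======
      # List nodes together with the injected Cinder CSI zone label
      oc get nodes -L topology.cinder.csi.openstack.org/zone
      ======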

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

      1. Created three separate machine sets, each corresponding to a specific zone: nova, nova-az2, and nova-az3.
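      A minimal sketch of the zone-specific part of such a MachineSet (names are illustrative; the relevant field is providerSpec.value.availabilityZone):
      ======
      apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      metadata:
        name: worker-nova-az2               # illustrative name
        namespace: openshift-machine-api
      spec:
        template:
          spec:
            providerSpec:
              value:
                # Nova availability zone the machines in this set are created in
                availabilityZone: nova-az2
      ======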
      
      2. Recreated the worker nodes, ensuring an even distribution across the machine sets.
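      The redistribution can be done by scaling the MachineSets, for example (names and replica counts are illustrative):
      ======
      oc -n openshift-machine-api scale machineset worker-nova --replicas=1
      oc -n openshift-machine-api scale machineset worker-nova-az2 --replicas=1
      oc -n openshift-machine-api scale machineset worker-nova-az3 --replicas=1
      ======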
      
      3. We confirmed that node affinity was applied to the existing volumes, as shown below:
      =====
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.cinder.csi.openstack.org/zone
                  operator: In
                  values:
                    - nova
            - matchExpressions:
                - key: topology.cinder.csi.openstack.org/zone
                  operator: In
                  values:
                    - nova-az3
      =====
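      The node affinity above can be read directly from the PersistentVolume, e.g. (PV name is illustrative):
      ======
      oc get pv pvc-0123abcd -o jsonpath='{.spec.nodeAffinity}{"\n"}'
      ======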
      
      4. When a pod that uses one of these volumes should be scheduled onto a worker node in the nova-az2 availability zone, scheduling fails with the following error:
      ====
      0/11 nodes are available: 1 Insufficient cpu, 1 node(s) had volume node affinity conflict, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 7 Insufficient memory. preemption: 0/11 nodes are available: 4 Preemption is not helpful for scheduling, 7 No preemption victims found for incoming pod..
      ====
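      The scheduling failure is visible in the pod's events, e.g. (pod name is illustrative):
      ======
      oc describe pod my-app-0 | grep -A 2 FailedScheduling
      ======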
      
      5. To resolve this issue, we needed to update the node label to match the node affinity of the volume, changing:
      ======
      labels:
        topology.cinder.csi.openstack.org/zone: nova-az2   <-- Change to nova or nova-az3
      ======
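      The label change amounts to overwriting the node label, e.g. (node name is illustrative):
      ======
      oc label node worker-nova-az2-xxxxx topology.cinder.csi.openstack.org/zone=nova --overwrite
      ======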
      
      6. Once this change was made, the pod transitioned to a running state, and the volume was successfully attached. However, the node-registrar container in the "openstack-cinder-csi-driver-node-" pod then entered an error state with the following message:
      ======
      plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "topology.cinder.csi.openstack.org/zone":"nova-az2" but existing label is "topology.cinder.csi.openstack.org/zone":"nova".
      ======
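      The registrar error can be retrieved from the CSI driver node pod (pod name is illustrative; --all-containers avoids assuming the registrar container name):
      ======
      oc -n openshift-cluster-csi-drivers logs openstack-cinder-csi-driver-node-xxxxx --all-containers | grep 'topology value collision'
      ======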
      
      7. To address this, we had to revert the node label to its previous state.
      
      8. The issue persists: pods remain in the Running state only until they are rescheduled onto a different worker node in the nova-az2 zone, or until the nodes are rebooted during a cluster upgrade or crash recovery, at which point the volume node affinity conflict reappears.

      Actual results:

          

      Expected results:

          

      Additional info:

          
