- Bug
- Resolution: Unresolved
- Normal
- None
- 4.14.z
- None
- Low
- None
- False
Description of problem:
The topology.cinder.csi.openstack.org/zone node labels are injected automatically by the kubelet from the zone reported by the Cinder CSI driver, so they cannot be edited manually to work around volume node affinity conflicts (see the steps below).
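For reference, the auto-injected label lives on the Node object. A minimal sketch follows; the node name is hypothetical and the zone value is one of those used in the steps below:
=====
apiVersion: v1
kind: Node
metadata:
  name: worker-az2-0            # hypothetical node name
  labels:
    # set automatically from the zone the Cinder CSI driver reports;
    # editing it by hand triggers the collision shown in step 6 below
    topology.cinder.csi.openstack.org/zone: nova-az2
=====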
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Created three separate machine sets, each corresponding to a specific availability zone: nova, nova-az2, and nova-az3 (see the MachineSet sketch after these steps).
2. Recreated the worker nodes, ensuring an even distribution across the machine sets.
3. We confirmed that node affinity was applied to the existing volumes, as shown below (a trimmed PersistentVolume sketch also follows these steps):
=====
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: topology.cinder.csi.openstack.org/zone
        operator: In
        values:
        - nova
    - matchExpressions:
      - key: topology.cinder.csi.openstack.org/zone
        operator: In
        values:
        - nova-az3
=====
4. If a pod using such a volume is scheduled on a worker node within the nova-az2 availability zone, the following error occurs:
=====
0/11 nodes are available: 1 Insufficient cpu, 1 node(s) had volume node affinity conflict, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 7 Insufficient memory. preemption: 0/11 nodes are available: 4 Preemption is not helpful for scheduling, 7 No preemption victims found for incoming pod..
=====
5. To resolve this issue, we needed to update the node label to match the node affinity of the volume, changing:
=====
labels:
  topology.cinder.csi.openstack.org/zone: nova-az2   <-- Change to nova or nova-az3
=====
6. Once this change was made, the pod transitioned to a running state and the volume was attached successfully. However, the node-registrar container in the "openstack-cinder-csi-driver-node-" pod then entered an error state with the following message:
=====
plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "topology.cinder.csi.openstack.org/zone":"nova-az2" but existing label is "topology.cinder.csi.openstack.org/zone":"nova".
=====
7. To address this, we had to revert the node label to its previous state.
8. The issue persists, as pods will remain in the running state only until they are rescheduled onto a different worker node in the nova-az2 zone, or until the nodes are rebooted during a cluster upgrade or crash recovery.
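For step 1, the zone of each machine set is normally selected through the availabilityZone field of the OpenStack provider spec. This is only a minimal sketch; the machine set name is hypothetical, most provider spec fields are omitted, and the exact layout should be checked against the cluster's existing machine sets:
=====
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: worker-nova-az2            # hypothetical name
  namespace: openshift-machine-api
spec:
  template:
    spec:
      providerSpec:
        value:
          # compute availability zone for machines in this set; assumed here
          # to be the zone the Cinder CSI driver later reports for the node
          availabilityZone: nova-az2
=====
For step 3, the affinity block shown above is part of the PersistentVolume spec. A trimmed sketch of where it sits; the volume name and volume handle are placeholders:
=====
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-example                    # placeholder PV name
spec:
  csi:
    driver: cinder.csi.openstack.org
    volumeHandle: <cinder-volume-id>   # placeholder
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.cinder.csi.openstack.org/zone
          operator: In
          values:
          - nova
=====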
Actual results:
Expected results:
Additional info: