OpenShift Bugs / OCPBUGS-44740

Node topology labels are automatically injected by the kubelet


      Description of problem:

      The topology.cinder.csi.openstack.org/zone node labels are automatically injected by the kubelet during CSI driver node registration, so the zone label on a node cannot be changed manually to work around a volume node affinity conflict: the CSI node registrar detects a topology value collision and fails.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

      1. Created three separate machine sets, each corresponding to a specific zone: nova, nova-az2, and nova-az3.
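
      For reference, a minimal, incomplete sketch of one of these MachineSets, assuming the standard OpenStack Machine API provider spec (selectors and most provider details omitted; the name and replica count are illustrative, and the zone is selected via availabilityZone):
      =====
      apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      metadata:
        name: <infra-id>-worker-nova-az2   # illustrative name; one MachineSet per zone
        namespace: openshift-machine-api
      spec:
        replicas: 1
        template:
          spec:
            providerSpec:
              value:
                # Nova availability zone for machines created by this MachineSet
                availabilityZone: nova-az2
      =====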
      
      2. Recreated the worker nodes, ensuring an even distribution across the machine sets.
      
      3. We confirmed that node affinity was applied to the existing volumes, as shown below:
      =====
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.cinder.csi.openstack.org/zone
                  operator: In
                  values:
                    - nova
            - matchExpressions:
                - key: topology.cinder.csi.openstack.org/zone
                  operator: In
                  values:
                    - nova-az3
      =====
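
      The affinity above can be read directly from the PersistentVolume object; the PV name below is illustrative:
      =====
      # Inspect spec.nodeAffinity of the volume bound to the affected PVC
      $ oc get pv pvc-0123abcd-4567-89ab-cdef-0123456789ab -o yaml
      =====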
      
      4. When the scheduler attempts to place a pod that uses one of these volumes on a worker node in the nova-az2 availability zone, it fails with the following error:
      ====
      0/11 nodes are available: 1 Insufficient cpu, 1 node(s) had volume node affinity conflict, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 7 Insufficient memory. preemption: 0/11 nodes are available: 4 Preemption is not helpful for scheduling, 7 No preemption victims found for incoming pod..
      ====
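
      The same message appears as a FailedScheduling event on the pending pod; the pod name and namespace below are illustrative:
      =====
      # The Events section repeats the scheduler error shown above
      $ oc describe pod my-app-0 -n my-namespace
      =====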
      
      5. To resolve this issue, we needed to update the node label to match the node affinity of the volume, changing:
      ======
      labels:
        topology.cinder.csi.openstack.org/zone: nova-az2   <-- Change to nova or nova-az3
      ======
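
      One way to apply that change (the node name is illustrative; --overwrite is required because the kubelet has already set the label):
      =====
      $ oc label node worker-az2-0 topology.cinder.csi.openstack.org/zone=nova --overwrite
      =====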
      
      6. Once this change was made, the pod transitioned to a running state, and the volume was successfully attached. However, the node-registrar container in the "openstack-cinder-csi-driver-node-" pod then entered an error state with the following message:
      ======
      plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "topology.cinder.csi.openstack.org/zone":"nova-az2" but existing label is "topology.cinder.csi.openstack.org/zone":"nova".
      ======
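
      The registrar error can be read from the CSI driver node pod running on the affected node. The namespace and container name below are assumptions based on the usual layout of the driver DaemonSet and should be verified; the node and pod names are illustrative:
      =====
      $ oc get pods -n openshift-cluster-csi-drivers -o wide | grep worker-az2-0
      $ oc logs -n openshift-cluster-csi-drivers openstack-cinder-csi-driver-node-xxxxx -c csi-node-driver-registrar
      =====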
      
      7. To address this, we had to revert the node label to its previous state.
      
      8. The issue persists: pods stay in the Running state only until they are rescheduled onto another worker node in the nova-az2 zone, or until the nodes are rebooted during a cluster upgrade or crash recovery.
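
      To see which nodes will hit the conflict again, the kubelet-injected zone label can be listed for every node and compared with the zones pinned in the volumes' node affinity (step 3):
      =====
      # Nodes reporting nova-az2 will conflict with volumes pinned to nova or nova-az3
      $ oc get nodes -L topology.cinder.csi.openstack.org/zone
      =====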

      Actual results:

          

      Expected results:

          

      Additional info:

          
