OpenShift Virtualization · CNV-80511

[GCP] c3-baremetal instances are limited to 15 volume attachments per node


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • Component: Storage Ecosystem
    • Work Type: Product / Portfolio Work
    • Severity: Critical

      Description of problem:

      c3-baremetal instances are limited to 15 volume attachments per node. This limitation is hard-coded in the gcp-pd CSI driver itself:
      https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go#L128

      We need to explore a way to override this limit by adding a label to the node:
      node-restriction.kubernetes.io/gke-volume-attach-limit-override=XXX
      https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go#L926-L937
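      A minimal sketch of applying the override label, assuming cluster-admin access; the node name is taken from the logs below and the value 127 is the maximum the driver honors per this report:

```shell
# Label the node so the gcp-pd CSI driver reports a higher attach limit.
# Node name is from this report's environment; adjust for your cluster.
oc label node test-gcp10-wf7zf-worker-c-6kgqn \
  node-restriction.kubernetes.io/gke-volume-attach-limit-override=127
```

      Note the label uses the node-restriction.kubernetes.io prefix, so the kubelet itself cannot set it; it must be applied by a sufficiently privileged client.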

      Version-Release number of selected component (if applicable):

      4.21.1

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a pod with 75 PVCs, or 75 pods with 1 PVC each.
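      The reproduction step can be sketched as a script that builds the manifests; names, image, and sizes are hypothetical:

```python
# Sketch: build manifests for 75 PVCs and one pod mounting all of them.
# PVC names, pod name, image, and storage size are hypothetical.

def make_pvc(i):
    """A minimal RWO PersistentVolumeClaim manifest."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"test-pvc-{i}"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": "1Gi"}},
        },
    }

def make_pod(n):
    """One pod that mounts n PVC-backed volumes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "attach-limit-test"},
        "spec": {
            "containers": [{
                "name": "sleeper",
                "image": "registry.access.redhat.com/ubi9/ubi-minimal",
                "command": ["sleep", "infinity"],
                "volumeMounts": [
                    {"name": f"vol-{i}", "mountPath": f"/mnt/vol-{i}"}
                    for i in range(n)
                ],
            }],
            "volumes": [
                {"name": f"vol-{i}",
                 "persistentVolumeClaim": {"claimName": f"test-pvc-{i}"}}
                for i in range(n)
            ],
        },
    }

pvcs = [make_pvc(i) for i in range(75)]
pod = make_pod(75)
```

      Serializing these dicts to YAML (e.g. via PyYAML) and applying them should hit the 15-attachment ceiling on a c3-baremetal node.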
      

      Actual results:

      The pod is stuck in Pending (or only 15 pods are created, and pod 16 is stuck in Pending).

      Expected results:

      The pod reaches Running state (or all 75 pods reach Running).

      Additional info:

      It is possible to bypass the 15-volumes-per-node limitation by adding the
      node-restriction.kubernetes.io/gke-volume-attach-limit-override=XXX
      label to the node (values up to 127 are honored).

      This revealed two issues:

      1. missing RBAC (solved, see OCPBUGS-77183 )

      2. The gcp-pd CSI driver looks up the node name as:
      test-gcp10-wf7zf-worker-c-6kgqn
      instead of:
      test-gcp10-wf7zf-worker-c-6kgqn.c.ocpstrat-1278.internal

      and logs:

      I0225 21:46:52.147188       1 utils.go:82] /csi.v1.Node/NodeGetInfo called with request: 
      W0225 21:46:52.174054       1 node.go:37] Error getting node test-gcp10-wf7zf-worker-c-6kgqn: nodes "test-gcp10-wf7zf-worker-c-6kgqn" not found, retrying...
      W0225 21:46:53.185924       1 node.go:37] Error getting node test-gcp10-wf7zf-worker-c-6kgqn: nodes "test-gcp10-wf7zf-worker-c-6kgqn" not found, retrying...
      W0225 21:46:55.194148       1 node.go:37] Error getting node test-gcp10-wf7zf-worker-c-6kgqn: nodes "test-gcp10-wf7zf-worker-c-6kgqn" not found, retrying...
      W0225 21:46:59.211100       1 node.go:37] Error getting node test-gcp10-wf7zf-worker-c-6kgqn: nodes "test-gcp10-wf7zf-worker-c-6kgqn" not found, retrying...
      W0225 21:47:07.220454       1 node.go:37] Error getting node test-gcp10-wf7zf-worker-c-6kgqn: nodes "test-gcp10-wf7zf-worker-c-6kgqn" not found, retrying...
      E0225 21:47:07.220471       1 node.go:46] Failed to get node test-gcp10-wf7zf-worker-c-6kgqn after retries: timed out waiting for the condition
      W0225 21:47:07.220479       1 node.go:871] using default value due to err getting node-restriction.kubernetes.io/gke-volume-attach-limit-override: timed out waiting for the condition
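      The failure above can be sketched as a name-lookup mismatch: the driver queries the API server with the short GCE instance name, while the Node object is registered under its FQDN. A hypothetical illustration (not the driver's actual Go code):

```python
# Hypothetical illustration of the lookup mismatch: the Node object is
# registered under the FQDN, but the driver queries with the short name.
registered_nodes = {"test-gcp10-wf7zf-worker-c-6kgqn.c.ocpstrat-1278.internal"}

def get_node(name, nodes):
    """Mimics an API Get(): succeeds only on an exact name match."""
    if name in nodes:
        return name
    raise KeyError(f'nodes "{name}" not found')

short = "test-gcp10-wf7zf-worker-c-6kgqn"
fqdn = short + ".c.ocpstrat-1278.internal"

lookup_failed = False
try:
    get_node(short, registered_nodes)   # what the driver does -> not found
except KeyError:
    lookup_failed = True                # matches the retry/timeout in the log

found = get_node(fqdn, registered_nodes)  # an exact-FQDN query would succeed
```

      When the lookup times out, the driver falls back to the default attach limit, which would explain why the override label is ignored.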
       

      This raises the following question:
      does this node-name mismatch (short name vs. FQDN) prevent the attachment-limit override from taking effect?

              Noam Assouline
              Ahmad Hafi