Bug
Resolution: Unresolved
Blocker
Description of problem:
On c3-baremetal instances, volume attachments are limited to 15 per node. This limit is hard-coded in the gcp-pd driver code itself: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go#L128
We need to explore a way to override it by adding a label to the node: node-restriction.kubernetes.io/gke-volume-attach-limit-override=XXX (see https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/pkg/gce-pd-csi-driver/node.go#L926-L937)
Version-Release number of selected component (if applicable):
4.21.1
How reproducible:
Always
Steps to Reproduce:
1. Create a pod with 75 PVCs, or 75 pods with 1 PVC each.
Actual results:
The pod is stuck in Pending (or only 15 pods are created and pod 16 is stuck in Pending).
Expected results:
The pod is Running (or all 75 pods are Running).
Additional info:
It is possible to bypass the 15-volume-per-node limit by adding the label
node-restriction.kubernetes.io/gke-volume-attach-limit-override=XXX
to the node, which overrides the limit (up to 127).
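The override behavior described above can be sketched as a small function. This is a minimal illustration, not the gcp-pd driver's code: the name `effective_attach_limit` is hypothetical, the default of 15 and the ceiling of 127 come from this report, and the handling of missing, invalid, non-positive, or oversized label values is an assumption.

```python
from typing import Mapping

# Hypothetical sketch; not the gcp-pd CSI driver's implementation.
OVERRIDE_LABEL = "node-restriction.kubernetes.io/gke-volume-attach-limit-override"
DEFAULT_LIMIT = 15  # per-node limit reported for c3-baremetal in this bug
MAX_LIMIT = 127     # override ceiling reported in this bug


def effective_attach_limit(labels: Mapping[str, str]) -> int:
    """Return the per-node attach limit, honoring a valid override label."""
    raw = labels.get(OVERRIDE_LABEL)
    if raw is None:
        # No label: keep the hard-coded default.
        return DEFAULT_LIMIT
    try:
        value = int(raw)
    except ValueError:
        # Assumption: a non-numeric label falls back to the default.
        return DEFAULT_LIMIT
    if value <= 0:
        # Assumption: non-positive values are ignored.
        return DEFAULT_LIMIT
    # Assumption: values above the ceiling are capped rather than rejected.
    return min(value, MAX_LIMIT)


if __name__ == "__main__":
    print(effective_attach_limit({}))                       # 15
    print(effective_attach_limit({OVERRIDE_LABEL: "75"}))   # 75
    print(effective_attach_limit({OVERRIDE_LABEL: "500"}))  # 127
```

With the default of 15, the 75 single-PVC pods from the reproduction steps need at least ceil(75 / 15) = 5 nodes; with the label set to 75 or more, a single node could in principle host them all. The label only helps if the driver can actually read its own Node object.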
This revealed two issues:
1. Missing RBAC (solved, see OCPBUGS-77183).
2. The gcp-pd CSI driver reads the node name as:
test-gcp10-wf7zf-worker-c-6kgqn
instead of:
test-gcp10-wf7zf-worker-c-6kgqn.c.ocpstrat-1278.internal
and logs:
I0225 21:46:52.147188 1 utils.go:82] /csi.v1.Node/NodeGetInfo called with request:
W0225 21:46:52.174054 1 node.go:37] Error getting node test-gcp10-wf7zf-worker-c-6kgqn: nodes "test-gcp10-wf7zf-worker-c-6kgqn" not found, retrying...
W0225 21:46:53.185924 1 node.go:37] Error getting node test-gcp10-wf7zf-worker-c-6kgqn: nodes "test-gcp10-wf7zf-worker-c-6kgqn" not found, retrying...
W0225 21:46:55.194148 1 node.go:37] Error getting node test-gcp10-wf7zf-worker-c-6kgqn: nodes "test-gcp10-wf7zf-worker-c-6kgqn" not found, retrying...
W0225 21:46:59.211100 1 node.go:37] Error getting node test-gcp10-wf7zf-worker-c-6kgqn: nodes "test-gcp10-wf7zf-worker-c-6kgqn" not found, retrying...
W0225 21:47:07.220454 1 node.go:37] Error getting node test-gcp10-wf7zf-worker-c-6kgqn: nodes "test-gcp10-wf7zf-worker-c-6kgqn" not found, retrying...
E0225 21:47:07.220471 1 node.go:46] Failed to get node test-gcp10-wf7zf-worker-c-6kgqn after retries: timed out waiting for the condition
W0225 21:47:07.220479 1 node.go:871] using default value due to err getting node-restriction.kubernetes.io/gke-volume-attach-limit-override: timed out waiting for the condition
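The name mismatch can be illustrated with a small lookup sketch. It assumes, as the log suggests, that the driver looks up the Node by the short hostname while the Node object is registered under the FQDN. The helper `find_node_name` and its fallback rule (accept a name that is the short hostname followed by a domain suffix) are hypothetical and are not the fix delivered in OCPBUGS-77602 or OCPBUGS-77716.

```python
from typing import Iterable, Optional


def find_node_name(short_name: str, node_names: Iterable[str]) -> Optional[str]:
    """Return the registered Node name for a short hostname, or None."""
    names = list(node_names)
    # An exact match is all a plain get-by-name lookup can find.
    if short_name in names:
        return short_name
    # Hypothetical fallback: accept an FQDN whose first DNS label is short_name.
    for name in names:
        if name.startswith(short_name + "."):
            return name
    return None


if __name__ == "__main__":
    registered = ["test-gcp10-wf7zf-worker-c-6kgqn.c.ocpstrat-1278.internal"]
    short = "test-gcp10-wf7zf-worker-c-6kgqn"
    print(short in registered)                # False: the "not found" case in the log
    print(find_node_name(short, registered))  # the FQDN Node name
```

Prefix matching is fragile: it returns the first match if several domains share a short name. Passing the exact Node name to the driver, for example through the Kubernetes downward API field `spec.nodeName`, avoids guessing altogether.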
This raises the following question: does this FQDN naming mismatch prevent the attachment limit override from taking effect?
- is blocked by
  - OCPBUGS-77716 [release-4.21] [GCP-PD] csi driver is using the wrong node name when configuring attachment limit (Verified)
- is related to
  - OCPBUGS-77602 [GCP-PD] csi driver is using the wrong node name when configuring attachment limit (Verified)