-
Bug
-
Resolution: Done-Errata
-
Critical
-
4.19
-
Quality / Stability / Reliability
-
False
-
-
None
-
Critical
-
Yes
-
None
-
Approved
-
None
-
In Progress
-
Release Note Not Required
-
None
-
None
-
None
-
None
-
None
Description of problem:
Storage operator is degraded on 4.19 installs to azure stack:
storage 4.19.0-0.ci.test-2025-03-26-183004-ci-ln-q7lhwk2-latest False True False 76s AzureDiskCSIDriverOperatorCRAvailable: AzureDiskDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service
Pod logs show a permission error
[root@fedora auth]# oc logs azure-disk-csi-driver-node-6c44h -n openshift-cluster-csi-drivers csi-driver | tail -n 1 E0331 00:57:40.711688 1 utils.go:110] GRPC error: rpc error: code = Internal desc = getNodeInfoFromLabels on node(padillon03271650-4x84l-worker-mtcazs-r6584) failed with get node(padillon03271650-4x84l-worker-mtcazs-r6584) failed with nodes "padillon03271650-4x84l-worker-mtcazs-r6584" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "nodes" in API group "" at the cluster scope
I did not look into the source of this call to get nodes. Checking the rbac:
# oc describe clusterrole azure-disk-privileged-role Name: azure-disk-privileged-role Labels: <none> Annotations: <none> PolicyRule: Resources Non-Resource URLs Resource Names Verbs --------- ----------------- -------------- ----- securitycontextconstraints.security.openshift.io [] [privileged] [use]
Not sure what to make of this. Perhaps an upstream change? It could always be azure stack weirdness, more context below.
Version-Release number of selected component (if applicable):
4.19ec3
How reproducible:
Always
Steps to Reproduce:
All Azure Stack installs.
Actual results:
Degraded operator
Expected results:
Available
Additional info:
1. https://issues.redhat.com/browse/OCPBUGS-51090 tracks an upstream bug in cloud-provider-azure which I have an upstream wip fix for here: https://github.com/kubernetes-sigs/cloud-provider-azure/pull/8755. Upstream changed the sdk in the cloud provider to the v2 implementation which still has spotty at best support for azure stack. If the storage operator depends on node labels, this cloud provider bug could be the cause.
2.CI is down because new security measures were put in place for our environment. Manual token validation is now required. They are meeting on monday about enabling accessed from the fixed ip address we have given them.
Must gather attached
- depends on
-
OCPBUGS-55052 [CSI Operator] azure stack hub inject env config incorrect
-
- Closed
-
- is depended on by
-
OCPBUGS-55162 [azure disk csi driver] ControllerUnpublishVolume failed of "Could not find member 'toBeDetached' on object of type 'DataDisk'" on ASH
-
- Verified
-
- is duplicated by
-
OCPBUGS-55044 azure-disk-csi-driver-node-sa missing node read rbac causing crashloop on AzureStackHub
-
- Closed
-
- links to
-
RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update