-
Bug
-
Resolution: Done
-
Undefined
-
4.15
-
Quality / Stability / Reliability
-
False
In Progress
-
Bug Fix
This is a clone of issue OCPBUGS-60464. The following is the description of the original issue:
—
Description of problem:
The AzureDiskCSIDriverOperator is in a degraded state. The operator pod logs show a panic with the message `assignment to entry in nil map` and the error message `AzureDiskDriverStaticResourcesControllerDegraded: "csidriver.yaml" (string): rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout`. The degraded state indicates that the operator is failing to reconcile its static resources, specifically the `CSIDriver` object for `disk.csi.azure.com`, which could cause problems during future cluster upgrades.
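For readers unfamiliar with the panic text: `assignment to entry in nil map` is the Go runtime error raised when code writes a key into a map that was never allocated. The sketch below only illustrates that general failure mode and the usual guard. It is not taken from the operator's source, and the helper names `setAnnotation` and `unsafeSet` and the key `example.com/key` are hypothetical. Which map is nil in the operator is not known from this report.

```go
package main

import "fmt"

// setAnnotation is a hypothetical helper (not from the operator's code).
// It writes key=value into the map, allocating the map first if it is nil.
func setAnnotation(annotations map[string]string, key, value string) map[string]string {
	if annotations == nil {
		annotations = make(map[string]string)
	}
	annotations[key] = value
	return annotations
}

// unsafeSet writes to the map without a nil check and reports whether
// the write panicked. Writing to a nil map panics with
// "assignment to entry in nil map"; reading from one does not.
func unsafeSet(annotations map[string]string, key, value string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	annotations[key] = value
	return false
}

func main() {
	// A nil map, such as a map field left unset on a decoded object.
	var annotations map[string]string

	fmt.Println("unguarded write panicked:", unsafeSet(annotations, "example.com/key", "value"))

	annotations = setAnnotation(annotations, "example.com/key", "value")
	fmt.Println("guarded write stored:", annotations["example.com/key"])
}
```

If the operator's panic follows this pattern, it would be consistent with the observation that restarting the pods does not help while deleting and recreating the object does, since the same stored object would hit the same code path on every reconcile. This is an assumption, not a confirmed root cause.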
Version-Release number of selected component (if applicable):
ARO v4.15.49
How reproducible:
Not sure, because the root cause of the issue is not clear.
Steps to Reproduce:
1. 2. 3.
Actual results:
The operator entered a degraded state on June 1, 2025. Restarting the CSI operator pods did not resolve the panics or the degraded status. Deleting the `clustercsidriver` object did resolve it.
Expected results:
The `AzureDiskCSIDriverOperator` should be in a healthy, non-degraded state, and there should be no panics in the operator pod logs. It should be able to successfully reconcile all its static resources, including the `csidriver.yaml` manifest.
Additional info:
- The degradation has been observed since June 1, 2025.
- The panics were observed around 09:49 UTC on the same day.
- There are no relevant entries in the audit logs immediately preceding the degradation.
- No apparent performance issues were noted.
- The issue persists regardless of which master node the operator pod is running on.
- Restarting the CSI operator pods as a workaround was unsuccessful; panics were still observed afterwards.
- The two symptoms (panic and degradation) may or may not be directly linked.
- This issue is similar to OCPBUGS-57395, but we are not sure the root cause is the same.
The workaround that does work is to delete the `clustercsidriver` object so that the operator recreates and reconciles it: `$ oc delete clustercsidriver disk.csi.azure.com`.