OCPBUGS-60597

[4.17] AzureDiskCSIDriverOperator degraded with "Observed a panic: "assignment to entry in nil map""

      Before this update, the `AzureDiskCSIDriverOperator` entered a degraded state after its pod experienced a panic, specifically an "assignment to entry in nil map" error and a Remote Procedure Call (RPC) keepalive ping timeout. This failure prevented the Operator from reconciling its static resources, creating a significant risk of failures during future cluster upgrades. With this release, you can resolve the issue by deleting the `clustercsidriver` custom resource, which forces the Operator to recreate and reconcile the object, resolving the panics and ensuring the cluster's stability. (link:https://issues.redhat.com/browse/OCPBUGS-60597[OCPBUGS-60597])

      This is a clone of issue OCPBUGS-60464. The following is the description of the original issue:

      Description of problem:

      The AzureDiskCSIDriverOperator is in a degraded state. The logs for the operator pod show a panic with the message "assignment to entry in nil map" and an error message of "AzureDiskDriverStaticResourcesControllerDegraded: "csidriver.yaml" (string): rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout". This degraded state indicates the operator is failing to reconcile its static resources, specifically the `CSIDriver` for `disk.csi.azure.com`, which could lead to issues during future cluster upgrades.
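
      For context on the panic text itself: "assignment to entry in nil map" is the Go runtime error raised when code writes a key into a map that was declared but never initialized. The sketch below is a minimal, self-contained illustration of that failure mode and the usual guard against it; the function and annotation key are hypothetical, and it is not taken from the operator's source.

      ```go
      package main

      import "fmt"

      // setAnnotation illustrates the guard that avoids the panic: a nil map can be
      // read from safely, but assigning to it panics with
      // "assignment to entry in nil map", so the map is initialized before the write.
      // The function name and annotation key are hypothetical, for illustration only.
      func setAnnotation(annotations map[string]string, key, value string) map[string]string {
          if annotations == nil {
              annotations = map[string]string{}
          }
          annotations[key] = value
          return annotations
      }

      func main() {
          var m map[string]string // declared but never initialized: m is nil

          // m["owner"] = "csi-operator" // would panic: assignment to entry in nil map

          m = setAnnotation(m, "owner", "csi-operator")
          fmt.Println(m) // map[owner:csi-operator]
      }
      ```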

      Version-Release number of selected component (if applicable):

      ARO v4.15.49

      How reproducible:

      Not sure, because the root cause of the issue is not clear.

      Steps to Reproduce:

          1.
          2.
          3.

      Actual results:

      The operator entered a degraded state on June 1, 2025, and restarting the CSI operator pods did not resolve the panics or the degraded status. Deleting the `clustercsidriver` object did resolve the issue.

      Expected results:

      The `AzureDiskCSIDriverOperator` should be in a healthy, non-degraded state, and there should be no panics in the operator pod logs. It should be able to successfully reconcile all its static resources, including the `csidriver.yaml` manifest.

      Additional info:

      - The degradation has been observed since June 1, 2025.
      - The panics were observed around 09:49 UTC on the same day.
      - There are no relevant entries in the audit logs immediately preceding the degradation.
      - No apparent performance issues were noted.
      - The issue persists regardless of the master node the operator pod is running on.
      - A workaround attempt of restarting the CSI operator pods was unsuccessful as panics were still observed afterwards.
      - The two symptoms (panic and degradation) may or may not be directly linked.
      - This issue is similar to OCPBUGS-57395, but we are not sure the root cause is the same. The workaround of deleting the `clustercsidriver` object so that the operator reconciles it does work: `$ oc delete clustercsidriver disk.csi.azure.com`.
