OpenShift Bugs / OCPBUGS-69655

[azure disk csi driver] static provisioning crashes due to a segmentation fault on Azure Stack Hub clusters


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Versions: 4.21.0, 4.22.0
    • Component: Storage / Operators
    • Rejected
    • Status: In Progress
    • Release Note Type: Release Note Not Required

      This is a clone of issue OCPBUGS-69390. The following is the description of the original issue:

      Description of problem:

      [azure disk csi driver] static provisioning crashes due to a segmentation fault on Azure Stack Hub clusters

      Version-Release number of selected component (if applicable):

      4.22.0-0-2025-12-16-071206-test-ci-ln-6s71ts2-latest    

      How reproducible:

      Always    

      Steps to Reproduce:

    1. Create an OpenShift cluster on Azure Stack Hub.
    2. Manually create a disk with the Azure command line (see the example command after the manifest below).
    3. Use the manually created disk to create a PV and PVC, and consume them from a pod:
      oc apply -f - <<EOF
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        annotations:
          pv.kubernetes.io/provisioned-by: disk.csi.azure.com
        name: pv-azuredisk-static
      spec:
        capacity:
          storage: 10Gi
        accessModes:
          - ReadWriteOnce
        persistentVolumeReclaimPolicy: Retain
        storageClassName: ""  # Important: empty string for static provisioning
        csi:
          driver: disk.csi.azure.com
          # Replace with your actual disk resource ID
          volumeHandle: /subscriptions/XXXXX/resourceGroups/XXXXXX/providers/Microsoft.Compute/disks/pvc-fc003bdf-cea2-49ea-8df7-c3941dfc6a6a
          volumeAttributes:
            fsType: ext4
      ---
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: pvc-azuredisk-static
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        volumeName: pv-azuredisk-static  # Binds to the specific PV
        storageClassName: "" 
      --- 
      apiVersion: v1
      kind: Pod
      metadata:
        name: my-app
      spec:
        containers:
        - name: my-app
          image: quay.io/openshifttest/hello-openshift@sha256:56c354e7885051b6bb4263f9faa58b2c292d44790599b7dde0e49e7c466cf339
          volumeMounts:
          - name: azuredisk
            mountPath: /mnt/adata
        volumes:
        - name: azuredisk
          persistentVolumeClaim:
            claimName: pvc-azuredisk-static
      EOF  
    4. Check that the statically provisioned volume can be read and written by the workload.
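      For reference, the disk in step 2 can be created with the Azure CLI; the resource group below is a placeholder and the disk name must match the volumeHandle used in the PV manifest above:
      $ az disk create --resource-group <resource-group> --name pvc-fc003bdf-cea2-49ea-8df7-c3941dfc6a6a --size-gb 10
      Once the pod is Running, a quick read/write check for step 4 (assuming the hello-openshift image ships a shell):
      $ oc exec my-app -- sh -c 'echo hello > /mnt/adata/testfile && cat /mnt/adata/testfile'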

      Actual results:

      In step 4 the pod is stuck in ContainerCreating; ControllerPublishVolume fails with "panic: runtime error: invalid memory address or nil pointer dereference"
      $ oc describe po/my-app
      ...
      Events:
        Type     Reason              Age                 From                     Message
        ----     ------              ----                ----                     -------
        Normal   Scheduled           11m                 default-scheduler        Successfully assigned default/my-app to pewang-1216ash-cb9jm-worker-mtcazs-5b5n8
        Warning  FailedAttachVolume  62s (x13 over 11m)  attachdetach-controller  AttachVolume.Attach failed for volume "pv-azuredisk-static" : rpc error: code = Unavailable desc = error reading from server: EOF    
      # The driver controller crashed
      $ oc get po
      NAME                                                READY   STATUS             RESTARTS         AGE
      azure-disk-csi-driver-controller-64df65c85f-ptkjc   9/11    CrashLoopBackOff   10 (2m8s ago)    60m
      azure-disk-csi-driver-controller-64df65c85f-w42nv   11/11   Running            14 (8m21s ago)   58m
      azure-disk-csi-driver-node-dsdn6                    4/4     Running            0                51m
      azure-disk-csi-driver-node-gk448                    4/4     Running            0                59m
      azure-disk-csi-driver-node-jt2rd                    4/4     Running            0                51m
      azure-disk-csi-driver-node-nn2s7                    4/4     Running            0                51m
      azure-disk-csi-driver-node-pvvbj                    4/4     Running            0                59m
      azure-disk-csi-driver-node-vkndd                    4/4     Running            0                59m
      azure-disk-csi-driver-operator-b84688c66-lv6q9      1/1     Running            0                60m
      
      $ oc logs azure-disk-csi-driver-controller-64df65c85f-w42nv
      Defaulted container "csi-driver" out of: csi-driver, kube-rbac-proxy-8201, csi-provisioner, provisioner-kube-rbac-proxy, csi-attacher, attacher-kube-rbac-proxy, csi-resizer, resizer-kube-rbac-proxy, csi-snapshotter, snapshotter-kube-rbac-proxy, csi-liveness-probe, azure-inject-credentials (init)
      I1216 08:30:44.550891       1 main.go:112] set up prometheus server on 127.0.0.1:8201
      I1216 08:30:44.551068       1 main.go:87] Sys info: NumCPU: 8 MAXPROC: 2
      W1216 08:30:44.551093       1 azuredisk.go:209] nodeid is empty
      I1216 08:30:44.551106       1 azuredisk.go:230] driver userAgent: disk.csi.azure.com/v1.33.5
      W1216 08:30:44.551123       1 client_config.go:667] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
      I1216 08:30:44.551854       1 azure_disk_utils.go:155] reading cloud config from secret ""/""
      I1216 08:30:44.575673       1 azure_disk_utils.go:165] InitializeCloudFromSecret: failed to get cloud config from secret ""/"": secrets "\"\"" not found
      I1216 08:30:44.575694       1 azure_disk_utils.go:174] could not read cloud config from secret ""/""
      I1216 08:30:44.575704       1 azure_disk_utils.go:177] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/cloud.conf
      I1216 08:30:44.650511       1 azure.go:613] Azure cloudprovider using try backoff: retries=6, exponent=1.500000, duration=6, jitter=1.000000
      I1216 08:30:44.701115       1 azure.go:421] "Setting up ARM client factory for network resources" subscriptionID="de7e09c3-b59a-4c7d-9c77-439c11b92879"
      I1216 08:30:44.748657       1 azure.go:429] "Setting up ARM client factory for compute resources" subscriptionID="de7e09c3-b59a-4c7d-9c77-439c11b92879"
      I1216 08:30:44.748704       1 azuredisk.go:255] disable UseInstanceMetadata for controller
      I1216 08:30:44.748714       1 azuredisk.go:267] cloud: AzureStackCloud, location: mtcazs, rg: pewang-1216ash-cb9jm-rg, VMType: standard, PrimaryScaleSetName: , PrimaryAvailabilitySetName: , DisableAvailabilitySetNodes: false
      I1216 08:30:44.748724       1 azuredisk.go:270] vmssCacheTTLInSeconds: -1, listVMSSWithInstanceView: false
      I1216 08:30:44.748753       1 azuredisk.go:286] set DetachOperationMinTimeoutInSeconds as 240
      I1216 08:30:44.749110       1 driver.go:82] Enabling controller service capability: CREATE_DELETE_VOLUME
      I1216 08:30:44.749121       1 driver.go:82] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
      I1216 08:30:44.749126       1 driver.go:82] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
      I1216 08:30:44.749131       1 driver.go:82] Enabling controller service capability: CLONE_VOLUME
      I1216 08:30:44.749136       1 driver.go:82] Enabling controller service capability: EXPAND_VOLUME
      I1216 08:30:44.749141       1 driver.go:82] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
      I1216 08:30:44.749154       1 driver.go:82] Enabling controller service capability: MODIFY_VOLUME
      I1216 08:30:44.749162       1 driver.go:101] Enabling volume access mode: SINGLE_NODE_WRITER
      I1216 08:30:44.749166       1 driver.go:101] Enabling volume access mode: SINGLE_NODE_READER_ONLY
      I1216 08:30:44.749169       1 driver.go:101] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
      I1216 08:30:44.749172       1 driver.go:101] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
      I1216 08:30:44.749178       1 driver.go:92] Enabling node service capability: STAGE_UNSTAGE_VOLUME
      I1216 08:30:44.749181       1 driver.go:92] Enabling node service capability: EXPAND_VOLUME
      I1216 08:30:44.749184       1 driver.go:92] Enabling node service capability: GET_VOLUME_STATS
      I1216 08:30:44.749205       1 driver.go:92] Enabling node service capability: SINGLE_NODE_MULTI_WRITER
      I1216 08:30:44.749366       1 azuredisk.go:375]
      DRIVER INFORMATION:
      -------------------
      Build Date: "2025-12-12T10:56:01Z"
      Compiler: gc
      Driver Name: disk.csi.azure.com
      Driver Version: v1.33.5
      Git Commit: 05b135bf6972b8ac85f9b742008148c78e3179d3
      Go Version: go1.24.6 (Red Hat 1.24.6-1.el9_6) X:strictfipsruntime
      Platform: linux/amd64
      Topology Key: topology.disk.csi.azure.com/zone

      Streaming logs below:
      I1216 08:30:45.637930       1 utils.go:105] GRPC call: /csi.v1.Identity/GetPluginInfo
      I1216 08:30:45.637947       1 utils.go:106] GRPC request: {}
      I1216 08:30:45.638864       1 utils.go:112] GRPC response: {"name":"disk.csi.azure.com","vendor_version":"v1.33.5"}
      I1216 08:30:45.641098       1 utils.go:105] GRPC call: /csi.v1.Identity/GetPluginCapabilities
      I1216 08:30:45.641115       1 utils.go:106] GRPC request: {}
      I1216 08:30:45.641147       1 utils.go:112] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}},{"Type":{"VolumeExpansion":{"type":2}}},{"Type":{"VolumeExpansion":{"type":1}}}]}
      I1216 08:30:45.642176       1 utils.go:105] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
      I1216 08:30:45.642191       1 utils.go:106] GRPC request: {}
      I1216 08:30:45.642220       1 utils.go:112] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":13}}},{"Type":{"Rpc":{"type":14}}}]}
      I1216 08:30:45.761178       1 utils.go:105] GRPC call: /csi.v1.Controller/ControllerPublishVolume
      I1216 08:30:45.761211       1 utils.go:106] GRPC request: {"node_id":"pewang-1216ash-cb9jm-worker-mtcazs-5b5n8","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":7}},"volume_context":{"fsType":"ext4"},"volume_id":"/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/pewang-1216ash-cb9jm-rg/providers/Microsoft.Compute/disks/pvc-fc003bdf-cea2-49ea-8df7-c3941dfc6a6a"}
      I1216 08:30:46.338387       1 azure_controller_common.go:592] azureDisk - found disk: lun 0 name pvc-fc003bdf-cea2-49ea-8df7-c3941dfc6a6a uri /subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/pewang-1216ash-cb9jm-rg/providers/Microsoft.Compute/disks/pvc-fc003bdf-cea2-49ea-8df7-c3941dfc6a6a
      I1216 08:30:46.338417       1 controllerserver.go:646] GetDiskLun returned: <nil>. Initiating attaching volume /subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/pewang-1216ash-cb9jm-rg/providers/Microsoft.Compute/disks/pvc-fc003bdf-cea2-49ea-8df7-c3941dfc6a6a to node pewang-1216ash-cb9jm-worker-mtcazs-5b5n8 (vmState Succeeded).
      I1216 08:30:46.338429       1 controllerserver.go:661] Attach operation is successful. volume /subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/pewang-1216ash-cb9jm-rg/providers/Microsoft.Compute/disks/pvc-fc003bdf-cea2-49ea-8df7-c3941dfc6a6a is already attached to node pewang-1216ash-cb9jm-worker-mtcazs-5b5n8 at lun 0.
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x22549a6]

      goroutine 182 [running]:
      sigs.k8s.io/azuredisk-csi-driver/pkg/azureutils.InsertDiskProperties(0xc0007b9f00?, 0xc0006456e0)
              /go/src/github.com/openshift/azure-disk-csi-driver/pkg/azureutils/azure_disk_utils.go:751 +0xc6
      sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).ControllerPublishVolume(0xc0002f6fc8, {0x30698b0, 0xc0007ebf20}, 0xc0002f0700)
              /go/src/github.com/openshift/azure-disk-csi-driver/pkg/azuredisk/controllerserver.go:712 +0x20fa
      github.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerPublishVolume_Handler.func1({0x30698b0?, 0xc0007ebf20?}, {0x2a7cb40?, 0xc0002f0700?})
              /go/src/github.com/openshift/azure-disk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi_grpc.pb.go:487 +0xcb
      sigs.k8s.io/azuredisk-csi-driver/pkg/csi-common.LogGRPC({0x30698b0, 0xc0007ebf20}, {0x2a7cb40, 0xc0002f0700}, 0xc000588700, 0xc000737b30)
              /go/src/github.com/openshift/azure-disk-csi-driver/pkg/csi-common/utils.go:108 +0x409
      google.golang.org/grpc.getChainUnaryHandler.func1({0x30698b0?, 0xc0007ebf20?}, {0x2a7cb40?, 0xc0002f0700?})
              /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:1217 +0x153
      sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).Run.(*ServerMetrics).UnaryServerInterceptor.UnaryServerInterceptor.func5({0x30698b0, 0xc0007ebf20}, {0x2a7cb40, 0xc0002f0700}, 0xc000588700?, 0xc0007ceec0)
              /go/src/github.com/openshift/azure-disk-csi-driver/vendor/github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/server.go:22 +0x275
      google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1({0x30698b0, 0xc0007ebf20}, {0x2a7cb40, 0xc0002f0700}, 0xc000588700, 0x80?)
              /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:1208 +0x7c
      github.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerPublishVolume_Handler({0x2c2a980, 0xc0002f6fc8}, {0x30698b0, 0xc0007ebf20}, 0xc0007b9600, 0xc00031ef60)
              /go/src/github.com/openshift/azure-disk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi_grpc.pb.go:489 +0x143
      google.golang.org/grpc.(*Server).processUnaryRPC(0xc0000e6400, {0x30698b0, 0xc0007ebe90}, 0xc0007bac00, 0xc0006ee4b0, 0x489a850, 0x0)
              /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:1405 +0x1036
      google.golang.org/grpc.(*Server).handleStream(0xc0000e6400, {0x306ab98, 0xc0001d5380}, 0xc0007bac00)
              /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:1815 +0xb88
      google.golang.org/grpc.(*Server).serveStreams.func2.1()
              /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:1035 +0x7f
      created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 179
              /go/src/github.com/openshift/azure-disk-csi-driver/vendor/google.golang.org/grpc/server.go:1046 +0x11d
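      If needed, the panic above can also be collected from the crash-looping controller pod directly, e.g. (assuming the driver pods run in the openshift-cluster-csi-drivers namespace):
      $ oc logs -n openshift-cluster-csi-drivers azure-disk-csi-driver-controller-64df65c85f-ptkjc -c csi-driver --previous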

      Expected results:

      In step 4 the pod should be Running and able to read and write data to the statically provisioned volume.

      Additional info:

          

              Assignee: Penghao Wang (rhn-support-pewang)
              Reporter: Penghao Wang (rhn-support-pewang)
              QA Contact: Wei Duan