Uploaded image for project: 'FlightPath'
  1. FlightPath
  2. FLPATH-2819

Kafka deployment fails on OpenShift due to incorrect storage class auto-detection (OCS vs ODF)

XMLWordPrintable

    • False
    • Hide

      None

      Show
      None
    • False

      Description

      The deploy-strimzi.sh script fails to deploy Kafka cluster on OpenShift, causing all Zookeeper and Kafka broker pods to remain in Pending state indefinitely. The root cause is incorrect storage class auto-detection that assumes ODF storage class naming (odf-storagecluster-ceph-rbd) when the cluster actually uses OCS naming (ocs-storagecluster-ceph-rbd).

      Root Cause

      At scripts/deploy-strimzi.sh:70, the script hardcodes the storage class assumption for OpenShift:
      STORAGE_CLASS="odf-storagecluster-ceph-rbd"

      However, many OpenShift clusters still use the older OCS (OpenShift Container Storage) naming convention instead of the newer ODF (OpenShift Data Foundation) naming.

      Impact

      • Severity: Major
      • All 3 Zookeeper pods stuck in Pending state
      • All 3 Kafka broker pods cannot start
      • All 6 PVCs remain in Pending state with error: storageclass.storage.k8s.io "odf-storagecluster-ceph-rbd" not found
      • Complete deployment failure requiring manual intervention
      • Affects initial deployment on OpenShift clusters using OCS storage

      Workaround

      Set the correct storage class explicitly when running the script:
      STORAGE_CLASS=ocs-storagecluster-ceph-rbd ./scripts/deploy-strimzi.sh

      Permanent Fix

      Update scripts/deploy-strimzi.sh:70 to include proper storage class auto-detection that checks for both OCS and ODF naming conventions before falling back to generic detection.

      Steps to Reproduce

        1. Deploy on OpenShift cluster with OCS storage (using ocs-storagecluster-ceph-rbd storage class)
          2. Run ./scripts/deploy-strimzi.sh
          3. Wait for timeout (deployment hangs for 9+ minutes)
          4. Observe error: "Timeout waiting for Kafka cluster to be ready"

      Error Logs

      PVC Status shows Pending with storageclass not found error
      PVC Events: Warning ProvisioningFailed storageclass.storage.k8s.io "odf-storagecluster-ceph-rbd" not found
      Pod Status: ros-ocp-kafka-zookeeper-0 in Pending state indefinitely

      Version Information

      • Repository: ros-helm-chart
      • Git Commit: f5806af
      • OpenShift Version: 4.18.24
      • Script: scripts/deploy-strimzi.sh
      • Storage Class Found: ocs-storagecluster-ceph-rbd
      • Storage Class Expected by Script: odf-storagecluster-ceph-rbd

              gciavarrini@redhat.com Gloria Ciavarrini
              chadcrum Chad Crum
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated:
                Resolved: