Uploaded image for project: 'Data Foundation Bugs'
  1. Data Foundation Bugs
  2. DFBUGS-3637

Upgrade from FDF 4.16.11 - FDF 4.17.9 has CephCluster in error state "failed to start ceph monitors"

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Critical Critical
    • odf-4.17.11
    • odf-4.17.9
    • ocs-client-operator
    • None
    • False
    • Hide

      None

      Show
      None
    • False
    • Committed
    • ?
    • ?
    • 4.17.11-1.konflux
    • Committed
    • Important
    • None

       

      Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:

       
      While trying an upgrade from FDF 4.16 - FDF 4.17.9, ran into a problem where CephCluster has this error:

      failed to create cluster: failed to start ceph monitors: failed to set Rook and/or user-defined Ceph config options after forcefully updating the existing mons: failed to apply default Ceph configurations: failed to set all keys: failed to set ceph config in the centralized mon configuration database; output: Error EACCES: access denied: exit status 13

      CSV and Subs are upgraded. Only storage health is degraded.

      [root@f4wx-bastion F210Upgrade]# oc get cephblockpools.ceph.rook.io NAME PHASE TYPE FAILUREDOMAIN AGE builtin-mgr Failure Replicated host 130d ocs-storagecluster-cephblockpool Failure Replicated host 130d
      [root@f4wx-bastion F210Upgrade]# oc get cephclients
      NAME                PHASE   AGE
      141516262f5c7403ac78a0ab07f361dc  Failure  130d
      5af838786b947521c8a3a3f05a7e9a53  Failure  130d
      776ce9b395246754e0cb49ce14c07da8  Failure  130d
      d25839652fe464684e808bddf48b85d4  Failure  130d
      d6cc0327f645b69fae774be2546e1e68  Failure  130d
      fafd1bb5216a19c675db11660e41fb63  Failure  130d
      fc012792d38fed55e3c0d13d2071dcbb  Failure  130d

      The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):

      IBM Fusion HCI

      The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):

      Provider

       

      The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):

      OCP - 4.17.35

      FDF - 4.17.9

       

      Does this issue impact your ability to continue to work with the product?

       

       

      Is there any workaround available to the best of your knowledge?

       

       

      Can this issue be reproduced? If so, please provide the hit rate

       

       

      Can this issue be reproduced from the UI?

       

      If this is a regression, please provide more details to justify this:

       

      Steps to Reproduce:

      1. Install and Configure FDF 4.16.11. 

      2. Upgrade OCP to 4.17.

      3. Upgrade DF to 4.17.9.

       

      The exact date and time when the issue was observed, including timezone details:

      17th July, 2025

      Actual results:

      configuration post upgrade is in failed state.

       

      Expected results:

      upgrade to 4.17.9 should complete successfully.

      Logs collected and log location:

       

      Additional info:

       
       

        1. must-gather-part-aa
          190.00 MB
          Namita Singroha

              rhn-support-lgangava Leela Gangavarapu
              namita_si Namita Singroha (Inactive)
              Coady LaCroix Coady LaCroix
              Votes:
              0 Vote for this issue
              Watchers:
              27 Start watching this issue

                Created:
                Updated:
                Resolved: