Data Foundation Bugs / DFBUGS-946

[ROSA HCP] Fail to start ceph monitors. Storage deployment failed

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Affects Version: odf-4.16.3
    • Component: rook
    • Severity: Important

      Description of problem:

       

      Deployment was performed via the CLI, using a Subscription to the latest stable ODF 4.16 channel.

       

      The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):

      ROSA HCP 

      The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):

      Internal (Hosted Control Planes)

       

      The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):

      ODF: full_version: 4.16.3-2

      OCP: 4.17.4

      Does this issue impact your ability to continue to work with the product?

      Yes, it blocks use of the product.

       

      Is there any workaround available to the best of your knowledge?

      No workaround known.

       

      Can this issue be reproduced? If so, please provide the hit rate

      The reproduction rate will be updated; currently this is a one-time occurrence.

       

      Can this issue be reproduced from the UI?

      Most likely not; the issue does not appear related to whether the deployment is driven from the UI or the CLI.

      If this is a regression, please provide more details to justify this:

      ODF 4.16 Tech Preview was tested without hitting this issue.

      Steps to Reproduce:

      1. Install ROSA HCP cluster with OCP version 4.17.4 

      2. Install ODF 4.16.3-2 

      3. Create Storage System 
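      For reference, the Subscription-based CLI install in step 2 typically looks like the sketch below. The channel and catalog-source names are assumptions (only "latest stable ODF 4.16" is stated in this report); the namespace is the one observed in the logs:

```yaml
# Hedged sketch of an OLM Subscription for ODF; channel and source are assumptions.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-operator
  namespace: odf-storage          # namespace observed in this report
spec:
  channel: stable-4.16            # "latest stable ODF 4.16" per the description
  name: odf-operator
  source: redhat-operators        # assumed catalog source
  sourceNamespace: openshift-marketplace
```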

      The exact date and time when the issue was observed, including timezone details:

      2024-11-27 12:41:59 Israel time (UTC+2), when rook-ceph-mon-c-canary-658887b5-zlshb started to restart.

      Actual results:

      Deployment failed; the CephCluster remains in the Progressing phase with State Error: "failed to create cluster: failed to start ceph monitors: failed to assign pods to mons: failed to schedule mons".

       

      Expected results:

      Deployment completes and the storage cluster reaches the Ready phase.

       

      Logs collected and log location:

      Ceph cluster: 

       oc describe cephcluster ocs-storagecluster-cephcluster 
      Name:         ocs-storagecluster-cephcluster
      Namespace:    odf-storage
      Labels:       app=ocs-storagecluster
      Annotations:  <none>
      API Version:  ceph.rook.io/v1
      Kind:         CephCluster
      Metadata:
        Creation Timestamp:  2024-11-27T10:36:54Z
        Finalizers:
          cephcluster.ceph.rook.io
        Generation:  1
        Owner References:
          API Version:           ocs.openshift.io/v1
          Block Owner Deletion:  true
          Controller:            true
          Kind:                  StorageCluster
          Name:                  ocs-storagecluster
          UID:                   bbd30593-f606-475c-a506-666bf28c1959
        Resource Version:        150448
        UID:                     fc00a627-5ba8-4013-a1e1-a8b35730e613
      Spec:
        Ceph Version:
          Image:  registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:75bd8969ab3f86f2203a1ceb187876f44e54c9ee3b917518c4d696cf6cd88ce3
        Cleanup Policy:
          Sanitize Disks:
        Continue Upgrade After Checks Even If Not Healthy:  true
        Crash Collector:
        Csi:
          Cephfs:
            Kernel Mount Options:  ms_mode=prefer-crc
          Read Affinity:
            Enabled:  true
        Dashboard:
        Data Dir Host Path:  /var/lib/rook
        Disruption Management:
          Machine Disruption Budget Namespace:  openshift-machine-api
          Manage Pod Budgets:                   true
        External:
        Health Check:
          Daemon Health:
            Mon:
            Osd:
            Status:
        Labels:
          Exporter:
            rook.io/managedBy:  ocs-storagecluster
          Mgr:
            Odf - Resource - Profile:  
          Mon:
            Odf - Resource - Profile:  
          Monitoring:
            rook.io/managedBy:  ocs-storagecluster
          Osd:
            Odf - Resource - Profile:  
        Log Collector:
          Enabled:       true
          Max Log Size:  500Mi
          Periodicity:   daily
        Mgr:
          Count:  2
          Modules:
            Enabled:  true
            Name:     pg_autoscaler
            Enabled:  true
            Name:     balancer
        Mon:
          Count:  3
          Volume Claim Template:
            Metadata:
            Spec:
              Resources:
                Requests:
                  Storage:         50Gi
              Storage Class Name:  gp3-csi
        Monitoring:
          Enabled:   true
          Interval:  30s
        Network:
          Connections:
            requireMsgr2:  true
          Multi Cluster Service:
        Placement:
          All:
            Node Affinity:
              Required During Scheduling Ignored During Execution:
                Node Selector Terms:
                  Match Expressions:
                    Key:       cluster.ocs.openshift.io/openshift-storage
                    Operator:  Exists
            Tolerations:
              Effect:    NoSchedule
              Key:       node.ocs.openshift.io/storage
              Operator:  Equal
              Value:     true
          Arbiter:
            Tolerations:
              Effect:    NoSchedule
              Key:       node-role.kubernetes.io/master
              Operator:  Exists
          Mon:
            Node Affinity:
              Required During Scheduling Ignored During Execution:
                Node Selector Terms:
                  Match Expressions:
                    Key:       cluster.ocs.openshift.io/openshift-storage
                    Operator:  Exists
            Pod Anti Affinity:
              Required During Scheduling Ignored During Execution:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-mon
                Topology Key:  topology.rook.io/rack
        Priority Class Names:
          Mgr:  system-node-critical
          Mon:  system-node-critical
          Osd:  system-node-critical
        Resources:
          Mgr:
            Limits:
              Cpu:     2
              Memory:  3Gi
            Requests:
              Cpu:     1
              Memory:  1536Mi
          Mon:
            Limits:
              Cpu:     1
              Memory:  2Gi
            Requests:
              Cpu:     1
              Memory:  2Gi
        Security:
          Key Rotation:
            Enabled:   false
            Schedule:  @weekly
          Kms:
        Storage:
          Flapping Restart Interval Hours:  24
          Storage Class Device Sets:
            Count:  1
            Name:   ocs-deviceset-0
            Placement:
              Node Affinity:
                Required During Scheduling Ignored During Execution:
                  Node Selector Terms:
                    Match Expressions:
                      Key:       cluster.ocs.openshift.io/openshift-storage
                      Operator:  Exists
              Tolerations:
                Effect:    NoSchedule
                Key:       node.ocs.openshift.io/storage
                Operator:  Equal
                Value:     true
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Portable:                true
            Prepare Placement:
              Node Affinity:
                Required During Scheduling Ignored During Execution:
                  Node Selector Terms:
                    Match Expressions:
                      Key:       cluster.ocs.openshift.io/openshift-storage
                      Operator:  Exists
              Tolerations:
                Effect:    NoSchedule
                Key:       node.ocs.openshift.io/storage
                Operator:  Equal
                Value:     true
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Resources:
              Limits:
                Cpu:     2
                Memory:  5Gi
              Requests:
                Cpu:                 2
                Memory:              5Gi
            Tune Fast Device Class:  true
            Volume Claim Templates:
              Metadata:
                Annotations:
                  Crush Device Class:  ssd
              Spec:
                Access Modes:
                  ReadWriteOnce
                Resources:
                  Requests:
                    Storage:         100Gi
                Storage Class Name:  gp3-csi
                Volume Mode:         Block
            Count:                   1
            Name:                    ocs-deviceset-1
            Placement:
              Node Affinity:
                Required During Scheduling Ignored During Execution:
                  Node Selector Terms:
                    Match Expressions:
                      Key:       cluster.ocs.openshift.io/openshift-storage
                      Operator:  Exists
              Tolerations:
                Effect:    NoSchedule
                Key:       node.ocs.openshift.io/storage
                Operator:  Equal
                Value:     true
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Portable:                true
            Prepare Placement:
              Node Affinity:
                Required During Scheduling Ignored During Execution:
                  Node Selector Terms:
                    Match Expressions:
                      Key:       cluster.ocs.openshift.io/openshift-storage
                      Operator:  Exists
              Tolerations:
                Effect:    NoSchedule
                Key:       node.ocs.openshift.io/storage
                Operator:  Equal
                Value:     true
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Resources:
              Limits:
                Cpu:     2
                Memory:  5Gi
              Requests:
                Cpu:                 2
                Memory:              5Gi
            Tune Fast Device Class:  true
            Volume Claim Templates:
              Metadata:
                Annotations:
                  Crush Device Class:  ssd
              Spec:
                Access Modes:
                  ReadWriteOnce
                Resources:
                  Requests:
                    Storage:         100Gi
                Storage Class Name:  gp3-csi
                Volume Mode:         Block
            Count:                   1
            Name:                    ocs-deviceset-2
            Placement:
              Node Affinity:
                Required During Scheduling Ignored During Execution:
                  Node Selector Terms:
                    Match Expressions:
                      Key:       cluster.ocs.openshift.io/openshift-storage
                      Operator:  Exists
              Tolerations:
                Effect:    NoSchedule
                Key:       node.ocs.openshift.io/storage
                Operator:  Equal
                Value:     true
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Portable:                true
            Prepare Placement:
              Node Affinity:
                Required During Scheduling Ignored During Execution:
                  Node Selector Terms:
                    Match Expressions:
                      Key:       cluster.ocs.openshift.io/openshift-storage
                      Operator:  Exists
              Tolerations:
                Effect:    NoSchedule
                Key:       node.ocs.openshift.io/storage
                Operator:  Equal
                Value:     true
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Resources:
              Limits:
                Cpu:     2
                Memory:  5Gi
              Requests:
                Cpu:                 2
                Memory:              5Gi
            Tune Fast Device Class:  true
            Volume Claim Templates:
              Metadata:
                Annotations:
                  Crush Device Class:  ssd
              Spec:
                Access Modes:
                  ReadWriteOnce
                Resources:
                  Requests:
                    Storage:         100Gi
                Storage Class Name:  gp3-csi
                Volume Mode:         Block
          Store:
      Status:
        Conditions:
          Last Heartbeat Time:   2024-11-27T11:46:45Z
          Last Transition Time:  2024-11-27T11:46:45Z
          Message:               failed to create cluster: failed to start ceph monitors: failed to assign pods to mons: failed to schedule mons
          Reason:                ClusterProgressing
          Status:                False
          Type:                  Progressing
        Message:                 failed to create cluster: failed to start ceph monitors: failed to assign pods to mons: failed to schedule mons
        Phase:                   Progressing
        State:                   Error
        Version:
          Image:    registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:75bd8969ab3f86f2203a1ceb187876f44e54c9ee3b917518c4d696cf6cd88ce3
          Version:  18.2.1-229
      Events:
        Type     Reason           Age                   From                          Message
        ----     ------           ----                  ----                          -------
        Warning  ReconcileFailed  9m10s (x19 over 76m)  rook-ceph-cluster-controller  failed to reconcile CephCluster "odf-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph monitors: failed to assign pods to mons: failed to schedule mons

      oc get pods
      NAME                                               READY   STATUS    RESTARTS      AGE
      alertmanager-odf-alertmanager-0                    2/2     Running   0             94m
      csi-addons-controller-manager-59fd7c8678-wbgqs     2/2     Running   2 (13m ago)   93m
      csi-cephfsplugin-5kj5w                             2/2     Running   0             89m
      csi-cephfsplugin-drw5f                             2/2     Running   0             89m
      csi-cephfsplugin-fmtg4                             2/2     Running   0             89m
      csi-cephfsplugin-provisioner-58ffddb9bd-dqb7b      6/6     Running   0             89m
      csi-cephfsplugin-provisioner-58ffddb9bd-pggtk      6/6     Running   0             89m
      csi-rbdplugin-2rgsj                                3/3     Running   0             89m
      csi-rbdplugin-4vjjk                                3/3     Running   0             89m
      csi-rbdplugin-fmszd                                3/3     Running   0             89m
      csi-rbdplugin-provisioner-5b584957b9-s4lcs         6/6     Running   0             89m
      csi-rbdplugin-provisioner-5b584957b9-sm7pv         6/6     Running   0             89m
      noobaa-operator-7f995955c6-bgvzb                   1/1     Running   0             68m
      ocs-operator-78895c9b5-w5dfk                       1/1     Running   0             94m
      odf-console-5fd47794f5-j99lg                       1/1     Running   0             94m
      odf-operator-controller-manager-654d64c545-clf98   2/2     Running   1 (35m ago)   94m
      prometheus-odf-prometheus-0                        3/3     Running   0             94m
      prometheus-operator-76b8494cdb-slm2b               1/1     Running   0             94m
      rook-ceph-mon-a-canary-5ccdb66769-zr67n            0/2     Pending   0             2m25s
      rook-ceph-mon-b-canary-67bf49d7bc-427v9            0/2     Pending   0             2m25s
      rook-ceph-mon-c-canary-658887b5-l6c2p              0/2     Pending   0             2m24s
      rook-ceph-operator-857b7b766b-dcs8l                1/1     Running   0             94m
      ux-backend-server-7c75f44f9-74rww                  2/2     Running   0             94m
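      The three Pending mon canary pods above are where the scheduling failure surfaces. A minimal diagnostic sketch, assuming oc is logged in to the affected cluster (the odf-storage namespace is taken from this report):

```shell
#!/usr/bin/env bash
# Namespace taken from this report; adjust if different.
NS=odf-storage

# Guard: these commands only make sense against the live cluster.
if command -v oc >/dev/null 2>&1; then
  # Scheduling events of the Pending mon canary pods; FailedScheduling
  # messages name the predicate that blocked placement.
  oc -n "$NS" describe pods -l app=rook-ceph-mon | grep -A 10 'Events:'
  # Whether the mon PVCs (gp3-csi, 50Gi per the spec above) were created and bound.
  oc -n "$NS" get pvc
fi
```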

      oc describe storagecluster ocs-storagecluster
      Name:         ocs-storagecluster
      Namespace:    odf-storage
      Labels:       <none>
      Annotations:  uninstall.ocs.openshift.io/cleanup-policy: delete
                    uninstall.ocs.openshift.io/mode: graceful
      API Version:  ocs.openshift.io/v1
      Kind:         StorageCluster
      Metadata:
        Creation Timestamp:  2024-11-27T10:36:54Z
        Finalizers:
          storagecluster.ocs.openshift.io
        Generation:  2
        Owner References:
          API Version:     odf.openshift.io/v1alpha1
          Kind:            StorageSystem
          Name:            ocs-storagecluster-storagesystem
          UID:             1ff8d698-220b-4528-bf52-4fa7c87faa1f
        Resource Version:  150457
        UID:               bbd30593-f606-475c-a506-666bf28c1959
      Spec:
        Arbiter:
        Encryption:
          Key Rotation:
            Schedule:  @weekly
          Kms:
        External Storage:
        Managed Resources:
          Ceph Block Pools:
          Ceph Cluster:
          Ceph Config:
          Ceph Dashboard:
          Ceph Filesystems:
            Data Pool Spec:
              Application:  
              Erasure Coded:
                Coding Chunks:  0
                Data Chunks:    0
              Mirroring:
              Quotas:
              Replicated:
                Size:  0
              Status Check:
                Mirror:
          Ceph Non Resilient Pools:
            Count:  1
            Resources:
            Volume Claim Template:
              Metadata:
              Spec:
                Resources:
              Status:
          Ceph Object Store Users:
          Ceph Object Stores:
          Ceph RBD Mirror:
            Daemon Count:  1
          Ceph Toolbox:
        Mirroring:
        Storage Device Sets:
          Config:
          Count:  1
          Data PVC Template:
            Metadata:
            Spec:
              Access Modes:
                ReadWriteOnce
              Resources:
                Requests:
                  Storage:         100Gi
              Storage Class Name:  gp3-csi
              Volume Mode:         Block
            Status:
          Name:  ocs-deviceset
          Placement:
          Portable:  true
          Prepare Placement:
          Replica:  3
          Resources:
      Status:
        Conditions:
          Last Heartbeat Time:   2024-11-27T10:36:54Z
          Last Transition Time:  2024-11-27T10:36:54Z
          Message:               Version check successful
          Reason:                VersionMatched
          Status:                False
          Type:                  VersionMismatch
          Last Heartbeat Time:   2024-11-27T11:46:45Z
          Last Transition Time:  2024-11-27T10:36:54Z
          Message:               Error while reconciling: Operation cannot be fulfilled on cephclusters.ceph.rook.io "ocs-storagecluster-cephcluster": the object has been modified; please apply your changes to the latest version and try again
          Reason:                ReconcileFailed
          Status:                False
          Type:                  ReconcileComplete
          Last Heartbeat Time:   2024-11-27T10:36:54Z
          Last Transition Time:  2024-11-27T10:36:54Z
          Message:               Initializing StorageCluster
          Reason:                Init
          Status:                False
          Type:                  Available
          Last Heartbeat Time:   2024-11-27T10:36:54Z
          Last Transition Time:  2024-11-27T10:36:54Z
          Message:               Initializing StorageCluster
          Reason:                Init
          Status:                True
          Type:                  Progressing
          Last Heartbeat Time:   2024-11-27T10:36:54Z
          Last Transition Time:  2024-11-27T10:36:54Z
          Message:               Initializing StorageCluster
          Reason:                Init
          Status:                False
          Type:                  Degraded
          Last Heartbeat Time:   2024-11-27T10:36:54Z
          Last Transition Time:  2024-11-27T10:36:54Z
          Message:               Initializing StorageCluster
          Reason:                Init
          Status:                Unknown
          Type:                  Upgradeable
        Current Mon Count:       3
        Failure Domain:          rack
        Failure Domain Key:      topology.rook.io/rack
        Failure Domain Values:
          rack0
          rack1
          rack2
        Images:
          Ceph:
            Actual Image:   registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:75bd8969ab3f86f2203a1ceb187876f44e54c9ee3b917518c4d696cf6cd88ce3
            Desired Image:  registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:75bd8969ab3f86f2203a1ceb187876f44e54c9ee3b917518c4d696cf6cd88ce3
          Noobaa Core:
            Desired Image:  registry.redhat.io/odf4/mcg-core-rhel9@sha256:ec325c8001b636ec024302708ba3020c1aa2324f48336e186f628341f019d1ee
          Noobaa DB:
            Desired Image:  registry.redhat.io/rhel9/postgresql-15@sha256:10475924583aba63c50b00337b4097cdad48e7a8567fd77f400b278ad1782bfc
        Kms Server Connection:
        Node Topologies:
          Labels:
            failure-domain.beta.kubernetes.io/region:
              us-west-2
            failure-domain.beta.kubernetes.io/zone:
              us-west-2a
            kubernetes.io/hostname:
              ip-10-0-0-142.us-west-2.compute.internal
              ip-10-0-0-200.us-west-2.compute.internal
              ip-10-0-0-245.us-west-2.compute.internal
            topology.rook.io/rack:
              rack0
              rack1
              rack2
        Phase:  Error
        Related Objects:
          API Version:       ceph.rook.io/v1
          Kind:              CephCluster
          Name:              ocs-storagecluster-cephcluster
          Namespace:         odf-storage
          Resource Version:  146177
          UID:               fc00a627-5ba8-4013-a1e1-a8b35730e613
        Version:             4.16.3
      Events:                <none>
      oc get cephcluster
      NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE         MESSAGE                                                                                                           HEALTH   EXTERNAL   FSID
      ocs-storagecluster-cephcluster   /var/lib/rook     3          78m   Progressing   failed to create cluster: failed to start ceph monitors: failed to assign pods to mons: failed to schedule mons                       
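      Since the Mon anti-affinity in the spec above requires one mon per distinct topology.rook.io/rack value, it is worth verifying that all three storage nodes are schedulable and carry distinct rack labels. A hedged sketch, assuming oc is logged in to the affected cluster:

```shell
#!/usr/bin/env bash
# Label key comes from the Placement/Mon anti-affinity in the CephCluster spec.
RACK_LABEL='topology.rook.io/rack'

if command -v oc >/dev/null 2>&1; then
  # Rack label per storage node; three distinct values are needed for 3 mons.
  oc get nodes -l cluster.ocs.openshift.io/openshift-storage \
    -o 'custom-columns=NAME:.metadata.name,RACK:.metadata.labels.topology\.rook\.io/rack'
  # Taints that could also block mon scheduling.
  oc get nodes -l cluster.ocs.openshift.io/openshift-storage \
    -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
fi
```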
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Resources:
              Limits:
                Cpu:     2
                Memory:  5Gi
              Requests:
                Cpu:                 2
                Memory:              5Gi
            Tune Fast Device Class:  true
            Volume Claim Templates:
              Metadata:
                Annotations:
                  Crush Device Class:  ssd
              Spec:
                Access Modes:
                  ReadWriteOnce
                Resources:
                  Requests:
                    Storage:         100Gi
                Storage Class Name:  gp3-csi
                Volume Mode:         Block
            Count:                   1
            Name:                    ocs-deviceset-1
            Placement:
              Node Affinity:
                Required During Scheduling Ignored During Execution:
                  Node Selector Terms:
                    Match Expressions:
                      Key:       cluster.ocs.openshift.io/openshift-storage
                      Operator:  Exists
              Tolerations:
                Effect:    NoSchedule
                Key:       node.ocs.openshift.io/storage
                Operator:  Equal
                Value:     true
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Portable:                true
            Prepare Placement:
              Node Affinity:
                Required During Scheduling Ignored During Execution:
                  Node Selector Terms:
                    Match Expressions:
                      Key:       cluster.ocs.openshift.io/openshift-storage
                      Operator:  Exists
              Tolerations:
                Effect:    NoSchedule
                Key:       node.ocs.openshift.io/storage
                Operator:  Equal
                Value:     true
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Resources:
              Limits:
                Cpu:     2
                Memory:  5Gi
              Requests:
                Cpu:                 2
                Memory:              5Gi
            Tune Fast Device Class:  true
            Volume Claim Templates:
              Metadata:
                Annotations:
                  Crush Device Class:  ssd
              Spec:
                Access Modes:
                  ReadWriteOnce
                Resources:
                  Requests:
                    Storage:         100Gi
                Storage Class Name:  gp3-csi
                Volume Mode:         Block
            Count:                   1
            Name:                    ocs-deviceset-2
            Placement:
              Node Affinity:
                Required During Scheduling Ignored During Execution:
                  Node Selector Terms:
                    Match Expressions:
                      Key:       cluster.ocs.openshift.io/openshift-storage
                      Operator:  Exists
              Tolerations:
                Effect:    NoSchedule
                Key:       node.ocs.openshift.io/storage
                Operator:  Equal
                Value:     true
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Portable:                true
            Prepare Placement:
              Node Affinity:
                Required During Scheduling Ignored During Execution:
                  Node Selector Terms:
                    Match Expressions:
                      Key:       cluster.ocs.openshift.io/openshift-storage
                      Operator:  Exists
              Tolerations:
                Effect:    NoSchedule
                Key:       node.ocs.openshift.io/storage
                Operator:  Equal
                Value:     true
              Topology Spread Constraints:
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        topology.rook.io/rack
                When Unsatisfiable:  DoNotSchedule
                Label Selector:
                  Match Expressions:
                    Key:       app
                    Operator:  In
                    Values:
                      rook-ceph-osd
                      rook-ceph-osd-prepare
                Max Skew:            1
                Topology Key:        kubernetes.io/hostname
                When Unsatisfiable:  ScheduleAnyway
            Resources:
              Limits:
                Cpu:     2
                Memory:  5Gi
              Requests:
                Cpu:                 2
                Memory:              5Gi
            Tune Fast Device Class:  true
            Volume Claim Templates:
              Metadata:
                Annotations:
                  Crush Device Class:  ssd
              Spec:
                Access Modes:
                  ReadWriteOnce
                Resources:
                  Requests:
                    Storage:         100Gi
                Storage Class Name:  gp3-csi
                Volume Mode:         Block
          Store:
      Status:
        Conditions:
          Last Heartbeat Time:   2024-11-27T11:46:45Z
          Last Transition Time:  2024-11-27T11:46:45Z
          Message:               failed to create cluster: failed to start ceph monitors: failed to assign pods to mons: failed to schedule mons
          Reason:                ClusterProgressing
          Status:                False
          Type:                  Progressing
        Message:                 failed to create cluster: failed to start ceph monitors: failed to assign pods to mons: failed to schedule mons
        Phase:                   Progressing
        State:                   Error
        Version:
          Image:    registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:75bd8969ab3f86f2203a1ceb187876f44e54c9ee3b917518c4d696cf6cd88ce3
          Version:  18.2.1-229
      Events:
        Type     Reason           Age                   From                          Message
        ----     ------           ----                  ----                          -------
        Warning  ReconcileFailed  9m10s (x19 over 76m)  rook-ceph-cluster-controller  failed to reconcile CephCluster "odf-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph monitors: failed to assign pods to mons: failed to schedule mons

      Additional info:

      ocp-mg: https://url.corp.redhat.com/f1393d1
      ocs-mg: https://url.corp.redhat.com/dc2190f
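
One plausible reading of "failed to schedule mons": the mon placement in the CephCluster spec above requires each mon pod to land on a node carrying the `cluster.ocs.openshift.io/openshift-storage` label, with required pod anti-affinity on topology key `topology.rook.io/rack`, so three mons need three labeled nodes in three distinct racks. A minimal sketch of that constraint (the helper name and node-dict shape are illustrative, not Rook code):

```python
def can_schedule_mons(nodes, mon_count=3,
                      storage_label="cluster.ocs.openshift.io/openshift-storage",
                      rack_label="topology.rook.io/rack"):
    """nodes: list of per-node label dicts (label name -> value).

    Models the spec above: a mon is only placeable on a node that has the
    storage label, and required anti-affinity on the rack topology key means
    at most one mon per rack value. Returns True if mon_count mons fit.
    """
    # Collect the distinct rack values among nodes that carry the storage label.
    racks = {n[rack_label] for n in nodes
             if storage_label in n and rack_label in n}
    return len(racks) >= mon_count
```

If this is the cause, `oc get nodes --show-labels` should show fewer than three nodes with the `cluster.ocs.openshift.io/openshift-storage` label, or fewer than three distinct `topology.rook.io/rack` values among them.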

      Assignee: Santosh Pillai (sapillai)
      Reporter: Daniel Osypenko (rh-ee-dosypenk)