Data Foundation Bugs / DFBUGS-631

[2308347] CephMgrIsAbsent is not firing when scaling down one mgr deployment


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Versions: odf-4.18, odf-4.16
    • Component: ceph-monitoring

      Description of problem (please be as detailed as possible and provide log
      snippets):

      This bug is created as a consequence of the change covered by epic https://issues.redhat.com/browse/RHSTOR-4139 (ODF 4.15)

      After scaling down one of the two mgr pods, the CephMgrIsAbsent alert does not fire.

      The alert fires only after all mgr deployments are scaled down.
      We may want to notify the user with a Warning-level CephMgrIsAbsent alert when only one mgr is absent.
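
      One possible direction for the Warning-level idea (an assumption on my part, not the shipped rule) is to compare the number of live mgr targets against the expected replica count, instead of only alerting when every mgr is gone. A minimal Python sketch of that check, using hypothetical series data:

```python
# Hypothetical degraded-mgr check: warn when fewer mgr series report up=1
# than expected, mirroring a PromQL rule along the lines of
#   count(up{job="rook-ceph-mgr"} == 1) < 2
# This is a sketch of a possible rule, not the current CephMgrIsAbsent expression.

def mgr_degraded(series: dict[str, float], expected: int = 2) -> bool:
    """True when fewer than `expected` mgr instances are up."""
    up_count = sum(1 for v in series.values() if v == 1)
    return up_count < expected

assert mgr_degraded({"mgr-a": 1, "mgr-b": 1}) is False  # both mgrs up: no warning
assert mgr_degraded({"mgr-b": 1}) is True               # one mgr scaled down: warn
assert mgr_degraded({}) is True                         # no mgrs at all: warn
```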

      Scale down an mgr pod and check CephMgrIsAbsent:
      oc get deployment | grep mgr
      rook-ceph-mgr-a 0/0 0 0 7h27m
      rook-ceph-mgr-b 1/1 1 1 7h27m

      Expression:
      label_replace((up{job="rook-ceph-mgr"} == 0 or absent(up{job="rook-ceph-mgr"})), "namespace", "odf-storage", "", "")

      • The namespace is dynamic; the alert does not fire either with the original deployment in openshift-storage or with a custom odf-storage namespace.
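
      A likely explanation for the gap, sketched in Python with hypothetical series data: when a deployment is scaled to 0 replicas, Prometheus drops that target entirely, so its up series disappears rather than reporting 0. Then up == 0 matches nothing, and absent() stays false because the other mgr's series still exists.

```python
# Sketch of the PromQL semantics behind the CephMgrIsAbsent expression.
# `series` maps an mgr instance to its up value; a scaled-down deployment
# has NO series at all (Prometheus drops the target), which is the crux
# of the bug. All data below is hypothetical.

def ceph_mgr_is_absent(series: dict[str, float]) -> bool:
    """Mimic: up{job="rook-ceph-mgr"} == 0 or absent(up{job="rook-ceph-mgr"})."""
    down = [v for v in series.values() if v == 0]  # up == 0 matches only existing series
    absent = len(series) == 0                      # absent() is true only if NO series exists
    return bool(down) or absent

# Both mgr pods running: no alert, as expected.
assert ceph_mgr_is_absent({"mgr-a": 1, "mgr-b": 1}) is False

# mgr-a scaled to 0 replicas: its series vanishes, mgr-b is still up.
# The expression still evaluates to False -- the reported bug.
assert ceph_mgr_is_absent({"mgr-b": 1}) is False

# Only when every mgr deployment is scaled down does absent() fire.
assert ceph_mgr_is_absent({}) is True
```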

      Version of all relevant components (if applicable):

      Issue confirmed on ODF 4.15 and on ODF 4.16 (vSphere and ROSA HCP deployments)

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?
      no

      Is there any workaround available to the best of your knowledge?
      no

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?
      2

      Is this issue reproducible?
      yes

      Can this issue be reproduced from the UI?
      yes

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. oc -n odf-storage scale --replicas=0 deployment/rook-ceph-mgr-a
      2. wait 5 min
      3. check alert via management-console / Observe / Alerts or using
      curl -k -X GET "<route>/api/v1/alerts?silenced=False&inhibited=False" -H "Authorization: Bearer <token>" | jq '.data.alerts[] | select(.labels.alertname == "CephMgrIsAbsent")'
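
      The jq filter in step 3 can equally be done in Python; a sketch against a hypothetical alerts response (the payload below is made up for illustration and follows the /api/v1/alerts shape queried by the curl command above):

```python
import json

def firing_alerts(payload: str, alertname: str) -> list[dict]:
    """Select alerts by alertname, like the jq filter in step 3."""
    data = json.loads(payload)
    return [a for a in data["data"]["alerts"]
            if a["labels"].get("alertname") == alertname]

# Hypothetical response body after scaling down one mgr deployment.
response = json.dumps({
    "data": {"alerts": [
        {"labels": {"alertname": "CephClusterWarningState", "severity": "warning"}},
    ]}
})

# Actual behaviour reported in this bug: CephMgrIsAbsent is missing.
assert firing_alerts(response, "CephMgrIsAbsent") == []
```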

      Actual results:
      CephMgrIsAbsent does not fire

      Expected results:
      CephMgrIsAbsent fires to warn the user of the risk of losing all mgr pods

      Additional info:
      Latest changes regarding CephMgrIsAbsent that I found: https://github.com/rook/rook/issues/12249

              Divyansh Kamboj (dkamboj@redhat.com)
              Daniel Osypenko (rh-ee-dosypenk)
              Harish Nallur Vittal Rao