Data Foundation Bugs: DFBUGS-1285

PrometheusDuplicateTimestamps alerts generated by rook-ceph-osd-key-rotation-X pods


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • odf-4.18
    • odf-4.16
    • ocs-operator
    • None
    • False
    • False
    • Committed
    • ?
    • ?
    • 4.18.0-114
    • Committed
    • Release Note Not Required
    • Moderate
    • Proposed
    • ?
    • None

      Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:

      The rook-ceph-osd-key-rotation-X pods trigger PrometheusDuplicateTimestamps alerts. These pods define the same toleration twice, so the kube_pod_tolerations metric exported for them contains duplicate series.

      The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):

      All

       

      The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):

      4.16.4

       

      The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):

      4.16.z

       

      Does this issue impact your ability to continue to work with the product?

      There is no direct impact, but the PrometheusDuplicateTimestamps alert keeps firing whenever the rook-ceph-osd-key-rotation-X pods complete.

       

      Is there any workaround available to the best of your knowledge?

      Deleting the completed jobs created by the rook-ceph-osd-key-rotation-X cronjobs stops the alert.

       

      Can this issue be reproduced? If so, please provide the hit rate

      100%

       

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:

      1. Install and set up RHODF.

      2. Create a StorageSystem with encryption enabled using Vault, and enable taints on the storage nodes.

      3. Wait for the rook-ceph-osd-key-rotation-X cronjobs to be created.

      4. Trigger jobs from these cronjobs so that new pods are spun up.

      5. After a few minutes, the PrometheusDuplicateTimestamps alert fires.

       

      The exact date and time when the issue was observed, including timezone details:

       

      Actual results:

      The PrometheusDuplicateTimestamps alert fires when pods are created by the rook-ceph-osd-key-rotation-X cronjobs. This happens because the associated cronjobs define the storage-node toleration twice:

       

                tolerations:
                - effect: NoSchedule
                  key: node.ocs.openshift.io/storage
                  operator: Equal
                  value: "true"
                - effect: NoSchedule
                  key: node.ocs.openshift.io/storage
                  operator: Equal
                  value: "true" 

       

       

      Expected results:

      When the rook-ceph-osd-key-rotation-X cronjobs complete, they should not trigger the alert above, and the associated pods should carry the storage-node toleration only once:

                tolerations:
                - effect: NoSchedule
                  key: node.ocs.openshift.io/storage
                  operator: Equal
                  value: "true"
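      One way to reach the expected state is to drop exact duplicates before the tolerations are written into the cronjob pod template. The Go sketch below assumes that approach; it is not the actual ocs-operator fix, and the Toleration type and dedupeTolerations helper are hypothetical. It keeps the first occurrence of each toleration and preserves the original order.

```go
package main

import "fmt"

// Toleration is a hypothetical local stand-in for the toleration fields shown
// in the YAML above. All fields are strings, so the struct is comparable and
// can be used directly as a map key.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// dedupeTolerations returns tols with exact duplicates removed, keeping the
// first occurrence of each toleration and preserving order.
func dedupeTolerations(tols []Toleration) []Toleration {
	seen := make(map[Toleration]struct{}, len(tols))
	out := make([]Toleration, 0, len(tols))
	for _, t := range tols {
		if _, ok := seen[t]; ok {
			continue // already emitted this exact toleration
		}
		seen[t] = struct{}{}
		out = append(out, t)
	}
	return out
}

func main() {
	storage := Toleration{"node.ocs.openshift.io/storage", "Equal", "true", "NoSchedule"}
	// The duplicated list from "Actual results".
	duplicated := []Toleration{storage, storage}
	fmt.Println(len(dedupeTolerations(duplicated))) // prints 1
}
```

      The real k8s.io/api/core/v1 Toleration type also has a TolerationSeconds *int64 field, so using it directly as a map key would compare pointers rather than values. A real implementation would instead compare the key, operator, value and effect fields explicitly, which is what the MatchToleration method on that type checks.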

      Logs collected and log location:

      The following log entry appears in the prometheus-k8s pods:

      2024-12-27T18:39:01.605719007Z ts=2024-12-27T18:39:01.605Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/kube-state-metrics/0 target=https://<pod-ip>:8443/metrics msg="Duplicate sample for timestamp" series="kube_pod_tolerations{namespace=\"openshift-storage\",pod=\"rook-ceph-osd-key-rotation-21-28903680-49xxl\",uid=\"3a3c000a-1c8d-432c-a126-4f9291190902\",key=\"node.ocs.openshift.io/storage\",operator=\"Equal\",value=\"true\",effect=\"NoSchedule\"}" 
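      The log entry above shows why the alert fires. kube-state-metrics appears to export one kube_pod_tolerations sample per toleration, labelled with the toleration's key, operator, value and effect, so two identical tolerations on one pod yield two samples with the same label set in the same scrape, which Prometheus reports as "Duplicate sample for timestamp". The Go sketch below is a simplified model of that collision, not kube-state-metrics code: the Toleration type and the seriesLabels and duplicateSeries helpers are hypothetical, and the label set is trimmed to the pod name plus the four toleration fields.

```go
package main

import "fmt"

// Toleration mirrors the toleration fields that appear as labels on the
// kube_pod_tolerations series. Hypothetical local type, not k8s.io/api.
type Toleration struct {
	Key, Operator, Value, Effect string
}

// seriesLabels renders the (simplified) label set of one kube_pod_tolerations
// sample for a single toleration of a single pod.
func seriesLabels(pod string, t Toleration) string {
	return fmt.Sprintf(`kube_pod_tolerations{pod=%q,key=%q,operator=%q,value=%q,effect=%q}`,
		pod, t.Key, t.Operator, t.Value, t.Effect)
}

// duplicateSeries returns each label set that would be emitted more than once
// in a single scrape, i.e. the samples Prometheus flags as duplicates.
func duplicateSeries(pod string, tols []Toleration) []string {
	counts := make(map[string]int)
	var dups []string
	for _, t := range tols {
		s := seriesLabels(pod, t)
		counts[s]++
		if counts[s] == 2 { // report each colliding series only once
			dups = append(dups, s)
		}
	}
	return dups
}

func main() {
	storage := Toleration{"node.ocs.openshift.io/storage", "Equal", "true", "NoSchedule"}
	// The cronjob pod spec from "Actual results" lists this toleration twice.
	tols := []Toleration{storage, storage}
	for _, s := range duplicateSeries("rook-ceph-osd-key-rotation-21-28903680-49xxl", tols) {
		fmt.Println(s)
	}
}
```

      Running it prints the single colliding series for the pod named in the log. Giving each toleration a distinct key, or listing the storage toleration only once, makes duplicateSeries return an empty slice.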

       

      Additional info:

       

              mparida@redhat.com (Malay Kumar Parida)
              rhn-support-dgautam (Dhruv Gautam)
              Vishakha Kathole
