Hybrid Cloud Console / RHCLOUD-33105

Address Lag Reporting Issue to Prometheus


      Description of Problem

      In the past 12 hours, lag has increased significantly, and this lag is not being reported to Prometheus. The reporting failure is exacerbating the overall lag issue. This must be resolved before the General Availability (GA) release to ensure optimal performance and accurate monitoring. Please investigate and fix the reporting failure promptly.

      Ref Link: https://redhat-internal.slack.com/archives/C04B6CJ3A81/p1707908870245839

      How reproducible

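      Visible in the Prometheus graph of summed Kafka consumer-group lag for group "ros-ocp" on topic "hccm.ros.events" over a 12h range:
      https://prometheus.crcp01ue1.devshift.net/graph?g0.expr=sum%20(kafka_consumergroup_group_lag%7Bgroup%3D%22ros-ocp%22%2C%20topic%3D%22hccm.ros.events%22%7D%20)&g0.tab=0&g0.display_mode=stacked&g0.show_exemplars=0&g0.range_input=12h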
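      For anyone triaging this, the PromQL expression behind the graph link can be recovered by decoding the URL, and the same expression can be sent to the standard Prometheus HTTP API endpoint /api/v1/query_range. The following is a minimal sketch, not a verified procedure: the helper names promql_from_graph_url and query_range_url are hypothetical, and it assumes the API is served from the same host as the graph page and is reachable from where the script runs (authentication, if any, is not handled). It only builds the URL and makes no network call.

```python
import time
from urllib.parse import parse_qs, urlencode, urlsplit

# Graph link copied verbatim from this ticket.
GRAPH_URL = (
    "https://prometheus.crcp01ue1.devshift.net/graph"
    "?g0.expr=sum%20(kafka_consumergroup_group_lag%7Bgroup%3D%22ros-ocp%22"
    "%2C%20topic%3D%22hccm.ros.events%22%7D%20)"
    "&g0.tab=0&g0.display_mode=stacked&g0.show_exemplars=0&g0.range_input=12h"
)


def promql_from_graph_url(graph_url: str, panel: int = 0) -> str:
    """Return the decoded PromQL expression of one panel in a graph URL."""
    params = parse_qs(urlsplit(graph_url).query)
    return params[f"g{panel}.expr"][0]


def query_range_url(graph_url: str, start: int, end: int, step: str = "60s") -> str:
    """Build a /api/v1/query_range URL for the expression in graph_url.

    Assumption: the HTTP API lives on the same scheme and host as the graph page.
    """
    parts = urlsplit(graph_url)
    query = urlencode(
        {
            "query": promql_from_graph_url(graph_url),
            "start": start,
            "end": end,
            "step": step,
        }
    )
    return f"{parts.scheme}://{parts.netloc}/api/v1/query_range?{query}"


if __name__ == "__main__":
    now = int(time.time())
    print(promql_from_graph_url(GRAPH_URL))
    # Same 12h window as the graph link, ending now.
    print(query_range_url(GRAPH_URL, start=now - 12 * 3600, end=now))
```

      Gaps in the returned series, as opposed to high values, would point at the reporting failure rather than at the lag itself.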

      Steps to Reproduce

      NA

      Actual Behavior

      System lag has increased significantly over the past 12 hours, and the lag is failing to be reported to Prometheus. The reporting failure contributes to the overall lag issue and degrades both system performance and monitoring accuracy.

      Expected Behavior

      System performance should remain stable, without significant increases in lag. Any lag that does occur should be reported accurately to Prometheus, with no reporting failures, so that all performance data is captured and issues can be monitored and resolved promptly.

       

      Business Impact / Additional info

      Assignee: Unassigned
      Reporter: Vinay Kumar Mysore Sathya Kumar (vinakuma@redhat.com)
      Votes: 0
      Watchers: 8