OpenShift Bugs / OCPBUGS-1426

PrometheusNotConnectedToAlertmanagers due to prometheus pod down

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Normal
    • Affects Version: 4.11.z
    • Component: Monitoring
    • Quality / Stability / Reliability
    • Severity: Moderate

      Description of problem:

      We consistently see the pod `prometheus-k8s-1` being killed and restarted:
      1. The restart gets stuck during WAL replay.
      2. The pod consumes more than 17G of memory before it is killed, well beyond the configured memory request (https://github.com/openshift/release/blob/4ad2f102f3b6ff11b1a77331b9f788558c56b548/clusters/build-clusters/01_cluster/openshift-monitoring/cluster-monitoring-config_configmap.yaml#L26).
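      To put the numbers in item 2 side by side, here is a minimal Python sketch that parses Kubernetes memory quantities and reports how many times observed usage exceeds a request. The request value used below (`4Gi`) is a hypothetical placeholder, not the value from the linked configmap; `17G` is treated as a decimal gigabyte quantity, which is how Kubernetes parses that suffix.

```python
# Minimal sketch: compare observed memory usage with a configured memory request.
# Assumption: both values are Kubernetes resource quantities (e.g. "17G", "4Gi").
# Only the common binary (Ki..Ti) and decimal (k..T) suffixes are handled.

_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity string to bytes."""
    # Check two-character binary suffixes before one-character decimal ones,
    # so "4Gi" is not mistaken for a quantity ending in "i".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(quantity)  # no suffix: plain byte count


def usage_over_request(usage: str, request: str) -> float:
    """Return how many times the observed usage exceeds the request."""
    return parse_memory(usage) / parse_memory(request)


if __name__ == "__main__":
    # "4Gi" is a hypothetical request value used only for illustration.
    ratio = usage_over_request("17G", "4Gi")
    print(f"usage is {ratio:.1f}x the request")
```

      Note that a memory request is a scheduling hint rather than a cap, so usage above it is possible; the ratio only shows how far the observed peak drifted from what was planned for.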
      

      Version-Release number of selected component (if applicable):

      4.11.4
      

      How reproducible:

      It happens on build01 without any specific step to trigger it.
      

      Steps to Reproduce:

      N/A
      

      Actual results:

      Name              Status   Ready  Restarts  Owner           Memory       CPU          Created
      prometheus-k8s-1  Running  5/6    41        prometheus-k8s  1,521.2 MiB  0.400 cores  Sep 16, 2022, 5:24 AM
      

      Expected results:

      1. prometheus-k8s works fine.
      2. The PrometheusNotConnectedToAlertmanagers alert does not fire.
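      For context on the alert in item 2: in the upstream Prometheus monitoring mixin, PrometheusNotConnectedToAlertmanagers is defined roughly as `max_over_time(prometheus_notifications_alertmanagers_discovered[5m]) < 1`, meaning it fires when a Prometheus instance has discovered no Alertmanagers at any point in the last five minutes. The exact rule shipped in OpenShift 4.11 may differ (label selectors, `for` duration). The Python sketch below evaluates that condition over a hypothetical list of samples; the `for` duration is not modeled.

```python
# Minimal sketch of the alert condition, assuming the upstream mixin expression:
#   max_over_time(prometheus_notifications_alertmanagers_discovered[5m]) < 1
from typing import List, Tuple

WINDOW_SECONDS = 5 * 60


def alert_condition(samples: List[Tuple[float, float]], now: float) -> bool:
    """Return True if no Alertmanager was discovered in the 5 minutes before `now`.

    `samples` is a hypothetical list of (timestamp, value) pairs for
    prometheus_notifications_alertmanagers_discovered from one instance.
    """
    # Keep only the samples that fall inside the 5-minute window ending at `now`.
    in_window = [v for t, v in samples if now - WINDOW_SECONDS <= t <= now]
    if not in_window:
        # No samples means max_over_time has no result, so the condition is not met.
        return False
    return max(in_window) < 1


if __name__ == "__main__":
    # Two Alertmanagers seen earlier, then none: the window still contains a 2.
    print(alert_condition([(0, 2.0), (60, 0.0), (120, 0.0)], now=120))
```

      Taking the maximum over the window, rather than the latest value, keeps a single failed scrape from making the alert fire on its own.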
      

      Additional info:

      N/A
      

      Attachments:
        1. screenshot-1.png (54 kB, Simon Pasquier)
        2. screenshot-2.png (34 kB, Simon Pasquier)
        3. inspect.local.8952935911337421758.tar.gz (2.65 MB, Bear Chen)
        4. image-2022-09-19-10-21-30-195.png (103 kB, Simon Pasquier)
        5. promethus-k8s-1.log (39 kB, Bear Chen)
        6. inspect.local.4538195566830101267.tar.gz (2.75 MB, Bear Chen)

      People: Haoyu Sun (hasun@redhat.com), Bear Chen (bechen@redhat.com), Junqi Zhao