OCPBUGS-4744 (OpenShift Bugs)

KubeletDown firing but everything seems OK

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • 4.13
    • Monitoring
    • None
    • -
    • Moderate
    • Proposed
    • False

      Description of problem:

      We are upgrading the build02 build cluster to 4.13.0-ec.0, and KubeletDown has been firing for some time. The runbook for the alert says:
      
      > This alert is triggered when the monitoring system has not been able to reach any of the cluster's Kubelets for more than 15 minutes.
      
      However, the cluster is otherwise behaving normally. It's not clear from the runbook what our next step should be: how do we find out which kubelet is not responding? The diagnosis steps in https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubeletDown.md don't really explain what to do with the output of the given `oc` commands.
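
For illustration only, here is a minimal sketch of one way to answer the "which kubelet" question. It is not taken from the runbook. It assumes the alert is built on the Prometheus `up` metric for the `kubelet` job (as in the upstream kubernetes-mixin rules), that the JSON below is what an instant query for `up{job="kubelet", metrics_path="/metrics"}` returns, and that each series carries a `node` or `instance` label. The sample data and the `down_kubelets` helper are hypothetical.

```python
import json

# Hypothetical sample of a Prometheus instant-query response
# (GET /api/v1/query) for: up{job="kubelet", metrics_path="/metrics"}
SAMPLE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"job": "kubelet", "node": "node-a"}, "value": [1670000000, "1"]},
      {"metric": {"job": "kubelet", "node": "node-b"}, "value": [1670000000, "0"]}
    ]
  }
}
"""


def down_kubelets(response_text):
    """Return the sorted node names whose kubelet scrape target reports up == 0."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    down = []
    for series in payload["data"]["result"]:
        # An instant-vector sample is a [timestamp, "value"] pair.
        _timestamp, value = series["value"]
        if float(value) == 0:
            labels = series["metric"]
            # Prefer the node label; fall back to the scrape address.
            down.append(labels.get("node") or labels.get("instance", "<unknown>"))
    return sorted(down)


if __name__ == "__main__":
    print(down_kubelets(SAMPLE))
```

If the query returns no series at all, the function returns an empty list. That case is worth checking separately, because it would suggest the kubelet scrape targets are missing from Prometheus altogether, not that a particular node is unhealthy.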

       

            Simon Pasquier (spasquie@redhat.com)
            Stephen Benjamin (stbenjam)
            Junqi Zhao
