- Bug
- Resolution: Duplicate
- Major
- None
- 4.13
- None
- Moderate
- None
- Proposed
- False

Description of problem:
We are upgrading the build02 build cluster to 4.13.0-ec.0, and KubeletDown has been firing for some time. The runbook for the alert says:

> This alert is triggered when the monitoring system has not been able to reach any of the cluster's Kubelets for more than 15 minutes.

However, the cluster is otherwise behaving normally. It's not clear from the runbook what our next step should be: how do we find out which kubelet is not responding? The diagnosis steps in https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/KubeletDown.md don't really explain what to do with the output of the given `oc` commands.

- duplicates
- OCPBUGS-4521 all kubelet targets are down after a few hours
- Closed