-
Story
-
Resolution: Unresolved
-
Normal
-
None
-
None
-
Upstream
-
False
-
False
-
OCPSTRAT-46 - Strategic Upstream Work - OCP Control Plane and Node Lifecycle Group
-
Workloads Sprint 208, Workloads Sprint 210, Workloads Sprint 211, Workloads Sprint 212, Workloads Sprint 214, Workloads Sprint 215, Workloads Sprint 216, Workloads Sprint 217, Workloads - 4.12, Workloads Sprint 225, Workloads Sprint 226, Workloads Sprint 227, Workloads Sprint 228, Workloads Sprint 229, Workloads Sprint 230, Workloads Sprint 231, Workloads Sprint 232, Workloads Sprint 233, Workloads Sprint 234, Workloads Sprint 235, Workloads Sprint 236, Workloads Sprint 237, Workloads Sprint 238, Workloads Sprint 239, Workloads Sprint 240, Workloads Sprint 241
Currently, if GC, Quota, NamespaceController, ... experience problems when reading discovery, that information is exposed only in the kube-controller-manager logs. We would like kcm-o to go degraded when we know that either GC or quota is having issues. This should help identify issues like "my namespace is stuck in terminating".
For this reason we want to expose information about discovery problems under the /debug endpoint of kcm.
Resources:
Exposing GC graph info under /debug: https://github.com/kubernetes/kubernetes/pull/66623
Arch call notes: https://docs.google.com/document/d/1x50GTpboRSGOIKHjyWyaN8Q4p4jgTP2Bndw8kdqyIfU/edit#heading=h.2lffczpupyq6
NamespaceController issue: https://issues.redhat.com/browse/WRKLDS-724
- is related to
WRKLDS-637 ClusterResourceQuota controller: Expose metrics about resource discovery - To Do
- relates to
OCPBUGS-5806 kube-controller-manager crashes when kubernetes API is down - Closed
1. kube-controller-manager should be more tolerant of API downtime and not crash | Closed | Unassigned