-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
4.10.z
-
Quality / Stability / Reliability
-
False
-
Moderate
-
Rejected
-
Customer Escalated
Description of problem:
Readiness probes for Prometheus and Prometheus Adapter became unhealthy after upgrading from OCP 4.10.17 to 4.10.25; a subsequent upgrade to 4.10.36 did not change the behaviour.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.10.36
How reproducible:
Not reproducible on other clusters
Steps to Reproduce:
1. 2. 3.
Actual results:
2h59m Warning ProbeError pod/prometheus-adapter-6dbcc5fdc5-mpb6h Liveness probe error: Get "https://10.253.16.45:6443/livez": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
body:
2h59m Warning Unhealthy pod/prometheus-adapter-6dbcc5fdc5-mpb6h Liveness probe failed: Get "https://10.253.16.45:6443/livez": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2h40m Warning ProbeError pod/prometheus-adapter-6dbcc5fdc5-mpb6h Readiness probe error: Get "https://10.253.16.45:6443/readyz": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
body:
2h40m Warning Unhealthy pod/prometheus-adapter-6dbcc5fdc5-mpb6h Readiness probe failed: Get "https://10.253.16.45:6443/readyz": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
4m20s Warning Unhealthy pod/prometheus-k8s-0 Readiness probe failed: command timed out
4m21s Warning Unhealthy pod/prometheus-k8s-1 Readiness probe failed: command timed out
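For triage, events like the ones above can be tallied per pod and probe type. The following is a minimal sketch, assuming the input is plain event text with the columns age, type, reason, object, message, as pasted in this report. The helper name `tally_probe_failures` and the regular expression are illustrative assumptions, not part of any OpenShift tooling.

```python
import re
from collections import Counter

# Matches event lines such as:
#   4m20s Warning Unhealthy pod/prometheus-k8s-0 Readiness probe failed: command timed out
# Assumption: columns are age, type, reason, object, message, as in the output above.
EVENT_RE = re.compile(
    r"^\S+\s+Warning\s+(?P<reason>ProbeError|Unhealthy)\s+"
    r"pod/(?P<pod>\S+)\s+(?P<probe>Liveness|Readiness) probe"
)


def tally_probe_failures(text):
    """Count probe events per (pod, probe type, reason). Hypothetical helper."""
    counts = Counter()
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:  # lines such as "body:" do not match and are skipped
            counts[(m["pod"], m["probe"], m["reason"])] += 1
    return counts


# A few lines copied from the events in this report.
SAMPLE = """\
2h59m Warning ProbeError pod/prometheus-adapter-6dbcc5fdc5-mpb6h Liveness probe error: Get "https://10.253.16.45:6443/livez": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
body:
4m20s Warning Unhealthy pod/prometheus-k8s-0 Readiness probe failed: command timed out
4m21s Warning Unhealthy pod/prometheus-k8s-1 Readiness probe failed: command timed out
"""

if __name__ == "__main__":
    for key, n in sorted(tally_probe_failures(SAMPLE).items()):
        print(key, n)
```

Grouping by pod and probe type makes it easier to see whether the timeouts are confined to the monitoring pods or spread across the node.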
Expected results:
Readiness and liveness probes run without issues.
Additional info:
must-gather and sosreports are attached to the support case.
- is related to: OCPBUGS-2499 ReadinessProbes failing after upgrade to OpenShift 4.10.25 (Closed)
- is triggering: MON-3390 Write post-mortem document on liveness probes being unresponsive due to VPA (Closed)
- relates to: MON-3292 Make Prometheus flooding/DoS problems easier to detect (To Do)