Type: Bug
Resolution: Duplicate
Priority: Normal
Affects Versions: 4.14.z, 4.15.z, 4.16.0
Impact: Quality / Stability / Reliability
Sprint: Hypershift Sprint 250
Description of problem:
Several platform components (e.g., the console) and user workloads running on a HostedCluster became unstable during the hosted control plane's (HCP) upgrade to 4.14.8. Although the kube-apiserver pods were reported as Ready with 0 restarts, they appear to have become intermittently unreachable, e.g.:
$ oc describe pod console-68d48f57d7-kxjxw -n openshift-console
  Warning  Unhealthy   61m (x10 over 62m)     kubelet  Readiness probe failed: Get "https://10.131.0.17:8443/health": dial tcp 10.131.0.17:8443: connect: connection refused
  Warning  BackOff     7m52s (x75 over 36m)   kubelet  Back-off restarting failed container console in pod console-68d48f57d7-kxjxw_openshift-console(e8cf0eeb-dddc-4e9f-a107-3abe7b83647c)
  Warning  ProbeError  2m53s (x336 over 62m)  kubelet  Readiness probe error: Get "https://10.131.0.17:8443/health": dial tcp 10.131.0.17:8443: connect: connection refused
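To confirm the console restarts are driven by lost apiserver connectivity rather than a console bug, the previous container logs can be inspected (a standard follow-up check, not output captured in the original report):
$ oc logs console-68d48f57d7-kxjxw -n openshift-console --previous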
$ oc get po -l app=kube-apiserver -n ocm-production-$CLUSTER_ID
NAME                              READY   STATUS    RESTARTS   AGE
kube-apiserver-68676579bb-5fzvr   5/5     Running   0          98m
kube-apiserver-68676579bb-zjk5d   5/5     Running   0          100m
$ oc get svc kube-apiserver -n ocm-production-$CLUSTER_ID
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kube-apiserver   ClusterIP   172.30.235.19   <none>        6443/TCP   28h
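Because the Service's ClusterIP resolves while connections are refused, one useful check (not captured during the incident) is whether the kube-apiserver pods stay registered as endpoints across the rollout:
$ oc get endpoints kube-apiserver -n ocm-production-$CLUSTER_ID
An empty ENDPOINTS column while the pods report Ready would point at endpoint churn during the rollout rather than at the pods themselves.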
Version-Release number of selected component (if applicable):
HCP version: 4.14.8
HyperShift operator on the HCP's management cluster: quay.io/acm-d/rhtap-hypershift-operator:2f7c947a8490bd3627602c9b50bc62e84ced31c3
How reproducible:
Unclear
Steps to Reproduce:
1. Initiate an upgrade of the HCP (see the example command below).
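For reference, on a self-managed HyperShift cluster the upgrade is typically initiated by bumping the release image on the HostedCluster resource; the namespace ("clusters") and $CLUSTER_NAME below are placeholders, and OCM-managed clusters such as this one are upgraded through OCM instead:
$ oc patch hostedcluster "$CLUSTER_NAME" -n clusters --type=merge \
>   -p '{"spec":{"release":{"image":"quay.io/openshift-release-dev/ocp-release:4.14.8-x86_64"}}}'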
Actual results:
Many requests to the hosted cluster's kube-apiserver fail intermittently during the upgrade.
Expected results:
The kube-apiserver remains largely available, with at most brief disruption, during upgrades.
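One way to quantify the disruption is to poll the apiserver's health endpoint from a client while the rollout is in progress; a minimal sketch (the 2-second interval is arbitrary):
$ while true; do
>   oc get --raw /readyz >/dev/null 2>&1 || echo "$(date -u +%H:%M:%S) kube-apiserver /readyz unreachable"
>   sleep 2
> done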
Additional info:
This relates to a ticket that will shortly be assigned to an SRE. See the comments for details.