Type: Bug
Resolution: Duplicate
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.14.z, 4.15.z, 4.16.0
Component/s: HyperShift
Labels:
- ServiceDeliveryImpact
- triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
None
Regression:
No

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
Hypershift Sprint 250
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Several platform components (e.g., console) and user workloads running on a HostedCluster became unstable during their HCP's upgrade to 4.14.8. Although the kube-apiserver pods are reported as Ready with 0 restarts, they appear to become intermittently unreachable, e.g.,

$ oc describe pod console-68d48f57d7-kxjxw -n openshift-console
  Warning  Unhealthy       61m (x10 over 62m)     kubelet            Readiness probe failed: Get "https://10.131.0.17:8443/health": dial tcp 10.131.0.17:8443: connect: connection refused
  Warning  BackOff         7m52s (x75 over 36m)   kubelet            Back-off restarting failed container console in pod console-68d48f57d7-kxjxw_openshift-console(e8cf0eeb-dddc-4e9f-a107-3abe7b83647c)
  Warning  ProbeError      2m53s (x336 over 62m)  kubelet            Readiness probe error: Get "https://10.131.0.17:8443/health": dial tcp 10.131.0.17:8443: connect: connection refused 

$ oc get po -l app=kube-apiserver -n ocm-production-$CLUSTER_ID
NAME                              READY   STATUS    RESTARTS   AGE
kube-apiserver-68676579bb-5fzvr   5/5     Running   0          98m
kube-apiserver-68676579bb-zjk5d   5/5     Running   0          100m

$ oc get svc kube-apiserver -n ocm-production-$CLUSTER_ID
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kube-apiserver   ClusterIP   172.30.235.19   <none>        6443/TCP   28h

Version-Release number of selected component (if applicable):

HCP version: 4.14.8
Hypershift operator on HCP's management cluster: quay.io/acm-d/rhtap-hypershift-operator:2f7c947a8490bd3627602c9b50bc62e84ced31c3

How reproducible:

Unclear

Steps to Reproduce:

Initiate an upgrade of the HCP (in this case, to 4.14.8)

Actual results:

Many requests to the kube-apiserver fail during the upgrade
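
One way to put a number on the failing requests is to poll the hosted cluster's API server while the upgrade runs and count how many probes fail. The following is only a sketch: `probe_failures` is a hypothetical helper that is not part of any tooling referenced in this ticket, and the example invocation assumes a kubeconfig for the affected hosted cluster is active.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not existing tooling):
# run a probe command repeatedly and report how many attempts failed.
# Usage: probe_failures COUNT INTERVAL_SECONDS CMD [ARGS...]
probe_failures() {
  local count=$1 interval=$2
  shift 2
  local failed=0 i
  for ((i = 0; i < count; i++)); do
    # Any non-zero exit status from the probe counts as one failed request.
    if ! "$@" >/dev/null 2>&1; then
      failed=$((failed + 1))
    fi
    sleep "$interval"
  done
  echo "$failed/$count probes failed"
}

# Example: probe the API server's readiness endpoint every 5 seconds
# for about 5 minutes while the upgrade is in progress.
#   probe_failures 60 5 oc get --raw /readyz --request-timeout=5s
```

Running the same probe before, during, and after an upgrade would give a rough baseline to compare against, which may also help with the open question of reproducibility.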

Expected results:

kube-apiserver remains mostly stable during upgrades

Additional info:

This relates to a hard ticket to which an SRE will be assigned shortly. See comments for details.

Assignee: Cesar Wong

Reporter: Anthony Byrne

Need Info From: None

Contributors: None

QA Contact: Jie Zhao

Doc Contact: None

Votes: 0

Watchers: 6

Created: 2024/02/12 8:47 PM

Updated: 2025/07/23 5:36 PM

Resolved: 2024/02/22 3:36 PM
