OpenShift Bugs: OCPBUGS-29379

HostedCluster's kube-apiserver disrupted during HCP upgrade

    • Quality / Stability / Reliability
    • Hypershift Sprint 250

      Description of problem:

      Several platform components (e.g., console) and user workloads running on a HostedCluster became unstable while its HCP was being upgraded to 4.14.8. Although the kube-apiserver pods were reported as Ready with 0 restarts, they appear to have been intermittently unreachable, e.g.,
      
      $ oc describe pod console-68d48f57d7-kxjxw -n openshift-console
        Warning  Unhealthy       61m (x10 over 62m)     kubelet            Readiness probe failed: Get "https://10.131.0.17:8443/health": dial tcp 10.131.0.17:8443: connect: connection refused
        Warning  BackOff         7m52s (x75 over 36m)   kubelet            Back-off restarting failed container console in pod console-68d48f57d7-kxjxw_openshift-console(e8cf0eeb-dddc-4e9f-a107-3abe7b83647c)
        Warning  ProbeError      2m53s (x336 over 62m)  kubelet            Readiness probe error: Get "https://10.131.0.17:8443/health": dial tcp 10.131.0.17:8443: connect: connection refused 
      
      $ oc get po -l app=kube-apiserver -n ocm-production-$CLUSTER_ID
      NAME                              READY   STATUS    RESTARTS   AGE
      kube-apiserver-68676579bb-5fzvr   5/5     Running   0          98m
      kube-apiserver-68676579bb-zjk5d   5/5     Running   0          100m
      
      $ oc get svc kube-apiserver -n ocm-production-$CLUSTER_ID
      NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
      kube-apiserver   ClusterIP   172.30.235.19   <none>        6443/TCP   28h
          

      Version-Release number of selected component (if applicable):

      HCP version: 4.14.8
      Hypershift operator on HCP's management cluster: quay.io/acm-d/rhtap-hypershift-operator:2f7c947a8490bd3627602c9b50bc62e84ced31c3
      

      How reproducible:

      Unclear
      

      Steps to Reproduce:

      Initiate an upgrade of the HCP.
      

      Actual results:

      Many requests to the kube-apiserver fail during the upgrade.
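
      One way to put a number on the failures is to poll the hosted cluster's API server while the upgrade runs and count the requests that fail. The following is a minimal sketch, not taken from this ticket: it assumes KUBECONFIG points at the HostedCluster (not the management cluster), and PROBE_CMD, probe_once, probe_loop and summarize are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: poll the kube-apiserver and tally failed requests.
# Assumption: KUBECONFIG targets the HostedCluster's API endpoint.

# PROBE_CMD is a hypothetical hook so the probe can be swapped out.
# The default asks the API server for its /readyz endpoint with a short timeout.
PROBE_CMD=${PROBE_CMD:-"oc get --raw /readyz --request-timeout=5s"}

# probe_once: print one timestamped line ending in "ok" or "fail".
probe_once() {
  if $PROBE_CMD >/dev/null 2>&1; then
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) ok"
  else
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) fail"
  fi
}

# probe_loop COUNT INTERVAL: run COUNT probes, sleeping INTERVAL seconds after each.
probe_loop() {
  probe_count=$1
  probe_interval=$2
  probe_i=0
  while [ "$probe_i" -lt "$probe_count" ]; do
    probe_once
    sleep "$probe_interval"
    probe_i=$((probe_i + 1))
  done
}

# summarize: read probe lines on stdin and print "failed/total".
summarize() {
  awk '{ total++ } $2 == "fail" { failed++ } END { printf "%d/%d\n", failed, total }'
}
```

      For example, `probe_loop 600 1 | tee probe.log | summarize` would probe roughly once per second for about ten minutes, keep the timestamped lines in probe.log for correlation with upgrade events, and print the failed/total count at the end.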
      

      Expected results:

      kube-apiserver remains mostly stable during upgrades
      

      Additional info:

      This relates to a hard ticket to which an SRE will be assigned shortly. See comments for details.
      

              cewong@redhat.com Cesar Wong
              abyrne.openshift Anthony Byrne
              Jie Zhao Jie Zhao