OpenShift Bugs / OCPBUGS-17120

[release-4.13] 4.13.z FIPS build - HyperShift - pods crashlooping with API connection failures


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • None
    • 4.13.z
    • None
    • Critical
    • No
    • SDN Sprint 240
    • 1
    • Proposed
    • False



      Description of problem:

      On 4.13.0-0.nightly-2023-07-26-101700 (the first 4.13.z nightly with the FIPS wrapper enabled), a HyperShift managed cluster shows many pods crashlooping with errors when trying to reach the API server.
      NAMESPACE                                          NAME                                                                        READY   STATUS             RESTARTS         AGE
      clusters-hypershift-ci-3983                        capi-provider-548db9947d-m8cdh                                              1/2     CrashLoopBackOff   21 (4m36s ago)   74m
      clusters-hypershift-ci-3983                        catalog-operator-64568cd87c-tsrdz                                           1/2     CrashLoopBackOff   23 (84s ago)     71m
      clusters-hypershift-ci-3983                        cluster-node-tuning-operator-6f64d8b868-v9gsx                               0/1     Error              10 (73s ago)     71m
      clusters-hypershift-ci-3983                        control-plane-operator-dfbb565d4-zth66                                      1/2     CrashLoopBackOff   16 (23s ago)     74m
      clusters-hypershift-ci-3983                        hosted-cluster-config-operator-645f485699-j9dlj                             0/1     CrashLoopBackOff   13 (74s ago)     71m
      clusters-hypershift-ci-3983                        kube-controller-manager-7bdb568cdf-5wzmm                                    1/2     Error              4 (55s ago)      16m
      clusters-hypershift-ci-3983                        kube-controller-manager-7bdb568cdf-qbrcz                                    1/2     CrashLoopBackOff   7 (45s ago)      13m
      clusters-hypershift-ci-3983                        oauth-openshift-679f564bcd-k6vvj                                            1/2     CrashLoopBackOff   8 (106s ago)     60m
      clusters-hypershift-ci-3983                        olm-operator-868797cf6b-wgd6w                                               1/2     CrashLoopBackOff   20 (2m23s ago)   71m
      clusters-hypershift-ci-3983                        openshift-apiserver-7f67974fb-c4s7s                                         2/3     CrashLoopBackOff   4 (17s ago)      16m
      clusters-hypershift-ci-3983                        openshift-oauth-apiserver-76d796c4c6-bdrxq                                  1/2     CrashLoopBackOff   20 (2m19s ago)   71m
      clusters-hypershift-ci-3983                        openshift-oauth-apiserver-76d796c4c6-nmfbd                                  1/2     CrashLoopBackOff   18 (3m46s ago)   71m
      clusters-hypershift-ci-3983                        packageserver-6cbc459575-c8klz                                              1/2     CrashLoopBackOff   15 (73s ago)     71m
      hypershift                                         operator-54f77b9766-dln4p                                                   0/1     Error              9 (68s ago)      75m
      openshift-console                                  downloads-75f7f4c67c-j9zvf                                                  0/1     CrashLoopBackOff   23 (2m14s ago)   90m
      openshift-image-registry                           image-registry-58d8dc4f-bj4mg                                               0/1     CrashLoopBackOff   15 (85s ago)     89m
      openshift-ingress                                  router-default-7999c48dd6-kvnr6                                             0/1     CrashLoopBackOff   18 (95s ago)     75m
      openshift-monitoring                               prometheus-adapter-5d77d8cfb5-52dpq                                         0/1     CrashLoopBackOff   20 (2m17s ago)   91m
      openshift-ovn-kubernetes/pods/ovnkube-node-ng767/kube-rbac-proxy-ovn-metrics/kube-rbac-proxy-ovn-metrics/logs/current.log:2023-07-26T19:54:50.636915874Z E0726 19:54:50.634277    3054 auth.go:47] Unable to authenticate the request due to an error: Post "": context deadline exceeded
      openshift-cloud-network-config-controller/pods/cloud-network-config-controller-6d9d577f59-98dxh/controller/controller/logs/current.log:2023-07-26T17:50:07.380904032Z E0726 17:50:07.380510       1 leaderelection.go:330] error retrieving resource lock openshift-cloud-network-config-controller/cloud-network-config-controller-lock: Get "https://api-int.zhozhanghyp4131b.qe.devcluster.openshift.com:6443/api/v1/namespaces/openshift-cloud-network-config-controller/configmaps/cloud-network-config-controller-lock": dial tcp: lookup api-int.zhozhanghyp4131b.qe.devcluster.openshift.com on read udp> read: connection refused
      openshift-ingress/pods/router-default-7999c48dd6-kvnr6/router/router/logs/current.log:2023-07-26T19:52:56.368183176Z E0726 19:52:56.368172       1 reflector.go:140] github.com/openshift/router/pkg/router/template/service_lookup.go:33: Failed to watch *v1.Service: failed to list *v1.Service: Get "": dial tcp i/o timeout
      kube-controller-manager log:
      W0726 22:07:54.887716       1 feature_gate.go:227] unrecognized feature gate: OpenShiftPodSecurityAdmission
      E0726 22:08:25.163085       1 run.go:74] "command failed" err="unable to load configmap based request-header-client-ca-file: Get \"https://kube-apiserver:6443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication\": dial tcp: lookup kube-apiserver: i/o timeout"
      The oc adm inspect and oc adm must-gather locations are in the comments.
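The log excerpts above show at least three distinct failure signatures: DNS lookups that fail or time out, TCP dials that time out, and requests that hit a context deadline. As a rough triage aid, the sketch below buckets log lines by signature. The category names, the `classify` and `summarize` helpers, and the regular expressions are assumptions made for illustration, and the sample lines are shortened excerpts of the logs quoted in this report.

```python
import re
from collections import Counter

# Hypothetical categories. Order matters: the DNS pattern is checked first so
# that "dial tcp: lookup <host>: i/o timeout" counts as a DNS failure rather
# than a plain dial timeout.
PATTERNS = [
    ("dns_lookup_failure", re.compile(r"dial tcp: lookup \S+")),
    ("tcp_dial_timeout", re.compile(r"dial tcp\b.*i/o timeout")),
    ("deadline_exceeded", re.compile(r"context deadline exceeded")),
]

# Shortened excerpts of the log lines quoted in this report.
SAMPLE_LINES = [
    'E0726 19:54:50.634277    3054 auth.go:47] Unable to authenticate the request due to an error: Post "": context deadline exceeded',
    'E0726 19:52:56.368172       1 reflector.go:140] Failed to watch *v1.Service: failed to list *v1.Service: Get "": dial tcp i/o timeout',
    'E0726 22:08:25.163085       1 run.go:74] "command failed" err="... dial tcp: lookup kube-apiserver: i/o timeout"',
    'E0726 17:50:07.380510       1 leaderelection.go:330] error retrieving resource lock ...: dial tcp: lookup api-int.zhozhanghyp4131b.qe.devcluster.openshift.com on read udp> read: connection refused',
    'W0726 22:07:54.887716       1 feature_gate.go:227] unrecognized feature gate: OpenShiftPodSecurityAdmission',
]


def classify(line: str) -> str:
    """Return the first matching category name, or "other"."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return "other"


def summarize(lines) -> Counter:
    """Count how many lines fall into each category."""
    return Counter(classify(line) for line in lines)


if __name__ == "__main__":
    for category, count in summarize(SAMPLE_LINES).most_common():
        print(f"{category}: {count}")
```

If most lines in a must-gather fall into the DNS bucket, that would point toward name resolution inside the cluster rather than the API server itself, but this report does not state a root cause.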


      Version-Release number of selected component (if applicable):


      How reproducible:


      Steps to Reproduce:

      1.  Install a HyperShift managed cluster with 1 guest cluster (the QE Flexy install job will be linked in the comments)
      2.  Wait for the install to complete successfully

      Actual results:

      Many pods are crashlooping with messages about being unable to reach the API.  Cluster operators go into and out of error states.  Commands fail with various errors when trying to list resource types.
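To quantify the symptom described above, the pod listing can be tallied per namespace. The following is a minimal sketch that assumes the text is saved output in the column layout shown in the description (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). The `PodRow`, `parse_pods`, and `unhealthy` names are hypothetical, and the sample holds three rows copied from this report.

```python
import re
from collections import Counter
from typing import List, NamedTuple


class PodRow(NamedTuple):
    namespace: str
    name: str
    ready: int
    total: int
    status: str
    restarts: int


# Matches: namespace, pod name, READY as n/m, STATUS, and the restart count.
# The header row does not match because READY is not of the form n/m there.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)/(\d+)\s+(\S+)\s+(\d+)\b")

# Three rows copied from the listing in this report.
SAMPLE = """\
NAMESPACE                     NAME                                            READY   STATUS             RESTARTS         AGE
clusters-hypershift-ci-3983   capi-provider-548db9947d-m8cdh                  1/2     CrashLoopBackOff   21 (4m36s ago)   74m
clusters-hypershift-ci-3983   cluster-node-tuning-operator-6f64d8b868-v9gsx   0/1     Error              10 (73s ago)     71m
openshift-ingress             router-default-7999c48dd6-kvnr6                 0/1     CrashLoopBackOff   18 (95s ago)     75m
"""


def parse_pods(text: str) -> List[PodRow]:
    """Parse pod rows from tabular text, skipping lines that do not match."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            ns, name, ready, total, status, restarts = match.groups()
            rows.append(PodRow(ns, name, int(ready), int(total), status, int(restarts)))
    return rows


def unhealthy(rows: List[PodRow]) -> List[PodRow]:
    """Keep pods that are crashlooping, errored, or not fully ready."""
    bad_states = {"CrashLoopBackOff", "Error"}
    return [r for r in rows if r.status in bad_states or r.ready < r.total]


if __name__ == "__main__":
    per_namespace = Counter(r.namespace for r in unhealthy(parse_pods(SAMPLE)))
    for namespace, count in per_namespace.most_common():
        print(f"{namespace}: {count} unhealthy pod(s)")
```

Running the same tally against listings captured a few minutes apart would show whether the set of failing pods is stable or keeps changing.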

      Expected results:


      Additional info:


            jcaamano@redhat.com Jaime Caamaño Ruiz
            mifiedle@redhat.com Mike Fiedler
            Anurag Saxena