OpenShift Bugs / OCPBUGS-13497

kube-apiserver isn't healthy after a cluster comes up


Details

    • Bug
    • Resolution: Done
    • Major
    • None
    • 4.14
    • HyperShift
    • Important
    • No
    • Rejected
    • False

    Description

      This is a clone of issue OCPBUGS-13021. The following is the description of the original issue:

      Description of problem:

      The APIServer endpoint isn't healthy after a PublicAndPrivate cluster is created. PROGRESS of the cluster is Completed and PROGRESSING is False, the nodes are Ready, and the cluster operators on the guest cluster are Available. The only issue is that the condition of type Available is False because the APIServer endpoint is not healthy.
      
      jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters
      NAME      VERSION                              KUBECONFIG                 PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
      jz-test   4.14.0-0.nightly-2023-04-30-235516   jz-test-admin-kubeconfig   Completed   False       False         APIServer endpoint a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com is not healthy
      
      jiezhao-mac:hypershift jiezhao$ oc get hostedcluster/jz-test -n clusters -ojsonpath='{.spec.platform.aws.endpointAccess}{"\n"}'
      PublicAndPrivate
      
      jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jz-test
      NAME                                                  READY   STATUS    RESTARTS   AGE
      aws-cloud-controller-manager-666559d4f-rdsw4          2/2     Running   0          149m
      aws-ebs-csi-driver-controller-79fdfb6c76-vb7wr        7/7     Running   0          148m
      aws-ebs-csi-driver-operator-7dbd789984-mb9rp          1/1     Running   0          148m
      capi-provider-5b7847db9-nlrvz                         2/2     Running   0          151m
      catalog-operator-7ccb468d86-7c5j6                     2/2     Running   0          149m
      certified-operators-catalog-895787778-5rjb6           1/1     Running   0          149m
      cloud-network-config-controller-86698fd7dd-kgzhv      3/3     Running   0          148m
      cluster-api-6fd4f86878-hjw59                          1/1     Running   0          151m
      cluster-autoscaler-bdd688949-f9xmk                    1/1     Running   0          150m
      cluster-image-registry-operator-6f5cb67d88-8svd6      3/3     Running   0          149m
      cluster-network-operator-7bc69f75f4-npjfs             1/1     Running   0          149m
      cluster-node-tuning-operator-5855b6576b-rckhh         1/1     Running   0          149m
      cluster-policy-controller-56d4d6b57c-glx4w            1/1     Running   0          149m
      cluster-storage-operator-7cc56c68bb-jd4d2             1/1     Running   0          149m
      cluster-version-operator-bd969b677-bh4w4              1/1     Running   0          149m
      community-operators-catalog-5c545484d7-hbzb4          1/1     Running   0          149m
      control-plane-operator-fc49dcbb4-5ncvf                2/2     Running   0          151m
      csi-snapshot-controller-85f7cc9945-n5vgq              1/1     Running   0          149m
      csi-snapshot-controller-operator-6597b45897-hqf5p     1/1     Running   0          149m
      csi-snapshot-webhook-644d765546-lk9hj                 1/1     Running   0          149m
      dns-operator-5b5577d6c7-8dh8d                         1/1     Running   0          149m
      etcd-0                                                2/2     Running   0          150m
      hosted-cluster-config-operator-5b75ccf55d-6rzch       1/1     Running   0          149m
      ignition-server-596fc9d9fb-sb94h                      1/1     Running   0          150m
      ingress-operator-6497d476bc-whssz                     3/3     Running   0          149m
      konnectivity-agent-6656d8dfd6-h5tcs                   1/1     Running   0          150m
      konnectivity-server-5ff9d4b47-stb2m                   1/1     Running   0          150m
      kube-apiserver-596fc4bb8b-7kfd8                       3/3     Running   0          150m
      kube-controller-manager-6f86bb7fbd-4wtxk              1/1     Running   0          138m
      kube-scheduler-bf5876b4b-flk96                        1/1     Running   0          149m
      machine-approver-574585d8dd-h5ffh                     1/1     Running   0          150m
      multus-admission-controller-67b6f85fbf-bfg4x          2/2     Running   0          148m
      oauth-openshift-6b6bfd55fb-8sdq7                      2/2     Running   0          148m
      olm-operator-5d97fb977c-sbf6w                         2/2     Running   0          149m
      openshift-apiserver-5bb9f99974-2lfp4                  3/3     Running   0          138m
      openshift-controller-manager-65666bdf79-g8cf5         1/1     Running   0          149m
      openshift-oauth-apiserver-56c8565bb6-6b5cv            2/2     Running   0          149m
      openshift-route-controller-manager-775f844dfc-jj2ft   1/1     Running   0          149m
      ovnkube-master-0                                      7/7     Running   0          148m
      packageserver-6587d9674b-6jwpv                        2/2     Running   0          149m
      redhat-marketplace-catalog-5f6d45b457-hdn77           1/1     Running   0          149m
      redhat-operators-catalog-7958c4449b-l4hbx             1/1     Running   0          12m
      router-5b7899cc97-chs6t                               1/1     Running   0          150m
      
      jiezhao-mac:hypershift jiezhao$ oc get node --kubeconfig=hostedcluster.kubeconfig 
      NAME                                        STATUS   ROLES    AGE    VERSION
      ip-10-0-137-99.us-east-2.compute.internal   Ready    worker   131m   v1.26.2+d2e245f
      ip-10-0-140-85.us-east-2.compute.internal   Ready    worker   132m   v1.26.2+d2e245f
      ip-10-0-141-46.us-east-2.compute.internal   Ready    worker   131m   v1.26.2+d2e245f
      jiezhao-mac:hypershift jiezhao$ 
      jiezhao-mac:hypershift jiezhao$ oc get co --kubeconfig=hostedcluster.kubeconfig 
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      console                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      126m    
      csi-snapshot-controller                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
      dns                                        4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m    
      image-registry                             4.14.0-0.nightly-2023-04-30-235516   True        False         False      128m    
      ingress                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m    
      insights                                   4.14.0-0.nightly-2023-04-30-235516   True        False         False      130m    
      kube-apiserver                             4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
      kube-controller-manager                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
      kube-scheduler                             4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
      kube-storage-version-migrator              4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m    
      monitoring                                 4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m    
      network                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
      node-tuning                                4.14.0-0.nightly-2023-04-30-235516   True        False         False      131m    
      openshift-apiserver                        4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
      openshift-controller-manager               4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
      openshift-samples                          4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m    
      operator-lifecycle-manager                 4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
      operator-lifecycle-manager-catalog         4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
      operator-lifecycle-manager-packageserver   4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
      service-ca                                 4.14.0-0.nightly-2023-04-30-235516   True        False         False      130m    
      storage                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      131m    
      jiezhao-mac:hypershift jiezhao$ 
      
      HC conditions:
      ==============
        status:
          conditions:
          - lastTransitionTime: "2023-05-01T19:45:49Z"
            message: All is well
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: ValidAWSIdentityProvider
          - lastTransitionTime: "2023-05-01T20:00:18Z"
            message: Cluster version is 4.14.0-0.nightly-2023-04-30-235516
            observedGeneration: 3
            reason: FromClusterVersion
            status: "False"
            type: ClusterVersionProgressing
          - lastTransitionTime: "2023-05-01T19:46:22Z"
            message: Payload loaded version="4.14.0-0.nightly-2023-04-30-235516" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-04-30-235516"
              architecture="amd64"
            observedGeneration: 3
            reason: PayloadLoaded
            status: "True"
            type: ClusterVersionReleaseAccepted
          - lastTransitionTime: "2023-05-01T20:03:14Z"
            message: Condition not found in the CVO.
            observedGeneration: 3
            reason: StatusUnknown
            status: Unknown
            type: ClusterVersionUpgradeable
          - lastTransitionTime: "2023-05-01T20:00:18Z"
            message: Done applying 4.14.0-0.nightly-2023-04-30-235516
            observedGeneration: 3
            reason: FromClusterVersion
            status: "True"
            type: ClusterVersionAvailable
          - lastTransitionTime: "2023-05-01T20:00:18Z"
            message: ""
            observedGeneration: 3
            reason: FromClusterVersion
            status: "True"
            type: ClusterVersionSucceeding
          - lastTransitionTime: "2023-05-01T19:47:51Z"
            message: The hosted cluster is not degraded
            observedGeneration: 3
            reason: AsExpected
            status: "False"
            type: Degraded
          - lastTransitionTime: "2023-05-01T19:45:01Z"
            message: ""
            observedGeneration: 3
            reason: QuorumAvailable
            status: "True"
            type: EtcdAvailable
          - lastTransitionTime: "2023-05-01T19:45:38Z"
            message: Kube APIServer deployment is available
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: KubeAPIServerAvailable
          - lastTransitionTime: "2023-05-01T19:44:27Z"
            message: All is well
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: InfrastructureReady
          - lastTransitionTime: "2023-05-01T19:44:11Z"
            message: External DNS is not configured
            observedGeneration: 3
            reason: StatusUnknown
            status: Unknown
            type: ExternalDNSReachable
          - lastTransitionTime: "2023-05-01T19:44:19Z"
            message: Configuration passes validation
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: ValidHostedControlPlaneConfiguration
          - lastTransitionTime: "2023-05-01T19:44:11Z"
            message: AWS KMS is not configured
            observedGeneration: 3
            reason: StatusUnknown
            status: Unknown
            type: ValidAWSKMSConfig
          - lastTransitionTime: "2023-05-01T19:44:37Z"
            message: All is well
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: ValidReleaseInfo
          - lastTransitionTime: "2023-05-01T19:44:11Z"
            message: APIServer endpoint a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com
              is not healthy
            observedGeneration: 3
            reason: waitingForAvailable
            status: "False"
            type: Available
          - lastTransitionTime: "2023-05-01T19:47:18Z"
            message: All is well
            reason: AWSSuccess
            status: "True"
            type: AWSEndpointAvailable
          - lastTransitionTime: "2023-05-01T19:47:18Z"
            message: All is well
            reason: AWSSuccess
            status: "True"
            type: AWSEndpointServiceAvailable
          - lastTransitionTime: "2023-05-01T19:44:11Z"
            message: Configuration passes validation
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: ValidConfiguration
          - lastTransitionTime: "2023-05-01T19:44:11Z"
            message: HostedCluster is supported by operator configuration
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: SupportedHostedCluster
          - lastTransitionTime: "2023-05-01T19:45:39Z"
            message: Ignition server deployment is available
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: IgnitionEndpointAvailable
          - lastTransitionTime: "2023-05-01T19:44:11Z"
            message: Reconciliation active on resource
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: ReconciliationActive
          - lastTransitionTime: "2023-05-01T19:44:12Z"
            message: Release image is valid
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: ValidReleaseImage
          - lastTransitionTime: "2023-05-01T19:44:12Z"
            message: HostedCluster is at expected version
            observedGeneration: 3
            reason: AsExpected
            status: "False"
            type: Progressing
          - lastTransitionTime: "2023-05-01T19:44:13Z"
            message: OIDC configuration is valid
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: ValidOIDCConfiguration
          - lastTransitionTime: "2023-05-01T19:44:13Z"
            message: Reconciliation completed succesfully
            observedGeneration: 3
            reason: ReconciliatonSucceeded
            status: "True"
            type: ReconciliationSucceeded
          - lastTransitionTime: "2023-05-01T19:45:52Z"
            message: All is well
            observedGeneration: 3
            reason: AsExpected
            status: "True"
            type: AWSDefaultSecurityGroupCreated
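
With this many conditions, the single failing one is easy to miss. The following is a minimal sketch, not part of the original report, for reducing the list to the conditions whose status is not "True". The helper names `list_conditions` and `filter_not_true` are hypothetical; the cluster name and namespace default to the ones used in this report.

```shell
#!/usr/bin/env bash

# Keep only rows whose second tab-separated field (the condition status)
# is not "True". Reads "type<TAB>status<TAB>message" rows on stdin.
filter_not_true() {
  awk -F'\t' '$2 != "True"'
}

# Emit one "type<TAB>status<TAB>message" row per HostedCluster condition.
# $1 = HostedCluster name (assumed default: jz-test)
# $2 = namespace          (assumed default: clusters)
list_conditions() {
  oc get hostedcluster "${1:-jz-test}" -n "${2:-clusters}" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
}

# Usage (requires access to the management cluster):
#   list_conditions jz-test clusters | filter_not_true
```

Note that "False" is the expected value for some types (Degraded, Progressing, ClusterVersionProgressing), so those rows still appear in the filtered output and need to be read by type. In the dump above, Available is the only condition whose "False" status indicates a problem.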
      
      kube-apiserver log:
      ==================
      E0501 19:45:07.024278       7 memcache.go:238] couldn't get current server API group list: Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_03_authorization-openshift_01_rolebindingrestriction.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_03_config-operator_01_proxy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_03_quota-openshift_01_clusterresourcequota.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_03_security-openshift_01_scc.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_03_securityinternal-openshift_02_rangeallocation.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_apiserver-Default.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_authentication.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_build.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_console.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_dns.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_featuregate.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_image.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_imagecontentpolicy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_imagecontentsourcepolicy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_imagedigestmirrorset.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_imagetagmirrorset.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_infrastructure-Default.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_ingress.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_network.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_node.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_oauth.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_project.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
      unable to recognize "/work/0000_10_config-operator_01_scheduler.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-04-30-235516

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a HostedCluster on AWS with endpoint access (spec.platform.aws.endpointAccess) set to PublicAndPrivate
      

      Actual results:

      The APIServer endpoint is reported as not healthy, and the HostedCluster (HC) condition of type 'Available' is False.

      Expected results:

      The APIServer endpoint should be reported as healthy, and the HC condition of type 'Available' should be True.
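
To compare the reported condition with what the endpoint actually returns, the load balancer can be probed by hand. This is a rough sketch under stated assumptions, not necessarily the check the operator itself performs: port 6443 and the `/healthz` path are assumed, TLS verification is skipped with `--insecure`, and the helper names `healthz_url` and `probe_healthz` are hypothetical. Whether the public ELB is reachable from the probing host, and whether anonymous access to `/healthz` is allowed, depends on the environment.

```shell
#!/usr/bin/env bash

# Build the health URL for an API server endpoint.
# $1 = hostname, $2 = port (assumed default: 6443)
healthz_url() {
  printf 'https://%s:%s/healthz\n' "$1" "${2:-6443}"
}

# Probe the endpoint once; prints the response body or a curl error.
# --insecure skips certificate verification (assumption: acceptable for triage).
probe_healthz() {
  curl --silent --show-error --insecure --max-time 10 "$(healthz_url "$@")"
}

# Usage, with the endpoint hostname from the Available condition message:
#   probe_healthz a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com
```

A healthy kube-apiserver typically answers `ok` on this path. A timeout or connection error from a host that should have access would point at the load balancer or network path rather than at the kube-apiserver pod, which is shown as Running above.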

      Additional info:

       


            People

              agarcial@redhat.com Alberto Garcia Lamela
              openshift-crt-jira-prow OpenShift Prow Bot
              He Liu He Liu