-
Bug
-
Resolution: Done
-
Major
-
4.14
-
Important
-
No
-
Hypershift Sprint 236
-
1
-
Proposed
-
False
-
Description of problem:
The APIServer endpoint isn't healthy after a PublicAndPrivate cluster is created. The cluster's PROGRESS is Completed and PROGRESSING is False, the nodes are Ready, and the cluster operators on the guest cluster are Available. The only issue is that the condition of type Available is False because the APIServer endpoint is not healthy.

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters
NAME      VERSION                              KUBECONFIG                 PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
jz-test   4.14.0-0.nightly-2023-04-30-235516   jz-test-admin-kubeconfig   Completed   False       False         APIServer endpoint a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com is not healthy

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster/jz-test -n clusters -ojsonpath='{.spec.platform.aws.endpointAccess}{"\n"}'
PublicAndPrivate

jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jz-test
NAME                                                  READY   STATUS    RESTARTS   AGE
aws-cloud-controller-manager-666559d4f-rdsw4          2/2     Running   0          149m
aws-ebs-csi-driver-controller-79fdfb6c76-vb7wr        7/7     Running   0          148m
aws-ebs-csi-driver-operator-7dbd789984-mb9rp          1/1     Running   0          148m
capi-provider-5b7847db9-nlrvz                         2/2     Running   0          151m
catalog-operator-7ccb468d86-7c5j6                     2/2     Running   0          149m
certified-operators-catalog-895787778-5rjb6           1/1     Running   0          149m
cloud-network-config-controller-86698fd7dd-kgzhv      3/3     Running   0          148m
cluster-api-6fd4f86878-hjw59                          1/1     Running   0          151m
cluster-autoscaler-bdd688949-f9xmk                    1/1     Running   0          150m
cluster-image-registry-operator-6f5cb67d88-8svd6      3/3     Running   0          149m
cluster-network-operator-7bc69f75f4-npjfs             1/1     Running   0          149m
cluster-node-tuning-operator-5855b6576b-rckhh         1/1     Running   0          149m
cluster-policy-controller-56d4d6b57c-glx4w            1/1     Running   0          149m
cluster-storage-operator-7cc56c68bb-jd4d2             1/1     Running   0          149m
cluster-version-operator-bd969b677-bh4w4              1/1     Running   0          149m
community-operators-catalog-5c545484d7-hbzb4          1/1     Running   0          149m
control-plane-operator-fc49dcbb4-5ncvf                2/2     Running   0          151m
csi-snapshot-controller-85f7cc9945-n5vgq              1/1     Running   0          149m
csi-snapshot-controller-operator-6597b45897-hqf5p     1/1     Running   0          149m
csi-snapshot-webhook-644d765546-lk9hj                 1/1     Running   0          149m
dns-operator-5b5577d6c7-8dh8d                         1/1     Running   0          149m
etcd-0                                                2/2     Running   0          150m
hosted-cluster-config-operator-5b75ccf55d-6rzch       1/1     Running   0          149m
ignition-server-596fc9d9fb-sb94h                      1/1     Running   0          150m
ingress-operator-6497d476bc-whssz                     3/3     Running   0          149m
konnectivity-agent-6656d8dfd6-h5tcs                   1/1     Running   0          150m
konnectivity-server-5ff9d4b47-stb2m                   1/1     Running   0          150m
kube-apiserver-596fc4bb8b-7kfd8                       3/3     Running   0          150m
kube-controller-manager-6f86bb7fbd-4wtxk              1/1     Running   0          138m
kube-scheduler-bf5876b4b-flk96                        1/1     Running   0          149m
machine-approver-574585d8dd-h5ffh                     1/1     Running   0          150m
multus-admission-controller-67b6f85fbf-bfg4x          2/2     Running   0          148m
oauth-openshift-6b6bfd55fb-8sdq7                      2/2     Running   0          148m
olm-operator-5d97fb977c-sbf6w                         2/2     Running   0          149m
openshift-apiserver-5bb9f99974-2lfp4                  3/3     Running   0          138m
openshift-controller-manager-65666bdf79-g8cf5         1/1     Running   0          149m
openshift-oauth-apiserver-56c8565bb6-6b5cv            2/2     Running   0          149m
openshift-route-controller-manager-775f844dfc-jj2ft   1/1     Running   0          149m
ovnkube-master-0                                      7/7     Running   0          148m
packageserver-6587d9674b-6jwpv                        2/2     Running   0          149m
redhat-marketplace-catalog-5f6d45b457-hdn77           1/1     Running   0          149m
redhat-operators-catalog-7958c4449b-l4hbx             1/1     Running   0          12m
router-5b7899cc97-chs6t                               1/1     Running   0          150m

jiezhao-mac:hypershift jiezhao$ oc get node --kubeconfig=hostedcluster.kubeconfig
NAME                                        STATUS   ROLES    AGE    VERSION
ip-10-0-137-99.us-east-2.compute.internal   Ready    worker   131m   v1.26.2+d2e245f
ip-10-0-140-85.us-east-2.compute.internal   Ready    worker   132m   v1.26.2+d2e245f
ip-10-0-141-46.us-east-2.compute.internal   Ready    worker   131m   v1.26.2+d2e245f

jiezhao-mac:hypershift jiezhao$ oc get co --kubeconfig=hostedcluster.kubeconfig
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      126m
csi-snapshot-controller                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m
dns                                        4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m
image-registry                             4.14.0-0.nightly-2023-04-30-235516   True        False         False      128m
ingress                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m
insights                                   4.14.0-0.nightly-2023-04-30-235516   True        False         False      130m
kube-apiserver                             4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m
kube-controller-manager                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m
kube-scheduler                             4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m
kube-storage-version-migrator              4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m
monitoring                                 4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m
network                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m
node-tuning                                4.14.0-0.nightly-2023-04-30-235516   True        False         False      131m
openshift-apiserver                        4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m
openshift-controller-manager               4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m
openshift-samples                          4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m
operator-lifecycle-manager                 4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m
operator-lifecycle-manager-catalog         4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m
operator-lifecycle-manager-packageserver   4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m
service-ca                                 4.14.0-0.nightly-2023-04-30-235516   True        False         False      130m
storage                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      131m

HC conditions:
==============
status:
  conditions:
  - lastTransitionTime: "2023-05-01T19:45:49Z"
    message: All is well
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidAWSIdentityProvider
  - lastTransitionTime: "2023-05-01T20:00:18Z"
    message: Cluster version is 4.14.0-0.nightly-2023-04-30-235516
    observedGeneration: 3
    reason: FromClusterVersion
    status: "False"
    type: ClusterVersionProgressing
  - lastTransitionTime: "2023-05-01T19:46:22Z"
    message: Payload loaded version="4.14.0-0.nightly-2023-04-30-235516" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-04-30-235516" architecture="amd64"
    observedGeneration: 3
    reason: PayloadLoaded
    status: "True"
    type: ClusterVersionReleaseAccepted
  - lastTransitionTime: "2023-05-01T20:03:14Z"
    message: Condition not found in the CVO.
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ClusterVersionUpgradeable
  - lastTransitionTime: "2023-05-01T20:00:18Z"
    message: Done applying 4.14.0-0.nightly-2023-04-30-235516
    observedGeneration: 3
    reason: FromClusterVersion
    status: "True"
    type: ClusterVersionAvailable
  - lastTransitionTime: "2023-05-01T20:00:18Z"
    message: ""
    observedGeneration: 3
    reason: FromClusterVersion
    status: "True"
    type: ClusterVersionSucceeding
  - lastTransitionTime: "2023-05-01T19:47:51Z"
    message: The hosted cluster is not degraded
    observedGeneration: 3
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-05-01T19:45:01Z"
    message: ""
    observedGeneration: 3
    reason: QuorumAvailable
    status: "True"
    type: EtcdAvailable
  - lastTransitionTime: "2023-05-01T19:45:38Z"
    message: Kube APIServer deployment is available
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: KubeAPIServerAvailable
  - lastTransitionTime: "2023-05-01T19:44:27Z"
    message: All is well
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: InfrastructureReady
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: External DNS is not configured
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ExternalDNSReachable
  - lastTransitionTime: "2023-05-01T19:44:19Z"
    message: Configuration passes validation
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidHostedControlPlaneConfiguration
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: AWS KMS is not configured
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ValidAWSKMSConfig
  - lastTransitionTime: "2023-05-01T19:44:37Z"
    message: All is well
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidReleaseInfo
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: APIServer endpoint a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com is not healthy
    observedGeneration: 3
    reason: waitingForAvailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-05-01T19:47:18Z"
    message: All is well
    reason: AWSSuccess
    status: "True"
    type: AWSEndpointAvailable
  - lastTransitionTime: "2023-05-01T19:47:18Z"
    message: All is well
    reason: AWSSuccess
    status: "True"
    type: AWSEndpointServiceAvailable
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: Configuration passes validation
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidConfiguration
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: HostedCluster is supported by operator configuration
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: SupportedHostedCluster
  - lastTransitionTime: "2023-05-01T19:45:39Z"
    message: Ignition server deployment is available
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: IgnitionEndpointAvailable
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: Reconciliation active on resource
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ReconciliationActive
  - lastTransitionTime: "2023-05-01T19:44:12Z"
    message: Release image is valid
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidReleaseImage
  - lastTransitionTime: "2023-05-01T19:44:12Z"
    message: HostedCluster is at expected version
    observedGeneration: 3
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2023-05-01T19:44:13Z"
    message: OIDC configuration is valid
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidOIDCConfiguration
  - lastTransitionTime: "2023-05-01T19:44:13Z"
    message: Reconciliation completed succesfully
    observedGeneration: 3
    reason: ReconciliatonSucceeded
    status: "True"
    type: ReconciliationSucceeded
  - lastTransitionTime: "2023-05-01T19:45:52Z"
    message: All is well
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: AWSDefaultSecurityGroupCreated

kube-apiserver log:
==================
E0501 19:45:07.024278 7 memcache.go:238] couldn't get current server API group list: Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_authorization-openshift_01_rolebindingrestriction.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_config-operator_01_proxy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_quota-openshift_01_clusterresourcequota.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_security-openshift_01_scc.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_securityinternal-openshift_02_rangeallocation.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_apiserver-Default.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_authentication.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_build.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_console.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_dns.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_featuregate.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_image.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagecontentpolicy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagecontentsourcepolicy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagedigestmirrorset.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagetagmirrorset.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_infrastructure-Default.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_ingress.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_network.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_node.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_oauth.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_project.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_scheduler.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create a PublicAndPrivate cluster
Actual results:
APIServer endpoint is not healthy, and HC condition Type 'Available' is False
Expected results:
APIServer endpoint should be healthy, and Type 'Available' should be True
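The actual-versus-expected check above can be sketched as a small script that inspects the `Available` condition of a HostedCluster object. This is only an illustrative sketch: the helper names `find_condition` and `availability_summary` are hypothetical, and `SAMPLE` is a hand-trimmed copy of two conditions quoted in this report, not the full object.

```python
def find_condition(hosted_cluster: dict, cond_type: str):
    """Return the condition dict whose 'type' matches, or None if absent."""
    conditions = hosted_cluster.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def availability_summary(hosted_cluster: dict) -> str:
    """Summarize the 'Available' condition in one line (hypothetical helper)."""
    cond = find_condition(hosted_cluster, "Available")
    if cond is None:
        return "Available: condition not reported"
    if cond.get("status") == "True":
        return "Available: True"
    # Anything other than "True" is reported with its reason and message.
    return "Available: {} ({}): {}".format(
        cond.get("status"), cond.get("reason"), cond.get("message")
    )


# Trimmed sample built from the HC conditions quoted in this report.
SAMPLE = {
    "status": {
        "conditions": [
            {
                "type": "Degraded",
                "status": "False",
                "reason": "AsExpected",
                "message": "The hosted cluster is not degraded",
            },
            {
                "type": "Available",
                "status": "False",
                "reason": "waitingForAvailable",
                "message": (
                    "APIServer endpoint a23663b1e738a4d6783f6256da73fe76-"
                    "2649b36a23f49ed7.elb.us-east-2.amazonaws.com is not healthy"
                ),
            },
        ]
    }
}

if __name__ == "__main__":
    print(availability_summary(SAMPLE))
```

To run it against a live cluster, one option is to replace `SAMPLE` with `json.load(sys.stdin)` and pipe in the output of `oc get hostedcluster/jz-test -n clusters -o json`. With the fix in place, the summary would be expected to read `Available: True`.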
Additional info:
- blocks: OCPBUGS-13497 kube-apiserver isn't healthy after a cluster comes up (Closed)
- is cloned by: OCPBUGS-13497 kube-apiserver isn't healthy after a cluster comes up (Closed)