Bug
Resolution: Done
Major
4.14
Quality / Stability / Reliability
Important
Proposed
Hypershift Sprint 236
Description of problem:
After a PublicAndPrivate cluster is created, the APIServer endpoint is reported as not healthy. The cluster's PROGRESS is Completed and PROGRESSING is False, the nodes are Ready, and the cluster operators on the guest cluster are Available. The only issue is that the HostedCluster condition of type Available is False because the APIServer endpoint is not healthy.
jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters
NAME VERSION KUBECONFIG PROGRESS AVAILABLE PROGRESSING MESSAGE
jz-test 4.14.0-0.nightly-2023-04-30-235516 jz-test-admin-kubeconfig Completed False False APIServer endpoint a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com is not healthy
jiezhao-mac:hypershift jiezhao$ oc get hostedcluster/jz-test -n clusters -ojsonpath='{.spec.platform.aws.endpointAccess}{"\n"}'
PublicAndPrivate
jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jz-test
NAME READY STATUS RESTARTS AGE
aws-cloud-controller-manager-666559d4f-rdsw4 2/2 Running 0 149m
aws-ebs-csi-driver-controller-79fdfb6c76-vb7wr 7/7 Running 0 148m
aws-ebs-csi-driver-operator-7dbd789984-mb9rp 1/1 Running 0 148m
capi-provider-5b7847db9-nlrvz 2/2 Running 0 151m
catalog-operator-7ccb468d86-7c5j6 2/2 Running 0 149m
certified-operators-catalog-895787778-5rjb6 1/1 Running 0 149m
cloud-network-config-controller-86698fd7dd-kgzhv 3/3 Running 0 148m
cluster-api-6fd4f86878-hjw59 1/1 Running 0 151m
cluster-autoscaler-bdd688949-f9xmk 1/1 Running 0 150m
cluster-image-registry-operator-6f5cb67d88-8svd6 3/3 Running 0 149m
cluster-network-operator-7bc69f75f4-npjfs 1/1 Running 0 149m
cluster-node-tuning-operator-5855b6576b-rckhh 1/1 Running 0 149m
cluster-policy-controller-56d4d6b57c-glx4w 1/1 Running 0 149m
cluster-storage-operator-7cc56c68bb-jd4d2 1/1 Running 0 149m
cluster-version-operator-bd969b677-bh4w4 1/1 Running 0 149m
community-operators-catalog-5c545484d7-hbzb4 1/1 Running 0 149m
control-plane-operator-fc49dcbb4-5ncvf 2/2 Running 0 151m
csi-snapshot-controller-85f7cc9945-n5vgq 1/1 Running 0 149m
csi-snapshot-controller-operator-6597b45897-hqf5p 1/1 Running 0 149m
csi-snapshot-webhook-644d765546-lk9hj 1/1 Running 0 149m
dns-operator-5b5577d6c7-8dh8d 1/1 Running 0 149m
etcd-0 2/2 Running 0 150m
hosted-cluster-config-operator-5b75ccf55d-6rzch 1/1 Running 0 149m
ignition-server-596fc9d9fb-sb94h 1/1 Running 0 150m
ingress-operator-6497d476bc-whssz 3/3 Running 0 149m
konnectivity-agent-6656d8dfd6-h5tcs 1/1 Running 0 150m
konnectivity-server-5ff9d4b47-stb2m 1/1 Running 0 150m
kube-apiserver-596fc4bb8b-7kfd8 3/3 Running 0 150m
kube-controller-manager-6f86bb7fbd-4wtxk 1/1 Running 0 138m
kube-scheduler-bf5876b4b-flk96 1/1 Running 0 149m
machine-approver-574585d8dd-h5ffh 1/1 Running 0 150m
multus-admission-controller-67b6f85fbf-bfg4x 2/2 Running 0 148m
oauth-openshift-6b6bfd55fb-8sdq7 2/2 Running 0 148m
olm-operator-5d97fb977c-sbf6w 2/2 Running 0 149m
openshift-apiserver-5bb9f99974-2lfp4 3/3 Running 0 138m
openshift-controller-manager-65666bdf79-g8cf5 1/1 Running 0 149m
openshift-oauth-apiserver-56c8565bb6-6b5cv 2/2 Running 0 149m
openshift-route-controller-manager-775f844dfc-jj2ft 1/1 Running 0 149m
ovnkube-master-0 7/7 Running 0 148m
packageserver-6587d9674b-6jwpv 2/2 Running 0 149m
redhat-marketplace-catalog-5f6d45b457-hdn77 1/1 Running 0 149m
redhat-operators-catalog-7958c4449b-l4hbx 1/1 Running 0 12m
router-5b7899cc97-chs6t 1/1 Running 0 150m
jiezhao-mac:hypershift jiezhao$ oc get node --kubeconfig=hostedcluster.kubeconfig
NAME STATUS ROLES AGE VERSION
ip-10-0-137-99.us-east-2.compute.internal Ready worker 131m v1.26.2+d2e245f
ip-10-0-140-85.us-east-2.compute.internal Ready worker 132m v1.26.2+d2e245f
ip-10-0-141-46.us-east-2.compute.internal Ready worker 131m v1.26.2+d2e245f
jiezhao-mac:hypershift jiezhao$
jiezhao-mac:hypershift jiezhao$ oc get co --kubeconfig=hostedcluster.kubeconfig
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
console 4.14.0-0.nightly-2023-04-30-235516 True False False 126m
csi-snapshot-controller 4.14.0-0.nightly-2023-04-30-235516 True False False 140m
dns 4.14.0-0.nightly-2023-04-30-235516 True False False 129m
image-registry 4.14.0-0.nightly-2023-04-30-235516 True False False 128m
ingress 4.14.0-0.nightly-2023-04-30-235516 True False False 129m
insights 4.14.0-0.nightly-2023-04-30-235516 True False False 130m
kube-apiserver 4.14.0-0.nightly-2023-04-30-235516 True False False 140m
kube-controller-manager 4.14.0-0.nightly-2023-04-30-235516 True False False 140m
kube-scheduler 4.14.0-0.nightly-2023-04-30-235516 True False False 140m
kube-storage-version-migrator 4.14.0-0.nightly-2023-04-30-235516 True False False 129m
monitoring 4.14.0-0.nightly-2023-04-30-235516 True False False 129m
network 4.14.0-0.nightly-2023-04-30-235516 True False False 140m
node-tuning 4.14.0-0.nightly-2023-04-30-235516 True False False 131m
openshift-apiserver 4.14.0-0.nightly-2023-04-30-235516 True False False 140m
openshift-controller-manager 4.14.0-0.nightly-2023-04-30-235516 True False False 140m
openshift-samples 4.14.0-0.nightly-2023-04-30-235516 True False False 129m
operator-lifecycle-manager 4.14.0-0.nightly-2023-04-30-235516 True False False 140m
operator-lifecycle-manager-catalog 4.14.0-0.nightly-2023-04-30-235516 True False False 140m
operator-lifecycle-manager-packageserver 4.14.0-0.nightly-2023-04-30-235516 True False False 140m
service-ca 4.14.0-0.nightly-2023-04-30-235516 True False False 130m
storage 4.14.0-0.nightly-2023-04-30-235516 True False False 131m
jiezhao-mac:hypershift jiezhao$
HC conditions:
==============
status:
  conditions:
  - lastTransitionTime: "2023-05-01T19:45:49Z"
    message: All is well
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidAWSIdentityProvider
  - lastTransitionTime: "2023-05-01T20:00:18Z"
    message: Cluster version is 4.14.0-0.nightly-2023-04-30-235516
    observedGeneration: 3
    reason: FromClusterVersion
    status: "False"
    type: ClusterVersionProgressing
  - lastTransitionTime: "2023-05-01T19:46:22Z"
    message: Payload loaded version="4.14.0-0.nightly-2023-04-30-235516" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-04-30-235516"
      architecture="amd64"
    observedGeneration: 3
    reason: PayloadLoaded
    status: "True"
    type: ClusterVersionReleaseAccepted
  - lastTransitionTime: "2023-05-01T20:03:14Z"
    message: Condition not found in the CVO.
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ClusterVersionUpgradeable
  - lastTransitionTime: "2023-05-01T20:00:18Z"
    message: Done applying 4.14.0-0.nightly-2023-04-30-235516
    observedGeneration: 3
    reason: FromClusterVersion
    status: "True"
    type: ClusterVersionAvailable
  - lastTransitionTime: "2023-05-01T20:00:18Z"
    message: ""
    observedGeneration: 3
    reason: FromClusterVersion
    status: "True"
    type: ClusterVersionSucceeding
  - lastTransitionTime: "2023-05-01T19:47:51Z"
    message: The hosted cluster is not degraded
    observedGeneration: 3
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-05-01T19:45:01Z"
    message: ""
    observedGeneration: 3
    reason: QuorumAvailable
    status: "True"
    type: EtcdAvailable
  - lastTransitionTime: "2023-05-01T19:45:38Z"
    message: Kube APIServer deployment is available
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: KubeAPIServerAvailable
  - lastTransitionTime: "2023-05-01T19:44:27Z"
    message: All is well
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: InfrastructureReady
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: External DNS is not configured
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ExternalDNSReachable
  - lastTransitionTime: "2023-05-01T19:44:19Z"
    message: Configuration passes validation
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidHostedControlPlaneConfiguration
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: AWS KMS is not configured
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: ValidAWSKMSConfig
  - lastTransitionTime: "2023-05-01T19:44:37Z"
    message: All is well
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidReleaseInfo
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: APIServer endpoint a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com
      is not healthy
    observedGeneration: 3
    reason: waitingForAvailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-05-01T19:47:18Z"
    message: All is well
    reason: AWSSuccess
    status: "True"
    type: AWSEndpointAvailable
  - lastTransitionTime: "2023-05-01T19:47:18Z"
    message: All is well
    reason: AWSSuccess
    status: "True"
    type: AWSEndpointServiceAvailable
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: Configuration passes validation
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidConfiguration
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: HostedCluster is supported by operator configuration
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: SupportedHostedCluster
  - lastTransitionTime: "2023-05-01T19:45:39Z"
    message: Ignition server deployment is available
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: IgnitionEndpointAvailable
  - lastTransitionTime: "2023-05-01T19:44:11Z"
    message: Reconciliation active on resource
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ReconciliationActive
  - lastTransitionTime: "2023-05-01T19:44:12Z"
    message: Release image is valid
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidReleaseImage
  - lastTransitionTime: "2023-05-01T19:44:12Z"
    message: HostedCluster is at expected version
    observedGeneration: 3
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2023-05-01T19:44:13Z"
    message: OIDC configuration is valid
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: ValidOIDCConfiguration
  - lastTransitionTime: "2023-05-01T19:44:13Z"
    message: Reconciliation completed succesfully
    observedGeneration: 3
    reason: ReconciliatonSucceeded
    status: "True"
    type: ReconciliationSucceeded
  - lastTransitionTime: "2023-05-01T19:45:52Z"
    message: All is well
    observedGeneration: 3
    reason: AsExpected
    status: "True"
    type: AWSDefaultSecurityGroupCreated
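To make the mismatch in the conditions above easier to see, here is a minimal Python sketch that looks up individual conditions by type and prints them side by side. The helper names `find_condition` and `summarize` are hypothetical and are not part of HyperShift or `oc`. The sample list holds only four entries copied from the dump, and it assumes the conditions have already been loaded into plain dictionaries (for example from `oc get hostedcluster -o json`).

```python
from typing import Optional

# Four entries copied from the "HC conditions" dump; the dict keys mirror
# the YAML keys (type, status, reason, message). The full list is longer.
CONDITIONS = [
    {"type": "KubeAPIServerAvailable", "status": "True", "reason": "AsExpected",
     "message": "Kube APIServer deployment is available"},
    {"type": "Available", "status": "False", "reason": "waitingForAvailable",
     "message": "APIServer endpoint "
                "a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com "
                "is not healthy"},
    {"type": "AWSEndpointAvailable", "status": "True", "reason": "AWSSuccess",
     "message": "All is well"},
    {"type": "Degraded", "status": "False", "reason": "AsExpected",
     "message": "The hosted cluster is not degraded"},
]


def find_condition(conditions: list, cond_type: str) -> Optional[dict]:
    """Return the first condition whose 'type' matches, or None if absent."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def summarize(conditions: list, cond_type: str) -> str:
    """Format one condition as 'Type=Status (Reason): Message'."""
    cond = find_condition(conditions, cond_type)
    if cond is None:
        return f"{cond_type}: not reported"
    return f"{cond_type}={cond['status']} ({cond['reason']}): {cond['message']}"


if __name__ == "__main__":
    # Print the three conditions most relevant to this report.
    for cond_type in ("Available", "KubeAPIServerAvailable", "AWSEndpointAvailable"):
        print(summarize(CONDITIONS, cond_type))
```

Run against the sample data, this shows `KubeAPIServerAvailable` and `AWSEndpointAvailable` as True while `Available` stays False, which is the inconsistency this bug describes.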
kube-apiserver log:
==================
E0501 19:45:07.024278 7 memcache.go:238] couldn't get current server API group list: Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_authorization-openshift_01_rolebindingrestriction.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_config-operator_01_proxy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_quota-openshift_01_clusterresourcequota.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_security-openshift_01_scc.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_securityinternal-openshift_02_rangeallocation.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_apiserver-Default.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_authentication.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_build.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_console.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_dns.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_featuregate.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_image.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagecontentpolicy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagecontentsourcepolicy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagedigestmirrorset.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagetagmirrorset.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_infrastructure-Default.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_ingress.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_network.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_node.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_oauth.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_project.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_scheduler.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
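The only timestamped line in the log above is from 19:45:07, which is before `KubeAPIServerAvailable` transitioned to True at 19:45:38, so these "connection refused" errors may only reflect the server still starting up rather than the cause of the unhealthy endpoint. As a rough way to tell "nothing is listening" apart from other failures, the following Python sketch attempts a plain TCP connection. The function name `tcp_reachable` is hypothetical, and this is not the health check HyperShift itself performs; it only tests whether a port accepts connections.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Connection refused, timeouts, and DNS failures are all subclasses of
    OSError and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Mirrors the address seen in the log (localhost:6443). The result
    # depends on whether anything is listening there on the machine
    # running this script.
    print("localhost:6443 reachable:", tcp_reachable("localhost", 6443, timeout=1.0))
```

The same function could be pointed at the load balancer hostname from the `Available` condition message. The port to use there is an assumption to confirm against the HostedCluster's service publishing configuration.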
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Create a PublicAndPrivate cluster
Actual results:
APIServer endpoint is not healthy, and HC condition Type 'Available' is False
Expected results:
APIServer endpoint should be healthy, and Type 'Available' should be True
Additional info:
- blocks: OCPBUGS-13497 kube-apiserver isn't healthy after a cluster comes up (Closed)
- is cloned by: OCPBUGS-13497 kube-apiserver isn't healthy after a cluster comes up (Closed)