Bug
Resolution: Obsolete
Normal
This bug was found during testing for HOSTEDCP-112.
Test case record: OCP-42855
0) Use `hypershift create cluster` to create a hosted cluster, then check the etcd status in the hosted cluster's control plane namespace (clusters-{cluster-name}); a minimal command sketch is shown after these steps.
1) When the etcd pod is deleted, the condition in the HostedControlPlane still reports the etcd status as True. The expected value is False.
2) Confirmed with the HyperShift developers: the failure is caused by an incorrect status on the EtcdCluster resource, so the right place to fix this is the etcd operator.
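A minimal command sketch of the reproduction; the `hypershift create cluster` flags and the etcd pod name below are illustrative placeholders, not values taken from this report (see `hypershift create cluster --help` for the actual options):

# 0) create a hosted cluster (flags are illustrative)
hypershift create cluster --name example --base-domain qe.devcluster.openshift.com \
  --pull-secret ./pull-secret.json --aws-creds ./aws-creds
# check the etcd condition reported by the HostedControlPlane
oc get hostedcontrolplane example -n clusters-example -o yaml
# 1) delete the only etcd pod in the control plane namespace (pod name is a placeholder)
oc delete pod <etcd-pod-name> -n clusters-example
# 2) re-check the conditions; EtcdAvailable is expected to become False, but stays True
oc describe hostedcontrolplane example -n clusters-example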
harry@liuhedeMacBook-Pro openshift % oc get pods -n clusters-example
NAME                                              READY   STATUS             RESTARTS   AGE
capa-controller-manager-7888cb46bd-s7lsd          1/1     Running            0          144m
certified-operators-catalog-6d8b854bc4-lsslh      1/1     Running            0          141m
cluster-api-5b84c5f55f-pntv2                      1/1     Running            0          144m
cluster-autoscaler-dc987fbf9-vcptb                0/1     CrashLoopBackOff   25         144m
cluster-policy-controller-bb5ccd9b4-dtddk         1/1     Running            1          144m
cluster-version-operator-5bcfdf9dff-xjw26         1/1     Running            1          144m
community-operators-catalog-8fb599ff8-7z2t5       1/1     Running            0          144m
control-plane-operator-7d8995bb59-72xtz           1/1     Running            0          144m
etcd-operator-77bb448cd6-hhtgn                    1/1     Running            0          144m
hosted-cluster-config-operator-6bd9f8f6b5-6h2hv   0/1     CrashLoopBackOff   24         144m
ignition-server-658d664cd4-tx28h                  1/1     Running            0          144m
konnectivity-agent-64bf56499d-l9vvk               1/1     Running            0          144m
konnectivity-server-846bf4785b-qgdmm              1/1     Running            0          144m
kube-apiserver-f57b885b7-b9znp                    1/2     CrashLoopBackOff   23         104m
kube-controller-manager-84784486fb-gjg6h          0/1     CrashLoopBackOff   22         133m
kube-scheduler-7bcbd46b96-hh9wt                   1/1     Running            3          144m
manifests-bootstrapper                            0/1     Completed          3          144m
oauth-openshift-569bff6d59-9t44l                  1/1     Running            0          142m
olm-operator-5c5fd8b476-k6844                     1/1     Running            4          144m
openshift-apiserver-858db5866d-kvbjr              1/1     Running            0          142m
openshift-controller-manager-dfd489d78-bbzj5      1/1     Running            1          144m
openshift-oauth-apiserver-cf48fd997-dqtmr         0/1     CrashLoopBackOff   24         144m
packageserver-fd6b48fb7-76fls                     1/1     Running            3          144m
packageserver-fd6b48fb7-p9bhz                     1/1     Running            2          144m
redhat-marketplace-catalog-6968fc9c6c-tlpcl       1/1     Running            0          144m
redhat-operators-catalog-665cccdf4f-d6fgr         1/1     Running            0          144m
The output shows that there is no etcd pod anymore, and the kube-apiserver is crashing as well.
Checking the HostedControlPlane, the etcd condition is still True (not expected) while the kube-apiserver condition is False (expected):
harry@liuhedeMacBook-Pro openshift % oc describe hostedcontrolplane -n clusters-example
Name:         example
Namespace:    clusters-example
Labels:       cluster.x-k8s.io/cluster-name=example-x674v
Annotations:  hypershift.openshift.io/cluster: clusters/example
API Version:  hypershift.openshift.io/v1alpha1
Kind:         HostedControlPlane
Metadata:
  Creation Timestamp:  2021-07-19T01:44:41Z
  Finalizers:
    hypershift.openshift.io/finalizer
  Generation:  1
  ...
Spec:
  Dns:
    Base Domain:      qe.devcluster.openshift.com
    Private Zone ID:  Z00373243GZL3D8KJVFEE
    Public Zone ID:   Z3B3KOVA3TRCWP
  Etcd:
    Management Type:  Managed
  Fips:          false
  Infra ID:      example-x674v
  Issuer URL:    https://oidc-example-x674v.apps.heli-0719.qe.devcluster.openshift.com
  Machine CIDR:  10.0.0.0/16
  Network Type:  OpenShiftSDN
  Platform:
    Aws:
      Cloud Provider Config:
        Subnet:
          Id:  subnet-01366704f33296772
        Vpc:   vpc-05c067a42ae9c71dc
        Zone:  us-east-2a
      Kube Cloud Controller Creds:
        Name:  provider-creds
      Node Pool Management Creds:
        Name:  node-provider-creds
      Region:  us-east-2
      Roles:
        Arn:        arn:aws:iam::301721915996:role/example-x674v-openshift-ingress
        Name:       cloud-credentials
        Namespace:  openshift-ingress-operator
        Arn:        arn:aws:iam::301721915996:role/example-x674v-openshift-image-registry
        Name:       installer-cloud-credentials
        Namespace:  openshift-image-registry
        Arn:        arn:aws:iam::301721915996:role/example-x674v-aws-ebs-csi-driver-operator
        Name:       ebs-cloud-credentials
        Namespace:  openshift-cluster-csi-drivers
    Type:  AWS
  Pod CIDR:  10.132.0.0/14
  Pull Secret:
    Name:         pull-secret
  Release Image:  quay.io/openshift-release-dev/ocp-release:4.8.0-x86_64
  Service CIDR:   172.31.0.0/16
  Services:
    Service:  APIServer
    Service Publishing Strategy:
      Type:  LoadBalancer
    Service:  OAuthServer
    Service Publishing Strategy:
      Type:  Route
    Service:  OIDC
    Service Publishing Strategy:
      Type:  Route
    Service:  Konnectivity
    Service Publishing Strategy:
      Type:  LoadBalancer
  Signing Key:
    Name:  signing-key
  Ssh Key:
Status:
  Conditions:
    Last Transition Time:  2021-07-19T01:44:58Z
    Message:               Configuration passes validation
    Observed Generation:   1
    Reason:                HostedClusterAsExpected
    Status:                True
    Type:                  ValidConfiguration
    Last Transition Time:  2021-07-19T01:45:59Z
    Message:               Etcd cluster is running and available
    Observed Generation:   1
    Reason:                EtcdRunning
    Status:                True
    Type:                  EtcdAvailable
    Last Transition Time:  2021-07-19T02:31:27Z
    Message:
    Observed Generation:   1
    Reason:                DeploymentStatusUnknown
    Status:                False
    Type:                  KubeAPIServerAvailable
    Last Transition Time:  2021-07-19T02:31:27Z
    Message:               Not all dependent components are available yet
    Observed Generation:   1
    Reason:                ComponentsUnavailable
    Status:                False
    Type:                  Available
    Last Transition Time:  2021-07-19T01:45:03Z
    Message:
    Observed Generation:   1
    Reason:                AsExpected
    Status:                True
    Type:                  InfrastructureReady
  Control Plane Endpoint:
    Host:  a4f7b9b9b5c9a4002ba0281167dd0083-110256206.us-east-2.elb.amazonaws.com
    Port:  6443
  External Managed Control Plane:      true
  Initialized:                         true
  Kube Config:
    Key:   kubeconfig
    Name:  admin-kubeconfig
  Last Release Image Transition Time:  2021-07-19T01:44:58Z
  Ready:                               false
  Release Image:                       quay.io/openshift-release-dev/ocp-release:4.8.0-x86_64
  Version:                             4.8.0
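For a quicker check than the full describe output above, the two conditions can also be read with a JSONPath query (a sketch; the resource name `example` is taken from the output above):

oc get hostedcontrolplane example -n clusters-example \
  -o jsonpath='{.status.conditions[?(@.type=="EtcdAvailable")].status}'
# prints True here, even though the etcd pod is gone
oc get hostedcontrolplane example -n clusters-example \
  -o jsonpath='{.status.conditions[?(@.type=="KubeAPIServerAvailable")].status}'
# prints False, as expected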
2. In the above test there is only one etcd pod in the clusters-example namespace. After the etcd pod is deleted manually, why is it not recovered automatically by the etcd operator?
Check the logs of the etcd operator:
time="2021-07-19T04:16:11Z" level=warning msg="all etcd pods are dead." cluster-name=etcd cluster-namespace=clusters-example pkg=cluster
time="2021-07-19T04:16:19Z" level=warning msg="all etcd pods are dead." cluster-name=etcd cluster-namespace=clusters-example pkg=cluster
time="2021-07-19T04:16:27Z" level=warning msg="all etcd pods are dead." cluster-name=etcd cluster-namespace=clusters-example pkg=cluster
time="2021-07-19T04:16:35Z" level=warning msg="all etcd pods are dead." cluster-name=etcd cluster-namespace=clusters-example pkg=cluster
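Since the developer feedback above points at an incorrect status on the EtcdCluster resource, that resource can be inspected directly. A sketch, assuming the cluster in question is the etcd-operator-managed EtcdCluster named etcd (the name and namespace are taken from the cluster-name=etcd / cluster-namespace=clusters-example fields in the log):

# dump the EtcdCluster resource that the etcd operator manages and reports status on
oc get etcdcluster etcd -n clusters-example -o yaml
# if the status here still claims the cluster is available, that would explain why
# the HostedControlPlane EtcdAvailable condition never flips to False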
impacts account
HOSTEDCP-112 Add Status Conditions for HostedControlPlane (Closed)