HOSTEDCP-204 (OpenShift Hosted Control Plane)

etcd operator in bad state and not recovering when HA mode set in hostedcluster


    • Type: Bug
    • Resolution: Obsolete
    • Priority: Minor

      1. Create a cluster:
      $ ./hypershift create cluster --pull-secret /home/jiezhao/pull-secret --aws-creds /home/jiezhao/.aws/credentials --name jz-test --base-domain qe.devcluster.openshift.com --region=us-east-2

      2. Check the hostedcluster:
      [jiezhao@cube bin]$ oc get hostedclusters -n clusters
      NAME      VERSION   KUBECONFIG                 PROGRESS    AVAILABLE   REASON
      jz-test   4.8.6     jz-test-admin-kubeconfig   Completed   True        HostedClusterAsExpected
      [jiezhao@cube bin]$
      [jiezhao@cube bin]$ oc get nodepool -n clusters
      NAME      CLUSTER   NODECOUNT   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG
      jz-test   jz-test   2           False         False        4.8.6
      [jiezhao@cube bin]$

      3. Check control plane components:

      [jiezhao@cube bin]$ oc get pods -n clusters-jz-test -o wide
      NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
      capa-controller-manager-7888cb46bd-b4kwg 1/1 Running 0 20m 10.131.0.64 ip-10-0-181-55.us-east-2.compute.internal <none> <none>
      catalog-operator-bb59644d5-7rc7j 1/1 Running 5 (18m ago) 19m 10.130.0.51 ip-10-0-137-66.us-east-2.compute.internal <none> <none>
      certified-operators-catalog-6f558cb4f5-bckbw 1/1 Running 0 19m 10.131.0.72 ip-10-0-181-55.us-east-2.compute.internal <none> <none>
      cluster-api-77f68ccb4b-xrhqc 1/1 Running 0 20m 10.131.0.63 ip-10-0-181-55.us-east-2.compute.internal <none> <none>
      cluster-autoscaler-7db48d4d79-fqjzg 1/1 Running 4 (18m ago) 19m 10.128.2.26 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      cluster-policy-controller-d797cccd7-hvzrx 1/1 Running 0 19m 10.129.2.21 ip-10-0-142-209.us-east-2.compute.internal <none> <none>
      cluster-version-operator-657c5f9749-vklzk 1/1 Running 0 19m 10.128.2.28 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      community-operators-catalog-74c794b58f-rkgsb 1/1 Running 0 19m 10.128.2.31 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      control-plane-operator-7d8995bb59-b2pnc 1/1 Running 0 20m 10.131.0.65 ip-10-0-181-55.us-east-2.compute.internal <none> <none>
      etcd-operator-6c744db9-t9grf 1/1 Running 0 19m 10.128.2.25 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      etcd-smrxkp76vm 1/1 Running 0 19m 10.128.2.33 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      hosted-cluster-config-operator-7f997bccdf-6nf6r 1/1 Running 4 (18m ago) 19m 10.128.2.29 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      ignition-server-687f875b87-74k2f 1/1 Running 0 20m 10.128.2.24 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      konnectivity-agent-7fcbdcbcf4-khp6n 1/1 Running 0 19m 10.129.2.14 ip-10-0-142-209.us-east-2.compute.internal <none> <none>
      konnectivity-server-7f7df96fd7-2gcnx 1/1 Running 0 19m 10.129.2.13 ip-10-0-142-209.us-east-2.compute.internal <none> <none>
      kube-apiserver-59665c65-pmp47 2/2 Running 1 (18m ago) 19m 10.129.2.15 ip-10-0-142-209.us-east-2.compute.internal <none> <none>
      kube-controller-manager-5d5c48648f-9977t 1/1 Running 0 11m 10.128.2.35 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      kube-scheduler-85d6598c7c-nr78g 1/1 Running 0 19m 10.129.2.17 ip-10-0-142-209.us-east-2.compute.internal <none> <none>
      manifests-bootstrapper 0/1 Completed 5 19m 10.128.2.30 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      oauth-openshift-867bfdd4d-jwmhz 1/1 Running 0 16m 10.128.2.34 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      olm-operator-676d45946f-cwkj5 1/1 Running 5 (18m ago) 19m 10.129.0.44 ip-10-0-160-25.us-east-2.compute.internal <none> <none>
      openshift-apiserver-56d4c94848-r8rjb 1/1 Running 0 16m 10.131.0.73 ip-10-0-181-55.us-east-2.compute.internal <none> <none>
      openshift-controller-manager-7dc8579986-c9c5d 1/1 Running 0 19m 10.129.2.20 ip-10-0-142-209.us-east-2.compute.internal <none> <none>
      openshift-oauth-apiserver-5dc886fcd6-fpntx 1/1 Running 5 (16m ago) 19m 10.129.2.18 ip-10-0-142-209.us-east-2.compute.internal <none> <none>
      packageserver-9fbc9c678-9mldn 1/1 Running 4 (16m ago) 19m 10.129.0.45 ip-10-0-160-25.us-east-2.compute.internal <none> <none>
      packageserver-9fbc9c678-cnfhm 1/1 Running 4 (17m ago) 19m 10.130.0.50 ip-10-0-137-66.us-east-2.compute.internal <none> <none>
      redhat-marketplace-catalog-77894d6c77-l5t6l 1/1 Running 0 19m 10.128.2.32 ip-10-0-217-105.us-east-2.compute.internal <none> <none>
      redhat-operators-catalog-54b584dbbf-8xxqc 1/1 Running 0 19m 10.131.0.71 ip-10-0-181-55.us-east-2.compute.internal <none> <none>
      [jiezhao@cube bin]$

      4. Set HA mode in the hostedcluster:

      spec:
        autoscaling: {}
        controllerAvailabilityPolicy: HighlyAvailable
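
The report does not say how the field was changed. As a minimal sketch, one way to apply the same setting non-interactively is a merge patch with `oc patch`. The helper name `set_ha_mode` is hypothetical; the cluster name and namespace are taken from the steps above.

```shell
# Merge patch that sets only spec.controllerAvailabilityPolicy.
PATCH='{"spec":{"controllerAvailabilityPolicy":"HighlyAvailable"}}'

# Hypothetical helper: $1 = hostedcluster name, $2 = namespace.
set_ha_mode() {
  oc patch hostedcluster "$1" -n "$2" --type=merge -p "$PATCH"
}

# Example call (assumes oc is logged in to the management cluster):
# set_ha_mode jz-test clusters
```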

      5. Wait a while, then check the control plane components again:

      [jiezhao@cube bin]$ oc get pods -n clusters-jz-test
      NAME READY STATUS RESTARTS AGE
      capa-controller-manager-7888cb46bd-b4kwg 1/1 Running 0 3h47m
      catalog-operator-bb59644d5-7rc7j 1/1 Running 5 (3h44m ago) 3h46m
      certified-operators-catalog-6f558cb4f5-bckbw 1/1 Running 0 3h46m
      cluster-api-77f68ccb4b-xrhqc 1/1 Running 0 3h47m
      cluster-autoscaler-7db48d4d79-fqjzg 0/1 CrashLoopBackOff 48 (2m28s ago) 3h46m
      cluster-policy-controller-d797cccd7-hvzrx 1/1 Running 1 (3h24m ago) 3h46m
      cluster-policy-controller-d797cccd7-t569g 1/1 Running 0 3h25m
      cluster-policy-controller-d797cccd7-xkc42 1/1 Running 0 3h25m
      cluster-version-operator-657c5f9749-vklzk 1/1 Running 1 (3h23m ago) 3h46m
      community-operators-catalog-6769865679-ks2tn 1/1 Running 0 149m
      control-plane-operator-7d8995bb59-b2pnc 1/1 Running 0 3h47m
      etcd-operator-6c744db9-t9grf 1/1 Running 0 3h46m
      hosted-cluster-config-operator-7f997bccdf-6nf6r 1/1 Running 5 (3h25m ago) 3h46m
      ignition-server-687f875b87-74k2f 1/1 Running 0 3h47m
      konnectivity-agent-7fcbdcbcf4-khp6n 1/1 Running 0 3h46m
      konnectivity-server-7f7df96fd7-2gcnx 1/1 Running 0 3h46m
      kube-apiserver-59665c65-57pln 1/2 CrashLoopBackOff 42 (42s ago) 3h25m
      kube-apiserver-59665c65-jk7pq 1/2 Error 42 (5m31s ago) 3h25m
      kube-apiserver-59665c65-pmp47 1/2 Running 1 (3h45m ago) 3h46m
      kube-controller-manager-5d5c48648f-9977t 0/1 CrashLoopBackOff 44 (71s ago) 3h38m
      kube-controller-manager-5d5c48648f-jxtvj 0/1 CrashLoopBackOff 44 (50s ago) 3h25m
      kube-controller-manager-5d5c48648f-kttqw 0/1 CrashLoopBackOff 44 (88s ago) 3h25m
      kube-scheduler-85d6598c7c-j5g2d 1/1 Running 0 3h25m
      kube-scheduler-85d6598c7c-nr78g 1/1 Running 1 (3h25m ago) 3h46m
      kube-scheduler-85d6598c7c-qd7rd 1/1 Running 0 3h25m
      manifests-bootstrapper 0/1 Completed 5 3h46m
      oauth-openshift-867bfdd4d-5dg2j 0/1 CrashLoopBackOff 44 (2m4s ago) 3h25m
      oauth-openshift-867bfdd4d-jwmhz 1/1 Running 0 3h43m
      oauth-openshift-867bfdd4d-rnbd2 0/1 CrashLoopBackOff 44 (2m31s ago) 3h25m
      olm-operator-676d45946f-cwkj5 1/1 Running 5 (3h44m ago) 3h46m
      openshift-apiserver-56d4c94848-9kg6v 0/1 CrashLoopBackOff 42 (39s ago) 3h25m
      openshift-apiserver-56d4c94848-lchpl 0/1 CrashLoopBackOff 41 (4m53s ago) 3h25m
      openshift-apiserver-56d4c94848-r8rjb 1/1 Running 0 3h43m
      openshift-controller-manager-7dc8579986-c9c5d 1/1 Running 1 (3h24m ago) 3h46m
      openshift-controller-manager-7dc8579986-kfdsf 1/1 Running 0 3h25m
      openshift-controller-manager-7dc8579986-zbj5n 1/1 Running 0 3h25m
      openshift-oauth-apiserver-5dc886fcd6-drhgd 0/1 CrashLoopBackOff 45 (27s ago) 3h25m
      openshift-oauth-apiserver-5dc886fcd6-fpntx 0/1 CrashLoopBackOff 49 (3m26s ago) 3h46m
      openshift-oauth-apiserver-5dc886fcd6-rm2x2 0/1 CrashLoopBackOff 45 (39s ago) 3h25m
      packageserver-9fbc9c678-9mldn 1/1 Running 4 (3h43m ago) 3h46m
      packageserver-9fbc9c678-cnfhm 1/1 Running 4 (3h43m ago) 3h46m
      redhat-marketplace-catalog-77894d6c77-l5t6l 1/1 Running 0 3h46m
      redhat-operators-catalog-54b584dbbf-8xxqc 1/1 Running 0 3h46m
      [jiezhao@cube bin]$
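
In the listing above, the etcd member pod from step 3 (etcd-smrxkp76vm) is no longer present, and several API server and controller pods are crash-looping or only partially ready. A small filter such as the following can make the unhealthy pods easier to spot in a long listing. This is a sketch, not part of the original report; the function name `unhealthy_pods` is hypothetical, and it assumes the default `oc get pods` column order (NAME, READY, STATUS).

```shell
# Print NAME, READY and STATUS for pods that are not fully ready or not Running.
# Pods in the Completed state (finished jobs) are skipped.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, r, "/")   # READY column, e.g. "1/2"
    if ($3 != "Completed" && (r[1] != r[2] || $3 != "Running"))
      print $1, $2, $3
  }'
}

# Example call (assumes oc is logged in to the management cluster):
# oc get pods -n clusters-jz-test | unhealthy_pods
```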

      6. Get etcd-operator logs:

      Please see the attached file etcd-operator-logs.txt for the etcd-operator logs.
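
For anyone reproducing this, the logs could be collected along these lines. This is a sketch under an assumption: the Deployment name `etcd-operator` is inferred from the pod name `etcd-operator-6c744db9-t9grf` and is not confirmed by the report. The helper name `save_etcd_operator_logs` is hypothetical.

```shell
# Hypothetical helper: $1 = control plane namespace, $2 = output file.
# Assumes the operator runs as a Deployment named "etcd-operator".
save_etcd_operator_logs() {
  oc logs deployment/etcd-operator -n "$1" > "$2"
}

# Example call (assumes oc is logged in to the management cluster):
# save_etcd_operator_logs clusters-jz-test etcd-operator-logs.txt
```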

              Unassigned Unassigned
              rhn-support-jiezhao Jie Zhao
              Jie Zhao Jie Zhao