
Type: Bug
Resolution: Obsolete
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- QE

Blocked:
False
Ready:
False
Release Note Text:
Undefined
Market:

Cost of Delay:
0
WSJF:
0
Risk Score:
0

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

This bug was found while testing HOSTEDCP-112.

Test case record: OCP-42855

0) Use `hypershift create cluster` to create a hosted cluster, then check the etcd status in the hosted cluster's control plane. The control plane namespace is clusters-{cluster-name}.

1) After the etcd pod is deleted, the EtcdAvailable condition in the HostedControlPlane status still reports True. The expected value is False.

2) Confirmed with the HyperShift developers: the failure is caused by an incorrect status in the EtcdCluster resource. The right place to fix this is the etcd operator.
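The manual check in steps 0 and 1 can be scripted roughly as follows. This is a minimal sketch, not the OCP-42855 test itself: it assumes the HostedControlPlane is named after the cluster (here `example`) in namespace `clusters-example`, and the helper names `hcp_condition_status` and `check_etcd_condition_after_delete`, as well as the default 60-second wait, are hypothetical choices for illustration. The condition is read with `oc get ... -o jsonpath` instead of `oc describe` so that the result can be compared directly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the OCP-42855 check.
# Assumption: the HostedControlPlane is named after the cluster and lives in
# the clusters-<cluster-name> namespace.

# Print the status ("True", "False" or "Unknown") of one HostedControlPlane
# condition, selected by its type (for example EtcdAvailable).
hcp_condition_status() {
  local ns="$1" name="$2" type="$3"
  oc get hostedcontrolplane "$name" -n "$ns" \
    -o jsonpath="{.status.conditions[?(@.type==\"${type}\")].status}"
}

# Delete the given etcd pod, wait, then verify that EtcdAvailable is False.
# Returns 0 when the condition has the expected value, 1 otherwise.
check_etcd_condition_after_delete() {
  local ns="$1" name="$2" etcd_pod="$3" wait_seconds="${4:-60}"
  oc delete pod "$etcd_pod" -n "$ns" || return 1
  sleep "$wait_seconds"  # arbitrary grace period for the status to update
  local status
  status="$(hcp_condition_status "$ns" "$name" EtcdAvailable)"
  if [ "$status" = "False" ]; then
    echo "PASS: EtcdAvailable=${status}"
    return 0
  fi
  echo "FAIL: EtcdAvailable=${status} (expected False)"
  return 1
}
```

Usage (the etcd pod name is a placeholder): `check_etcd_condition_after_delete clusters-example example <etcd-pod-name>`. With the behavior reported here, the check prints `FAIL: EtcdAvailable=True (expected False)`.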

harry@liuhedeMacBook-Pro openshift % oc get pods -n clusters-example                               
NAME                                              READY   STATUS             RESTARTS   AGE
capa-controller-manager-7888cb46bd-s7lsd          1/1     Running            0          144m
certified-operators-catalog-6d8b854bc4-lsslh      1/1     Running            0          141m
cluster-api-5b84c5f55f-pntv2                      1/1     Running            0          144m
cluster-autoscaler-dc987fbf9-vcptb                0/1     CrashLoopBackOff   25         144m
cluster-policy-controller-bb5ccd9b4-dtddk         1/1     Running            1          144m
cluster-version-operator-5bcfdf9dff-xjw26         1/1     Running            1          144m
community-operators-catalog-8fb599ff8-7z2t5       1/1     Running            0          144m
control-plane-operator-7d8995bb59-72xtz           1/1     Running            0          144m
etcd-operator-77bb448cd6-hhtgn                    1/1     Running            0          144m
hosted-cluster-config-operator-6bd9f8f6b5-6h2hv   0/1     CrashLoopBackOff   24         144m
ignition-server-658d664cd4-tx28h                  1/1     Running            0          144m
konnectivity-agent-64bf56499d-l9vvk               1/1     Running            0          144m
konnectivity-server-846bf4785b-qgdmm              1/1     Running            0          144m
kube-apiserver-f57b885b7-b9znp                    1/2     CrashLoopBackOff   23         104m
kube-controller-manager-84784486fb-gjg6h          0/1     CrashLoopBackOff   22         133m
kube-scheduler-7bcbd46b96-hh9wt                   1/1     Running            3          144m
manifests-bootstrapper                            0/1     Completed          3          144m
oauth-openshift-569bff6d59-9t44l                  1/1     Running            0          142m
olm-operator-5c5fd8b476-k6844                     1/1     Running            4          144m
openshift-apiserver-858db5866d-kvbjr              1/1     Running            0          142m
openshift-controller-manager-dfd489d78-bbzj5      1/1     Running            1          144m
openshift-oauth-apiserver-cf48fd997-dqtmr         0/1     CrashLoopBackOff   24         144m
packageserver-fd6b48fb7-76fls                     1/1     Running            3          144m
packageserver-fd6b48fb7-p9bhz                     1/1     Running            2          144m
redhat-marketplace-catalog-6968fc9c6c-tlpcl       1/1     Running            0          144m
redhat-operators-catalog-665cccdf4f-d6fgr         1/1     Running            0          144m

The output shows that the etcd pod no longer exists, and kube-apiserver is crash-looping as well.
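To confirm from such a listing that no etcd member pod is left, the `oc get pods` output can be filtered. This is a minimal sketch, assuming etcd member pods are named with an `etcd-` prefix and that the operator pod (`etcd-operator-...`) is the only other pod using that prefix; the function name `count_running_etcd_members` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count etcd member pods in STATUS "Running" from
# `oc get pods` output read on stdin. The header line is skipped and the
# etcd-operator pod is excluded (naming convention is an assumption).
count_running_etcd_members() {
  awk 'NR > 1 && $1 ~ /^etcd-/ && $1 !~ /^etcd-operator-/ && $3 == "Running" { n++ }
       END { print n + 0 }'
}
```

Usage: `oc get pods -n clusters-example | count_running_etcd_members` prints `0` for the listing above, since `etcd-operator-77bb448cd6-hhtgn` is the only pod with the `etcd-` prefix.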

Check the HostedControlPlane: the EtcdAvailable condition is True (not expected), while KubeAPIServerAvailable is False (expected).

harry@liuhedeMacBook-Pro openshift % oc describe hostedcontrolplane -n clusters-example            
Name:         example
Namespace:    clusters-example
Labels:       cluster.x-k8s.io/cluster-name=example-x674v
Annotations:  hypershift.openshift.io/cluster: clusters/example
API Version:  hypershift.openshift.io/v1alpha1
Kind:         HostedControlPlane
Metadata:
  Creation Timestamp:  2021-07-19T01:44:41Z
  Finalizers:
    hypershift.openshift.io/finalizer
  Generation:  1

...

Spec:
  Dns:
    Base Domain:      qe.devcluster.openshift.com
    Private Zone ID:  Z00373243GZL3D8KJVFEE
    Public Zone ID:   Z3B3KOVA3TRCWP
  Etcd:
    Management Type:  Managed
  Fips:               false
  Infra ID:           example-x674v
  Issuer URL:         https://oidc-example-x674v.apps.heli-0719.qe.devcluster.openshift.com
  Machine CIDR:       10.0.0.0/16
  Network Type:       OpenShiftSDN
  Platform:
    Aws:
      Cloud Provider Config:
        Subnet:
          Id:  subnet-01366704f33296772
        Vpc:   vpc-05c067a42ae9c71dc
        Zone:  us-east-2a
      Kube Cloud Controller Creds:
        Name:  provider-creds
      Node Pool Management Creds:
        Name:  node-provider-creds
      Region:  us-east-2
      Roles:
        Arn:        arn:aws:iam::301721915996:role/example-x674v-openshift-ingress
        Name:       cloud-credentials
        Namespace:  openshift-ingress-operator
        Arn:        arn:aws:iam::301721915996:role/example-x674v-openshift-image-registry
        Name:       installer-cloud-credentials
        Namespace:  openshift-image-registry
        Arn:        arn:aws:iam::301721915996:role/example-x674v-aws-ebs-csi-driver-operator
        Name:       ebs-cloud-credentials
        Namespace:  openshift-cluster-csi-drivers
    Type:           AWS
  Pod CIDR:         10.132.0.0/14
  Pull Secret:
    Name:         pull-secret
  Release Image:  quay.io/openshift-release-dev/ocp-release:4.8.0-x86_64
  Service CIDR:   172.31.0.0/16
  Services:
    Service:  APIServer
    Service Publishing Strategy:
      Type:   LoadBalancer
    Service:  OAuthServer
    Service Publishing Strategy:
      Type:   Route
    Service:  OIDC
    Service Publishing Strategy:
      Type:   Route
    Service:  Konnectivity
    Service Publishing Strategy:
      Type:  LoadBalancer
  Signing Key:
    Name:  signing-key
  Ssh Key:
Status:
  Conditions:
    Last Transition Time:  2021-07-19T01:44:58Z
    Message:               Configuration passes validation
    Observed Generation:   1
    Reason:                HostedClusterAsExpected
    Status:                True
    Type:                  ValidConfiguration
    Last Transition Time:  2021-07-19T01:45:59Z
    Message:               Etcd cluster is running and available
    Observed Generation:   1
    Reason:                EtcdRunning
    Status:                True
    Type:                  EtcdAvailable
    Last Transition Time:  2021-07-19T02:31:27Z
    Message:               
    Observed Generation:   1
    Reason:                DeploymentStatusUnknown
    Status:                False
    Type:                  KubeAPIServerAvailable
    Last Transition Time:  2021-07-19T02:31:27Z
    Message:               Not all dependent components are available yet
    Observed Generation:   1
    Reason:                ComponentsUnavailable
    Status:                False
    Type:                  Available
    Last Transition Time:  2021-07-19T01:45:03Z
    Message:               
    Observed Generation:   1
    Reason:                AsExpected
    Status:                True
    Type:                  InfrastructureReady
  Control Plane Endpoint:
    Host:                          a4f7b9b9b5c9a4002ba0281167dd0083-110256206.us-east-2.elb.amazonaws.com
    Port:                          6443
  External Managed Control Plane:  true
  Initialized:                     true
  Kube Config:
    Key:                               kubeconfig
    Name:                              admin-kubeconfig
  Last Release Image Transition Time:  2021-07-19T01:44:58Z
  Ready:                               false
  Release Image:                       quay.io/openshift-release-dev/ocp-release:4.8.0-x86_64
  Version:                             4.8.0

3) In the test above, there is only one etcd pod in namespace clusters-example. After that etcd pod is deleted manually, why is it not recovered automatically by the etcd operator?

Checking the etcd operator logs:

time="2021-07-19T04:16:11Z" level=warning msg="all etcd pods are dead." cluster-name=etcd cluster-namespace=clusters-example pkg=cluster
time="2021-07-19T04:16:19Z" level=warning msg="all etcd pods are dead." cluster-name=etcd cluster-namespace=clusters-example pkg=cluster
time="2021-07-19T04:16:27Z" level=warning msg="all etcd pods are dead." cluster-name=etcd cluster-namespace=clusters-example pkg=cluster
time="2021-07-19T04:16:35Z" level=warning msg="all etcd pods are dead." cluster-name=etcd cluster-namespace=clusters-example pkg=cluster

Issue Links:
impacts account: HOSTEDCP-112 Add Status Conditions for HostedControlPlane (Closed)
