  1. OpenShift Bugs
  2. OCPBUGS-4572

'console' co is degraded on 4.12.0-rc cluster on Power.

    • Bug
    • Resolution: Cannot Reproduce
    • Normal
    • None
    • 4.12
    • Low
    • None
    • False

      Description of problem:

      The console CO becomes degraded after 2 days with this message: "ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded"

      How reproducible:

      Reproducible on all clusters with the 4.12.0-rc builds.
      https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/
      
      Note:
      1. This issue is seen on 4.12.0-rc clusters (rc.0, rc.1, and rc.2) with master nodes deployed with 16GB RAM.
      2. 32GB RAM clusters are running fine
      3. Post-deployment, all COs are good, but the console CO gets degraded after 3-4 days
      4. The issue is not seen with ec builds (we have a healthy ec.5 cluster that has been running for a month)

       

       

      Steps to Reproduce:

      1. Deploy OCP cluster using 4.12.0-rc build on Power Platform (ppc64le)
      2. Check the status of all nodes, cluster operators (co), and pods.
      3. Monitor the cluster for a few days.
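
The status check in steps 2 and 3 could be scripted along these lines. This is only a sketch: `degraded_cos` is a hypothetical helper name, not something used in the original report, and it assumes the default `oc get co` column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE) with a non-empty VERSION column.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get co` table output on stdin and prints
# the name of every cluster operator whose DEGRADED column is "True".
# Assumption: DEGRADED is the 5th whitespace-separated column, which holds
# only when the VERSION column is populated for every operator.
degraded_cos() {
  # NR > 1 skips the header row; $5 is the DEGRADED column.
  awk 'NR > 1 && $5 == "True" { print $1 }'
}
```

On a cluster it would be used as `oc get co | degraded_cos`, for example from a periodic job, so that the moment an operator first reports Degraded is recorded instead of being discovered days later.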
      

      Actual results:

      The "console" CO goes into a degraded state with the error message "ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded"

      Expected results:

      The cluster should stay stable.

      Additional info:

      [root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get co
      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.12.0-rc.2   True        False         False      4d9h
      baremetal                                  4.12.0-rc.2   True        False         False      4d23h
      cloud-controller-manager                   4.12.0-rc.2   True        False         False      4d23h
      cloud-credential                           4.12.0-rc.2   True        False         False      4d23h
      cluster-autoscaler                         4.12.0-rc.2   True        False         False      4d23h
      config-operator                            4.12.0-rc.2   True        False         False      4d23h
      console                                    4.12.0-rc.2   True        False         True       4d8h    ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded
      control-plane-machine-set                  4.12.0-rc.2   True        False         False      4d23h
      csi-snapshot-controller                    4.12.0-rc.2   True        False         False      4d23h
      dns                                        4.12.0-rc.2   True        False         False      4d23h
      etcd                                       4.12.0-rc.2   True        False         False      4d23h
      image-registry                             4.12.0-rc.2   True        False         False      4d22h
      ingress                                    4.12.0-rc.2   True        False         False      4d23h
      insights                                   4.12.0-rc.2   True        False         False      4d23h
      kube-apiserver                             4.12.0-rc.2   True        False         False      4d23h
      kube-controller-manager                    4.12.0-rc.2   True        False         False      4d23h
      kube-scheduler                             4.12.0-rc.2   True        False         False      4d23h
      kube-storage-version-migrator              4.12.0-rc.2   True        False         False      4d23h
      machine-api                                4.12.0-rc.2   True        False         False      4d23h
      machine-approver                           4.12.0-rc.2   True        False         False      4d23h
      machine-config                             4.12.0-rc.2   True        False         False      4d8h
      marketplace                                4.12.0-rc.2   True        False         False      4d23h
      monitoring                                 4.12.0-rc.2   True        False         False      4d8h
      network                                    4.12.0-rc.2   True        False         False      4d23h
      node-tuning                                4.12.0-rc.2   True        False         False      4d23h
      openshift-apiserver                        4.12.0-rc.2   True        False         False      4d10h
      openshift-controller-manager               4.12.0-rc.2   True        False         False      4d23h
      openshift-samples                          4.12.0-rc.2   True        False         False      4d23h
      operator-lifecycle-manager                 4.12.0-rc.2   True        False         False      4d23h
      operator-lifecycle-manager-catalog         4.12.0-rc.2   True        False         False      4d23h
      operator-lifecycle-manager-packageserver   4.12.0-rc.2   True        False         False      4d10h
      service-ca                                 4.12.0-rc.2   True        False         False      4d23h
      storage                                    4.12.0-rc.2   True        False         False      4d23h
      
      
      Describe the co:
      
      [root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc describe co console
      Name:         console
      Namespace:
      Labels:       <none>
      Annotations:  capability.openshift.io/name: Console
                    include.release.openshift.io/ibm-cloud-managed: true
                    include.release.openshift.io/self-managed-high-availability: true
                    include.release.openshift.io/single-node-developer: true
      API Version:  config.openshift.io/v1
      Kind:         ClusterOperator
      Metadata:
        Creation Timestamp:  2022-11-30T07:21:43Z
        Generation:          1
        Managed Fields:
          API Version:  config.openshift.io/v1
          Fields Type:  FieldsV1
          fieldsV1:
            f:metadata:
              f:annotations:
                .:
                f:capability.openshift.io/name:
                f:include.release.openshift.io/ibm-cloud-managed:
                f:include.release.openshift.io/self-managed-high-availability:
                f:include.release.openshift.io/single-node-developer:
              f:ownerReferences:
                .:
                k:{"uid":"e278aa0a-5a0f-41e2-9a72-5c6461685d3a"}:
            f:spec:
          Manager:      cluster-version-operator
          Operation:    Update
          Time:         2022-11-30T07:21:43Z
          API Version:  config.openshift.io/v1
          Fields Type:  FieldsV1
          fieldsV1:
            f:status:
              .:
              f:extension:
          Manager:      cluster-version-operator
          Operation:    Update
          Subresource:  status
          Time:         2022-11-30T07:21:43Z
          API Version:  config.openshift.io/v1
          Fields Type:  FieldsV1
          fieldsV1:
            f:status:
              f:conditions:
              f:relatedObjects:
              f:versions:
          Manager:      console
          Operation:    Update
          Subresource:  status
          Time:         2022-11-30T21:58:45Z
        Owner References:
          API Version:     config.openshift.io/v1
          Kind:            ClusterVersion
          Name:            version
          UID:             e278aa0a-5a0f-41e2-9a72-5c6461685d3a
        Resource Version:  334737
        UID:               5254b11b-2495-4af9-8d29-65b47d4aea5b
      Spec:
      Status:
        Conditions:
          Last Transition Time:  2022-11-30T18:43:49Z
          Message:               ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded
          Reason:                ConsoleNotificationSync_FailedDelete
          Status:                True
          Type:                  Degraded
          Last Transition Time:  2022-11-30T07:56:27Z
          Message:               All is well
          Reason:                AsExpected
          Status:                False
          Type:                  Progressing
          Last Transition Time:  2022-11-30T21:58:34Z
          Message:               All is well
          Reason:                AsExpected
          Status:                True
          Type:                  Available
          Last Transition Time:  2022-11-30T07:42:45Z
          Message:               All is well
          Reason:                AsExpected
          Status:                True
          Type:                  Upgradeable
        Extension:               <nil>
        Related Objects:
          Group:      operator.openshift.io
          Name:       cluster
          Resource:   consoles
          Group:      config.openshift.io
          Name:       cluster
          Resource:   consoles
          Group:      config.openshift.io
          Name:       cluster
          Resource:   infrastructures
          Group:      config.openshift.io
          Name:       cluster
          Resource:   proxies
          Group:      config.openshift.io
          Name:       cluster
          Resource:   oauths
          Group:      oauth.openshift.io
          Name:       console
          Resource:   oauthclients
          Group:
          Name:       openshift-console-operator
          Resource:   namespaces
          Group:
          Name:       openshift-console
          Resource:   namespaces
          Group:
          Name:       console-public
          Namespace:  openshift-config-managed
          Resource:   configmaps
        Versions:
          Name:     operator
          Version:  4.12.0-rc.2
      Events:       <none>
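
Most of the describe output above is managed-fields noise; the part that matters is the Conditions block. The following is a rough sketch of how that block could be condensed to one line per condition. `co_conditions` is a hypothetical helper name, and the parsing assumes the `oc describe co` layout shown above (each condition listing Last Transition Time, Message, Reason, Status, and Type, in that order).

```shell
#!/bin/sh
# Hypothetical helper: reads `oc describe co <name>` output on stdin and
# prints one summary line per status condition.
# Assumption: within the Conditions block, "Type:" is the last key of each
# condition, as in the output above.
co_conditions() {
  awk '
    $1 == "Conditions:"                                    { in_cond = 1; next }
    in_cond && ($1 == "Extension:" || $1 == "Related")     { in_cond = 0 }
    in_cond && $1 == "Last"                                { since = $4 }
    in_cond && $1 == "Reason:"                             { reason = $2 }
    in_cond && $1 == "Status:"                             { status = $2 }
    in_cond && $1 == "Type:"    { print $2 "=" status, "reason=" reason, "since=" since }
  '
}
```

Run as `oc describe co console | co_conditions`. On the output above this would surface the Degraded condition with reason ConsoleNotificationSync_FailedDelete and transition time 2022-11-30T18:43:49Z next to the three healthy conditions.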
      
      
      The console operator does not come out of the degraded state, even though I deleted the pod and no issues are currently visible.
      
      
      [root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get pods -n openshift-console
      NAME                         READY   STATUS    RESTARTS   AGE
      console-5fc9cfd8ff-qqqjm     1/1     Running   0          5d21h
      console-5fc9cfd8ff-vlwvs     1/1     Running   0          6d11h
      downloads-6669fd985f-4fvt4   1/1     Running   0          6d12h
      downloads-6669fd985f-859d2   1/1     Running   0          6d12h
      
      
      [root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get pods -n openshift-console-operator
      NAME                                READY   STATUS    RESTARTS   AGE
      console-operator-55fc8d8ff5-nzrxn   2/2     Running   0          46m
      
      And yet, the status still shows as degraded.
      
      [root@rdr-praj-412rc2long1-syd05-bastion-0 ~]# oc get co | grep console
      console                                    4.12.0-rc.2   True        False         True       5d21h   ConsoleNotificationSyncDegraded: Timeout: request did not complete within requested timeout - context deadline exceeded
      
      
      The must-gather log and cronjob logs are attached here; the cron log captures metrics from the cluster nodes.
      
      https://drive.google.com/drive/folders/1qUsLxCd2ta-dD7gtevD96s95oABfzCvY?usp=sharing

              jpoulin Jeremy Poulin
              prgawand Prajwal Gawande
              Doug Slavens Doug Slavens


                  Estimated: Not Specified
                  Remaining: 0m
                  Logged: 2h