OpenShift Bugs / OCPBUGS-25976

console operator takes too long to clean up failed status


    • Bug
    • Resolution: Unresolved
    • Major
    • 4.15
    • Management Console
    • Quality / Stability / Reliability
    • Critical
    • Rejected

      Description of problem:

      This issue was first reported in https://issues.redhat.com/browse/HIVE-2390; we later saw a similar issue in QE CI installations.
      
      The problem: the console pod is running and the console behaves correctly (users can open the console in a browser, log in, and pages load normally), but the console operator keeps reporting Degraded: True for a long time. The operator eventually reports the correct status, but in the meantime it blocks the installation from succeeding. The issue seems related to ingress restarts.

      Version-Release number of selected component (if applicable):

       4.15

      How reproducible:

      Frequently, in Hive CI and QE CI.

      Steps to Reproduce:

      Search for 'clusterOperator=console condition=Degraded' in the following two CI logs:
      
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hive-master-periodic-e2e-pool-weekly/1739497565449097216/artifacts/e2e-pool-weekly/test/artifacts/hive-controllers.log 
      
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hive-master-periodic-e2e-pool-weekly/1736960825022746624/artifacts/e2e-pool-weekly/test/artifacts/hive-controllers.log
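
The log search described in the steps above can also be scripted once a log has been downloaded. The following is a minimal Python sketch and not part of the original report: the function name `find_console_degraded` is hypothetical, and it only does a plain substring match on the two tokens from the search string.

```python
from typing import Iterable, List

# Tokens taken from the search string in the reproduction steps.
TOKENS = ("clusterOperator=console", "condition=Degraded")


def find_console_degraded(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain every token in TOKENS."""
    return [
        line.rstrip("\n")
        for line in lines
        if all(token in line for token in TOKENS)
    ]
```

For example, passing an open file handle for a local copy of one of the hive-controllers.log files (the local file name is up to the reader) returns only the lines that mention the console operator's Degraded condition.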

      Actual results:

      Initially, the console operator reports Degraded: True because the console route is not admitted. However, even after the route can be accessed without errors, the console operator still reports Degraded: True.
      
      The console operator takes about 3 hours to recover (report the correct status):
      
      I1227 07:03:11.626408       1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"afb0554b-b7fb-4cd7-9a3e-402ffd1c6b3e", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: Degraded changed from False to True ("RouteHealthDegraded: console route is not admitted")
      
      
      I1227 09:56:03.027887       1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-console-operator", Name:"console-operator", UID:"afb0554b-b7fb-4cd7-9a3e-402ffd1c6b3e", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/console changed: Degraded changed from True to False ("All is well"),Available changed from False to True ("All is well")
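
The "about 3 hours" figure can be checked against the two event timestamps above. The following Python sketch is illustrative only: the helper names are hypothetical, the sample lines are truncated copies of the two events, and the year is an assumption because the klog header does not include one.

```python
import re
from datetime import datetime, timedelta

# klog header: severity letter, mmdd, then hh:mm:ss.uuuuuu (the header has no year).
KLOG_TS = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})")


def klog_time(line: str, year: int = 2023) -> datetime:
    """Parse the timestamp at the start of a klog line; the year is an assumption."""
    match = KLOG_TS.match(line.strip())
    if match is None:
        raise ValueError("not a klog-formatted line")
    stamp = f"{year}{match.group(1)} {match.group(2)}"
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


def degraded_duration(start_line: str, end_line: str, year: int = 2023) -> timedelta:
    """Time between two klog lines, assuming both fall in the same year."""
    return klog_time(end_line, year) - klog_time(start_line, year)


# Headers of the two OperatorStatusChanged events quoted above (messages truncated).
went_degraded = "I1227 07:03:11.626408       1 event.go:298] ..."
recovered = "I1227 09:56:03.027887       1 event.go:298] ..."
print(degraded_duration(went_degraded, recovered))  # 2:52:51.401479
```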

       

      Expected results:

      The console operator should recover and report the correct status within an acceptable time.

      Additional info:

      In another case, co/console reports the following:
      $ oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.15.0-0.nightly-2023-12-25-100326   True        False         False      102m    
      baremetal                                  4.15.0-0.nightly-2023-12-25-100326   True        False         False      112m    
      cloud-controller-manager                   4.15.0-0.nightly-2023-12-25-100326   True        False         False      112m    
      cloud-credential                           4.15.0-0.nightly-2023-12-25-100326   True        False         False      108m    
      cluster-autoscaler                         4.15.0-0.nightly-2023-12-25-100326   True        False         False      108m    
      config-operator                            4.15.0-0.nightly-2023-12-25-100326   True        False         False      115m    
      console                                    4.15.0-0.nightly-2023-12-25-100326   False       True          True       105m    DeploymentAvailable: 0 replicas available for console deployment...
      control-plane-machine-set                  4.15.0-0.nightly-2023-12-25-100326   True        False         False      114m    
      csi-snapshot-controller                    4.15.0-0.nightly-2023-12-25-100326   True        False         False      115m    
      dns                                        4.15.0-0.nightly-2023-12-25-100326   True        False         False      114m    
      etcd                                       4.15.0-0.nightly-2023-12-25-100326   True        False         False      111m    
      image-registry                             4.15.0-0.nightly-2023-12-25-100326   True        False         False      102m    
      ingress                                    4.15.0-0.nightly-2023-12-25-100326   True        False         False      114m    
      
      $ oc describe co console
      .....
      Status:
        Conditions:
          Last Transition Time:  2024-01-03T07:21:09Z
          Message:               RouteHealthDegraded: route not yet available, https://console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com returns '503 Service Unavailable'
          Reason:                RouteHealth_StatusError
          Status:                True
          Type:                  Degraded
          Last Transition Time:  2024-01-03T07:16:54Z
          Message:               SyncLoopRefreshProgressing: Working toward version 4.15.0-0.nightly-2023-12-25-100326, 0 replicas available
          Reason:                SyncLoopRefresh_InProgress
          Status:                True
          Type:                  Progressing
          Last Transition Time:  2024-01-03T07:16:54Z
          Message:               DeploymentAvailable: 0 replicas available for console deployment
      RouteHealthAvailable: route not yet available, https://console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com returns '503 Service Unavailable'
          Reason:                Deployment_InsufficientReplicas::RouteHealth_StatusError
          Status:                False
          Type:                  Available
          Last Transition Time:  2024-01-03T07:16:58Z
          Message:               All is well
          Reason:                AsExpected
          Status:                True
          Type:                  Upgradeable
        Extension:               <nil>
      
      $ oc get all -n openshift-console
      Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
      NAME                             READY   STATUS    RESTARTS   AGE
      pod/console-7fcf8c4bdc-qrxr2     1/1     Running   0          102m
      pod/downloads-7984f9cc88-rq4r6   1/1     Running   0          107m

      NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
      service/console     ClusterIP   172.30.68.4      <none>        443/TCP   107m
      service/downloads   ClusterIP   172.30.163.136   <none>        80/TCP    107m

      NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/console     1/1     1            1           106m
      deployment.apps/downloads   1/1     1            1           107m

      NAME                                   DESIRED   CURRENT   READY   AGE
      replicaset.apps/console-668fd67cc5     0         0         0       106m
      replicaset.apps/console-767d8974fc     0         0         0       103m
      replicaset.apps/console-7fcf8c4bdc     1         1         1       102m
      replicaset.apps/downloads-7984f9cc88   1         1         1       107m

      NAME                                 HOST/PORT                                                                      PATH   SERVICES    PORT    TERMINATION          WILDCARD
      route.route.openshift.io/console     console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com            console     https   reencrypt/Redirect   None
      route.route.openshift.io/downloads   downloads-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com          downloads   http    edge/Redirect        None

      $ oc -n openshift-ingress rsh router-default-cf4dcdf75-jpt7x 
      sh-4.4$ curl -kI https://console-openshift-console.apps.jima-16854-6-74.qe.devcluster.openshift.com
      HTTP/1.1 200 OK
      referrer-policy: strict-origin-when-cross-origin
      set-cookie: csrf-token=kA+FhBTs3kQd20L4v2bPO14+SEzfZwib8G+k32nmxnKrdtujqjy+xOBMjlTah5xfWKarHiMNZYFk6Od90MQV8A==; Path=/; Secure; SameSite=Lax
      x-content-type-options: nosniff
      x-dns-prefetch-control: off
      x-frame-options: DENY
      x-xss-protection: 1; mode=block
      date: Wed, 03 Jan 2024 09:05:24 GMT
      content-type: text/html; charset=utf-8
      set-cookie: 1e2670d92730b515ce3a1bb65da45062=e48473f300ccce6c56c4108e066d07a1; path=/; HttpOnly; Secure; SameSite=None
      
      $ oc get pods -n openshift-ingress                                        
      NAME                             READY   STATUS    RESTARTS      AGE
      router-default-cf4dcdf75-jpt7x   1/1     Running   2 (97m ago)   103m
      $ oc get pods -n openshift-console-operator                  
      NAME                                READY   STATUS    RESTARTS       AGE
      console-operator-67c9d788b6-lk58j   2/2     Running   6 (123m ago)   123m
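
The mismatch shown above (the route answers 200 OK from the router pod while co/console still reports a RouteHealth failure) can be expressed as a small check. The following Python sketch is illustrative and not taken from the operator: the function names are hypothetical, the input is assumed to be the object returned by `oc get co console -o json`, and the probe status is assumed to come from an independent request such as the curl call above.

```python
import json
from typing import Dict


def condition_map(cluster_operator: dict) -> Dict[str, dict]:
    """Index status.conditions of a ClusterOperator object by condition type."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    return {cond["type"]: cond for cond in conditions}


def route_health_looks_stale(cluster_operator: dict, probe_status: int) -> bool:
    """True when Degraded=True cites RouteHealth but an independent probe got 200."""
    degraded = condition_map(cluster_operator).get("Degraded", {})
    return (
        degraded.get("status") == "True"
        and degraded.get("reason", "").startswith("RouteHealth")
        and probe_status == 200
    )


# Trimmed to the Degraded condition from the `oc describe co console` output above.
co = json.loads("""
{"status": {"conditions": [
  {"type": "Degraded", "status": "True", "reason": "RouteHealth_StatusError",
   "lastTransitionTime": "2024-01-03T07:21:09Z"}
]}}
""")
print(route_health_looks_stale(co, probe_status=200))  # True
```

Run periodically, a check like this would show how long the operator's status lags behind the actual route health.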

              jhadvig@redhat.com Jakub Hadvig
              rhn-support-yapei YaDan Pei