OpenShift Bugs / OCPBUGS-10806

Cluster upgrade stuck due to incorrect console operator status

    Description

      Description of problem:

      This is the second time we have seen this issue. While upgrading from 4.8 to 4.9, the console cluster operator reported Degraded. The console operator was stuck because a network hiccup during startup made it conclude that a certificate had expired when none had. We restarted the operator and then used curl to clear the status conditions on the operator resource (consoles.operator.openshift.io, not the ClusterOperator). That moved the console operator back to a healthy state.
      
      To fix the issue, I had to access a master node and manually set the console operator's status conditions to null.
      
      ~~~
      $ oc proxy --port=8180

      # In a second terminal, while the proxy is running (the port must match the proxy port):
      $ curl -X PATCH -H 'Content-Type: application/merge-patch+json' --data '{"status": {"conditions": null}}' localhost:8180/apis/operator.openshift.io/v1/consoles/cluster/status
      ~~~
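
The manual workaround above could be wrapped in a small script so that the proxy port and the request URL cannot drift apart. This is only a sketch, not a supported procedure: the `PROXY_PORT` and `APPLY` variables and the function names are illustrative assumptions, and the script defaults to a dry run that prints the request instead of sending it.

```shell
#!/bin/sh
# Sketch: clear stale conditions on the console operator resource via `oc proxy`.
# PROXY_PORT, APPLY, and all function names are hypothetical, chosen for illustration.
set -eu

PROXY_PORT="${PROXY_PORT:-8180}"

# Build the status-subresource URL of the console operator config for a given port.
console_status_url() {
  printf 'http://localhost:%s/apis/operator.openshift.io/v1/consoles/cluster/status' "$1"
}

# Merge patch that sets status.conditions to null, which removes the list.
conditions_patch() {
  printf '%s' '{"status": {"conditions": null}}'
}

clear_console_conditions() {
  url="$(console_status_url "$PROXY_PORT")"
  if [ "${APPLY:-0}" != "1" ]; then
    # Dry run by default: only show what would be sent.
    echo "DRY RUN: PATCH ${url} with $(conditions_patch)"
    return 0
  fi
  # Start the API proxy in the background on the same port the URL uses.
  oc proxy --port="$PROXY_PORT" &
  proxy_pid=$!
  sleep 2  # crude wait for the proxy to start listening
  rc=0
  curl --fail -X PATCH -H 'Content-Type: application/merge-patch+json' \
    --data "$(conditions_patch)" "$url" || rc=$?
  # Stop the proxy whether or not the PATCH succeeded.
  kill "$proxy_pid" 2>/dev/null || true
  return "$rc"
}

clear_console_conditions
```

Running the script as-is prints the request only; running it with `APPLY=1` set in the environment would start the proxy, send the PATCH, and stop the proxy afterwards. The operator is expected to repopulate the conditions on its next sync, as it did in the manual case described above.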
      
      
      
      

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Randomly

      Steps to Reproduce:

      1. It seems to happen when the console operator hits a problem during the upgrade.
      

      Actual results:

      $ oc get co console -o yaml
      apiVersion: config.openshift.io/v1
      kind: ClusterOperator
      metadata:
        annotations:
          include.release.openshift.io/ibm-cloud-managed: 'true'
          include.release.openshift.io/self-managed-high-availability: 'true'
          include.release.openshift.io/single-node-developer: 'true'
        creationTimestamp: '2021-10-21T19:21:44Z'
        generation: 1
        name: console
        resourceVersion: '1021118760'
        uid: ee2bdd4f-3792-4497-9930-d87ab87b9076
      spec: {}
      status:
        conditions:
        - lastTransitionTime: '2023-02-28T23:23:55Z'
          message: 'DefaultRouteSyncDegraded: failed to verify custom certificate PEM: custom
            TLS certificate is expired'
      .
      .
      .
      
      Ingress operator log:
      ~~~
      ERROR   operator.ingress_controller     controller/controller.go:298  got retryable error; requeueing {"after": "42.122915253s", "error": "IngressController may become degraded soon: CanaryChecksSucceeding=False"}
      ~~~

      Expected results:

      The stale condition should be reset when the operator restarts.

      Additional info:

       

            People

              lmohanty@redhat.com Lalatendu Mohanty
              rhn-support-dchong Daniel Chong
              Jia Liu Jia Liu