• No
    • Rejected
    • False
      * Previously, the console Operator used a client instead of listers to fetch a cluster resource, which caused the Operator to operate on resources with an old revision. With this update, the console Operator uses listers instead of clients to fetch data from the cluster. (link:https://issues.redhat.com/browse/OCPBUGS-25484[*OCPBUGS-25484*])




    • Bug Fix
    • Done

      Description of problem:

      A review of 4.15 install failures (install should succeed: overall) shows that a number of variants are affected by recent install failures.

      search.ci: Cluster operator console is not available

      Jobs such as periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-sdn-serial show installation failures caused by console-operator. The failures appear to start with 4.15.0-0.nightly-2023-12-07-225558:

      ConsoleOperator reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
      

       

       

      4.15.0-0.nightly-2023-12-07-225558 contains console-operator/pull/814, noting in case it is related

       

       

      Version-Release number of selected component (if applicable):

       4.15   

      How reproducible:

          

      Steps to Reproduce:

          1. Review link to install failures above
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:
      periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-sdn
      periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade
      periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade

            [OCPBUGS-25484] Install failure for console operator

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Critical: OpenShift Container Platform 4.16.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:0041

            OpenShift Jira Automation Bot added a comment -

            Per the announcement sent regarding the removal of "Blocker" as an option in the Priority field, this issue (which had Priority = "Blocker" and information already set in the Release Blocker field) is being updated to Priority = Critical. The Release Blocker field was not changed.

            YaDan Pei added a comment -

            A search for "Cluster operator console is not available" over the last 2 days shows 4 failures. These failures occur during 4.15 installation in an upgrade job:

            #1750267748480454656 

            #1750227396386099200

            #1750227382788165632

            #1750044858132729856 

            The issue seems fixed in 4.16, moving to VERIFIED


            Jakub Hadvig added a comment -

            Looks like this commit from slaznick@redhat.com PR should mitigate the issue.

            YaDan Pei added a comment -

            A search for "Cluster operator console is not available" in 4.16 jobs over the last 2 days still shows many occurrences of the console installation failure.

            For example, periodic-ci-openshift-release-master-nightly-4.16-e2e-gcp-sdn-serial still reports the same error:

             

            E0121 15:42:35.917065       1 base_controller.go:268] ConsoleOperator reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again 

            other job failures

            periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-gcp-ovn-heterogeneous/

            periodic-ci-openshift-release-master-nightly-4.16-e2e-gcp-sdn

             


            Steeve Goveas added a comment -

            jhadvig@redhat.com

            From the last couple of comments, it looks like the bug is not fixed yet. I see this issue on an SNO 4.15 install. I have the cluster if you need to investigate further.

            sgoveas Fri Jan 19 11:05:11 2024 workspace
            $ oc get co console
            NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
            console   4.15.0-0.nightly-2024-01-18-050837   True        False         True       52m     ConsoleCustomRouteSyncDegraded: the server is currently unable to handle the request (delete routes.route.openshift.io console-custom)...

            sgoveas Fri Jan 19 11:11:45 2024 workspace
            $ oc logs -n openshift-console-operator console-operator-7d85987b49-hxl8h -c console-operator > console-operator2.log


            $ tail -20 console-operator2.log
            I0119 11:12:10.201173       1 request.go:697] Waited for 1.196904506s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            I0119 11:12:11.400279       1 request.go:697] Waited for 1.191217648s due to client-side throttling, not priority and fairness, request: PUT:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster/status
            I0119 11:12:12.400453       1 request.go:697] Waited for 1.189716765s due to client-side throttling, not priority and fairness, request: PUT:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster/status
            E0119 11:12:13.004186       1 base_controller.go:268] DownloadsRouteController reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
            I0119 11:12:13.600931       1 request.go:697] Waited for 1.098696168s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            I0119 11:12:14.800373       1 request.go:697] Waited for 1.293708095s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            E0119 11:12:15.004138       1 base_controller.go:268] ConsoleRouteController reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
            I0119 11:12:15.800375       1 request.go:697] Waited for 1.298257629s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            I0119 11:12:16.801181       1 request.go:697] Waited for 1.192267616s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            I0119 11:12:18.001002       1 request.go:697] Waited for 1.191783732s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            E0119 11:12:19.003586       1 base_controller.go:268] DownloadsRouteController reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
            I0119 11:12:19.200238       1 request.go:697] Waited for 1.19105371s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            I0119 11:12:20.400749       1 request.go:697] Waited for 1.098833649s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            E0119 11:12:21.004232       1 base_controller.go:268] ConsoleRouteController reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
            I0119 11:12:21.601154       1 request.go:697] Waited for 1.098261169s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            I0119 11:12:22.800542       1 request.go:697] Waited for 1.093415649s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            I0119 11:12:23.800966       1 request.go:697] Waited for 1.187469962s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            E0119 11:12:24.804030       1 base_controller.go:268] DownloadsRouteController reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again
            I0119 11:12:25.000843       1 request.go:697] Waited for 1.192121334s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster
            I0119 11:12:26.200940       1 request.go:697] Waited for 1.19213761s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1/consoles/cluster

            Dan Winship added a comment -

            (FTR this happens a lot in the middle of the sdn-ovn migration jobs, after the part where all the apiservers get rebooted, if that helps understand how the cache is getting persistently out of sync...)


            Dan Winship added a comment - - edited

            Yes, this is still broken; the library-go change to re-fetch the object from the apiserver has no effect, because UpdateOperatorStatus then just re-fetches the object from the cache:

            func (c *OperatorClient) UpdateOperatorStatus(ctx context.Context, resourceVersion string, status *operatorv1.OperatorStatus) (*operatorv1.OperatorStatus, error) {
                original, err := c.Informers.Operator().V1().Consoles().Lister().Get(api.ConfigResourceName)
                if err != nil {
                    return nil, err
                }
                copy := original.DeepCopy()
                copy.ResourceVersion = resourceVersion
                copy.Status.OperatorStatus = *status
            
                ret, err := c.Client.Consoles().UpdateStatus(c.Context, copy, metav1.UpdateOptions{})
                if err != nil {
                    return nil, err
                }
            
                return &ret.Status.OperatorStatus, nil
            } 

            (This isn't just a bug in console-operator; all operators implement this function this way.)

             


            Forrest Babcock added a comment -

            4.16.0-0.ci-2024-01-17-120614 looks to have console-operator/pull/836, and 4.16-e2e-gcp-sdn is showing:

            ConsoleOperator reconciliation failed: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again

            Passing along as an FYI.

            OpenShift Jira Bot added a comment -

            Hi jhadvig@redhat.com,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

              jhadvig@redhat.com Jakub Hadvig
              rh-ee-fbabcock Forrest Babcock
              YaDan Pei YaDan Pei