  1. OpenShift Bugs
  2. OCPBUGS-1253

Upgrade: Reconcile error when the "status" field under the policy status is missing



      Description of problem:
      This happened during an OCP upgrade. For about 30 seconds, the "status" field under the cluster version policy's status was absent. TALO returned a reconcile error, which triggered another reconcile call right away. This loop continued until ACM populated the status with more data.

      2022-04-22T13:34:46.395-0400 INFO controllers.ClusterGroupUpgrade [getPolicyClusterStatus] Policy has it's compliant status pending {"policyName": "common-cluster-version-policy"}
      2022-04-22T13:34:46.395-0400 ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "cnfdf18-new", "namespace": "ztp-install", "error": "policy common-cluster-version-policy has it's list of cluster statuses pending"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
      /home/jun/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3-0.20210709165254-650ea59f19cc/pkg/internal/controller/controller.go:253
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
      /home/jun/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3-0.20210709165254-650ea59f19cc/pkg/internal/controller/controller.go:214
      2022-04-22T13:34:51.515-0400 INFO controllers.ClusterGroupUpgrade Start reconciling CGU {"name": "cnfdf18-new", "version": "81776810"}
      2022-04-22T13:34:51.515-0400 INFO controllers.ClusterGroupUpgrade [getClusterBySelectors] {"clustersBySelector": []}
      2022-04-22T13:34:51.515-0400 INFO controllers.ClusterGroupUpgrade [getClustersBySelectors] {"clusterNames": ["cnfdf18"]}
      2022-04-22T13:34:51.516-0400 INFO controllers.ClusterGroupUpgrade [Reconcile] {"Status.CurrentBatch": 1}
      2022-04-22T13:34:51.516-0400 INFO controllers.ClusterGroupUpgrade [Reconcile] Requeuing after {"requeueAfter": "5m0s"}
      2022-04-22T13:34:51.580-0400 INFO controllers.ClusterGroupUpgrade [getPolicyClusterStatus] Policy has it's compliant status pending {"policyName": "common-cluster-version-policy"}
      2022-04-22T13:34:51.580-0400 ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "cnfdf18-new", "namespace": "ztp-install", "error": "policy common-cluster-version-policy has it's list of cluster statuses pending"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
      /home/jun/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3-0.20210709165254-650ea59f19cc/pkg/internal/controller/controller.go:253
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
      /home/jun/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3-0.20210709165254-650ea59f19cc/pkg/internal/controller/controller.go:214

      Non-transient errors or situations like this should be handled by the controller itself rather than returned as an error to k8s. This is similar to bug 2055447. If one cluster gets stuck in this state, it can block everything else.
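
      One way the controller could absorb a missing status is sketched below. This is an illustration only, not TALO's actual code: the helper name clusterCompliance is hypothetical, the policy is assumed to be decoded into generic maps, and the assumed layout is a "status" list under the policy status whose entries carry "clustername" and "compliant" keys. When that list is absent, the helper reports NonCompliant instead of failing.

```go
package main

import "fmt"

// NonCompliant is the fallback state used when no per-cluster status is available yet.
const NonCompliant = "NonCompliant"

// clusterCompliance returns the compliance state of one cluster from a policy
// decoded into generic maps. Hypothetical helper for illustration only.
// Assumed layout: policy["status"]["status"] is a list of entries with
// "clustername" and "compliant" keys.
func clusterCompliance(policy map[string]interface{}, cluster string) string {
	status, ok := policy["status"].(map[string]interface{})
	if !ok {
		// No status block at all: nothing has been reported yet.
		return NonCompliant
	}
	entries, ok := status["status"].([]interface{})
	if !ok {
		// The inner "status" list is missing: the case described in this bug.
		return NonCompliant
	}
	for _, e := range entries {
		entry, ok := e.(map[string]interface{})
		if !ok || entry["clustername"] != cluster {
			continue
		}
		if c, ok := entry["compliant"].(string); ok && c != "" {
			return c
		}
		// Entry exists but has no usable compliance value.
		return NonCompliant
	}
	// The cluster is not listed yet.
	return NonCompliant
}

func main() {
	// Policy status present, but the per-cluster "status" list is absent.
	pending := map[string]interface{}{
		"status": map[string]interface{}{"placement": []interface{}{}},
	}
	// Policy status after the per-cluster list has been populated.
	populated := map[string]interface{}{
		"status": map[string]interface{}{
			"status": []interface{}{
				map[string]interface{}{"clustername": "cnfdf18", "compliant": "Compliant"},
			},
		},
	}
	fmt.Println(clusterCompliance(pending, "cnfdf18"))
	fmt.Println(clusterCompliance(populated, "cnfdf18"))
}
```

      With this shape, the pending policy is reported as NonCompliant and the populated one as Compliant, so the caller never needs an error path for the missing field.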

      Version-Release number of selected component (if applicable):

      How reproducible:
      100%

      Steps to Reproduce:
      1. Initiate an OCP upgrade through a CGU with a cluster version policy like this one

      Actual results:
      Reconcile error and repeated reconcile calls

      Expected results:
      The policy should be treated as non-compliant, and TALO should continue to wait for the next scheduled reconcile

      Additional info:

              saskari@redhat.com Saeid Askari
              Yang Liu