Bug
Resolution: Done
4.10
Quality / Stability / Reliability
Description of problem:
This happened during an OCP upgrade. For about 30 seconds, the "status" field under the cluster version policy's status was absent. TALO returned a reconcile error, which triggered another reconcile right away. This kept repeating until ACM populated the status with more data.
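To make the missing-field case concrete, here is a minimal sketch of how a Go controller might decode a policy status in which the per-cluster "status" list is absent. The trimmed-down types and the JSON field names ("compliant", "status", "clustername") are assumptions modeled loosely on the ACM Policy status, not TALO's actual types; `decodeStatus` is a hypothetical helper. The point is that an absent list decodes to a nil slice, while an explicit empty list `[]` decodes to a non-nil empty slice, so the two cases can be told apart.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// perClusterStatus is a hypothetical mirror of one entry in the policy's
// per-cluster status list (field names are assumptions).
type perClusterStatus struct {
	ClusterName string `json:"clustername"`
	Compliant   string `json:"compliant"`
}

// policyStatus is a hypothetical, trimmed-down policy status.
type policyStatus struct {
	Compliant string             `json:"compliant,omitempty"`
	Status    []perClusterStatus `json:"status,omitempty"`
}

// decodeStatus unmarshals a raw policy status and reports whether the
// per-cluster "status" list was present (absent decodes to a nil slice).
func decodeStatus(raw string) (policyStatus, bool, error) {
	var s policyStatus
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return s, false, err
	}
	return s, s.Status != nil, nil
}

func main() {
	// Mid-upgrade: the per-cluster "status" list is missing entirely.
	_, present, _ := decodeStatus(`{"compliant": "NonCompliant"}`)
	fmt.Println("status list present:", present)

	// After ACM catches up, the list is populated.
	s, present, _ := decodeStatus(`{"compliant": "NonCompliant", "status": [{"clustername": "cnfdf18", "compliant": "NonCompliant"}]}`)
	fmt.Println("status list present:", present, "entries:", len(s.Status))
}
```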
2022-04-22T13:34:46.395-0400 INFO controllers.ClusterGroupUpgrade [getPolicyClusterStatus] Policy has it's compliant status pending {"policyName": "common-cluster-version-policy"}
2022-04-22T13:34:46.395-0400 ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "cnfdf18-new", "namespace": "ztp-install", "error": "policy common-cluster-version-policy has it's list of cluster statuses pending"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/jun/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3-0.20210709165254-650ea59f19cc/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/jun/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3-0.20210709165254-650ea59f19cc/pkg/internal/controller/controller.go:214
2022-04-22T13:34:51.515-0400 INFO controllers.ClusterGroupUpgrade Start reconciling CGU
2022-04-22T13:34:51.515-0400 INFO controllers.ClusterGroupUpgrade [getClusterBySelectors] {"clustersBySelector": []}
2022-04-22T13:34:51.515-0400 INFO controllers.ClusterGroupUpgrade [getClustersBySelectors] {"clusterNames": ["cnfdf18"]}
2022-04-22T13:34:51.516-0400 INFO controllers.ClusterGroupUpgrade [Reconcile] {"Status.CurrentBatch": 1}
2022-04-22T13:34:51.516-0400 INFO controllers.ClusterGroupUpgrade [Reconcile] Requeuing after {"requeueAfter": "5m0s"}
2022-04-22T13:34:51.580-0400 INFO controllers.ClusterGroupUpgrade [getPolicyClusterStatus] Policy has it's compliant status pending {"policyName": "common-cluster-version-policy"}
2022-04-22T13:34:51.580-0400 ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "cnfdf18-new", "namespace": "ztp-install", "error": "policy common-cluster-version-policy has it's list of cluster statuses pending"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/jun/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3-0.20210709165254-650ea59f19cc/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/jun/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.3-0.20210709165254-650ea59f19cc/pkg/internal/controller/controller.go:214
For a non-transient situation like this, the controller should handle the condition itself instead of returning an error from Reconcile. This is similar to bug 2055447. If one cluster gets stuck in this state, it can block everything else.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Initiate an OCP upgrade through a CGU, using a cluster version policy like this.
Actual results:
Reconcile error and repeated reconcile calls
Expected results:
The policy should be treated as non-compliant, and TALO should wait for the next scheduled reconcile.
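A minimal sketch of the expected behavior, assuming a simplified per-cluster status type: look up the cluster's entry and default to "NonCompliant" when the list is missing, has no entry for the cluster, or has an empty state. The type and function names are hypothetical; the compliance strings follow the "Compliant"/"NonCompliant" values ACM policies report, but treat the exact handling as an assumption.

```go
package main

import "fmt"

// perClusterStatus is a hypothetical, simplified per-cluster policy status.
type perClusterStatus struct {
	ClusterName string
	Compliant   string
}

// clusterCompliance returns the compliance state recorded for cluster, or
// "NonCompliant" when the per-cluster list is missing, has no entry for the
// cluster, or the entry's state is empty, so the caller never has to fail
// the reconcile for this reason.
func clusterCompliance(statuses []perClusterStatus, cluster string) string {
	for _, s := range statuses {
		if s.ClusterName == cluster && s.Compliant != "" {
			return s.Compliant
		}
	}
	return "NonCompliant"
}

func main() {
	// Status list absent (mid-upgrade): treated as non-compliant.
	fmt.Println(clusterCompliance(nil, "cnfdf18"))
	// Status list populated: the recorded state is used.
	fmt.Println(clusterCompliance([]perClusterStatus{{ClusterName: "cnfdf18", Compliant: "Compliant"}}, "cnfdf18"))
}
```

With this default, the CGU simply sees the cluster as not yet compliant and keeps waiting, which is the same outcome it would reach once ACM reports the real state.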
Additional info:
- is cloned by OCPBUGS-1254 Upgrade: Reconcile error when the "status" field under the policy status is missing (Closed)
- is depended on by OCPBUGS-1254 Upgrade: Reconcile error when the "status" field under the policy status is missing (Closed)