-
Bug
-
Resolution: Won't Do
-
Undefined
-
None
-
4.10.z
-
None
-
Quality / Stability / Reliability
-
False
-
-
None
-
Moderate
Description of problem:
When TALM 4.10 is used to precache multiple spokes in one batch and one of the spokes is unavailable, precaching for the healthy spoke also gets stuck at "PreparingToStart", and the openshift-talo-pre-cache namespace is never created on the spoke.
# In the TALM log below, note that there are two spokes, "worker-0" and "worker-1". Once an error occurs on worker-1 (powered off), the controller does not continue with worker-0.
2023-01-26T17:02:41.786Z INFO controllers.ClusterGroupUpgrade [precachingFsm] {"PrecacheSpecFromPolicies": {"platformImage":"registry.kni-qe-16.lab.eng.rdu2.redhat.com:5000/openshift-release-dev/ocp-release@sha256:bda60323bbca48e3d1dae1154e074cc2cbda64889c7850d78dea28516c5675fa"}}
2023-01-26T17:02:41.786Z INFO controllers.ClusterGroupUpgrade [precachingFsm] {"currentState": "NotStarted", "cluster": "worker-1"}
2023-01-26T17:02:41.786Z INFO controllers.ClusterGroupUpgrade [precachingFsm] {"currentState": "NotStarted", "condition": "entry", "cluster": "worker-1", "nextState": "PreparingToStart"}
2023-01-26T17:02:41.817Z INFO controllers.ClusterGroupUpgrade [createResourcesFromTemplates] {"cluster": "worker-1", "template": "precache-ns-delete"}
I0126 17:02:42.871121 1 request.go:668] Waited for 1.042567232s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/apps.open-cluster-management.io/v1?timeout=32s
2023-01-26T17:02:46.485Z INFO controllers.ClusterGroupUpgrade [createResourcesFromTemplates] {"cluster": "worker-1", "template": "view-precache-namespace"}
2023-01-26T17:02:51.149Z INFO controllers.ClusterGroupUpgrade [precachingFsm] {"previousState": "NotStarted", "nextState": "PreparingToStart", "cluster": "worker-1"}
2023-01-26T17:02:51.149Z INFO controllers.ClusterGroupUpgrade [precachingFsm] {"currentState": "NotStarted", "cluster": "worker-0"}
2023-01-26T17:02:51.149Z INFO controllers.ClusterGroupUpgrade [precachingFsm] {"currentState": "NotStarted", "condition": "entry", "cluster": "worker-0", "nextState": "PreparingToStart"}
2023-01-26T17:02:51.170Z INFO controllers.ClusterGroupUpgrade [createResourcesFromTemplates] {"cluster": "worker-0", "template": "precache-ns-delete"}
I0126 17:02:52.871687 1 request.go:668] Waited for 1.694950567s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/build.openshift.io/v1?timeout=32s
2023-01-26T17:02:55.834Z INFO controllers.ClusterGroupUpgrade [createResourcesFromTemplates] {"cluster": "worker-0", "template": "view-precache-namespace"}
2023-01-26T17:03:00.496Z INFO controllers.ClusterGroupUpgrade [precachingFsm] {"previousState": "NotStarted", "nextState": "PreparingToStart", "cluster": "worker-0"}
2023-01-26T17:03:00.505Z INFO controllers.ClusterGroupUpgrade Finish reconciling CGU {"name": "talm-test/generated-cgu-multi-spokes-one-unavailable", "requeueAfter": 30}
2023-01-26T17:03:30.505Z INFO controllers.ClusterGroupUpgrade Start reconciling CGU {"name": "talm-test/generated-cgu-multi-spokes-one-unavailable"}
2023-01-26T17:03:30.605Z INFO controllers.ClusterGroupUpgrade Loaded CGU {"name": "talm-test/generated-cgu-multi-spokes-one-unavailable", "version": "34775179"}
2023-01-26T17:03:30.606Z INFO controllers.ClusterGroupUpgrade [reconcilePrecaching] {"FindStatusCondition PrecachingDone": "&Condition{Type:PrecachingDone,Status:False,ObservedGeneration:0,LastTransitionTime:2023-01-26 17:02:41 +0000 UTC,Reason:PrecachingNotDone,Message:Precaching is required and not done,}"}
2023-01-26T17:03:30.606Z INFO controllers.ClusterGroupUpgrade [precachingFsm] {"currentState": "PreparingToStart", "cluster": "worker-1"}
2023-01-26T17:03:30.609Z ERROR controllers.ClusterGroupUpgrade reconcilePrecaching error {"error": "[getPreparingConditions] no ManagedClusterView conditions found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2023-01-26T17:03:30.609Z INFO controllers.ClusterGroupUpgrade Finish reconciling CGU {"name": "talm-test/generated-cgu-multi-spokes-one-unavailable", "requeueRightAway": false}
2023-01-26T17:03:30.609Z ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "generated-cgu-multi-spokes-one-unavailable", "namespace": "talm-test", "error": "[getPreparingConditions] no ManagedClusterView conditions found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2023-01-26T17:03:30.615Z INFO controllers.ClusterGroupUpgrade Start reconciling CGU {"name": "talm-test/generated-cgu-multi-spokes-one-unavailable"}
2023-01-26T17:03:30.716Z INFO controllers.ClusterGroupUpgrade Loaded CGU {"name": "talm-test/generated-cgu-multi-spokes-one-unavailable", "version": "34775179"}
2023-01-26T17:03:30.716Z INFO controllers.ClusterGroupUpgrade [reconcilePrecaching] {"FindStatusCondition PrecachingDone": "&Condition{Type:PrecachingDone,Status:False,ObservedGeneration:0,LastTransitionTime:2023-01-26 17:02:41 +0000 UTC,Reason:PrecachingNotDone,Message:Precaching is required and not done,}"}
2023-01-26T17:03:30.716Z INFO controllers.ClusterGroupUpgrade [precachingFsm] {"currentState": "PreparingToStart", "cluster": "worker-1"}
2023-01-26T17:03:30.720Z ERROR controllers.ClusterGroupUpgrade reconcilePrecaching error {"error": "[getPreparingConditions] no ManagedClusterView conditions found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2023-01-26T17:03:30.720Z INFO controllers.ClusterGroupUpgrade Finish reconciling CGU {"name": "talm-test/generated-cgu-multi-spokes-one-unavailable", "requeueRightAway": false}
2023-01-26T17:03:30.720Z ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "generated-cgu-multi-spokes-one-unavailable", "namespace": "talm-test", "error": "[getPreparingConditions] no ManagedClusterView conditions found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2023-01-26T17:03:30.731Z INFO controllers.ClusterGroupUpgrade Start reconciling CGU {"name": "talm-test/generated-cgu-multi-spokes-one-unavailable"}
2023-01-26T17:03:30.831Z INFO controllers.ClusterGroupUpgrade Loaded CGU {"name": "talm-test/generated-cgu-multi-spokes-one-unavailable", "version": "34775179"}
2023-01-26T17:03:30.831Z INFO controllers.ClusterGroupUpgrade [reconcilePrecaching] {"FindStatusCondition PrecachingDone": "&Condition{Type:PrecachingDone,Status:False,ObservedGeneration:0,LastTransitionTime:2023-01-26 17:02:41 +0000 UTC,Reason:PrecachingNotDone,Message:Precaching is required and not done,}"}
2023-01-26T17:03:30.831Z INFO controllers.ClusterGroupUpgrade [precachingFsm] {"currentState": "PreparingToStart", "cluster": "worker-1"}
2023-01-26T17:03:30.835Z ERROR controllers.ClusterGroupUpgrade reconcilePrecaching error {"error": "[getPreparingConditions] no ManagedClusterView conditions found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
Version-Release number of selected component (if applicable):
4.10.z talm
How reproducible:
100%
Steps to Reproduce:
1. Create a CGU with precaching enabled against multiple spokes in one batch, with one of the spokes unavailable (for example, powered off).
Actual results:
Precaching is stuck at PreparingToStart for both spokes, including the healthy one.
Expected results:
Precaching succeeds on the healthy spoke.
Additional info:
- duplicates: OCPBUGS-2601 One unreachable SNO prevents CGU from progressing to precaching (Closed)