-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
rhwa-4.21-0
I set up a fresh 4.21 cluster and then installed the SNR and NHC CSVs.
I then stopped kubelet (`stop kubelet`) on one of the nodes (the SNR controller was not running on this node). The node came back to the Ready state, but after a few seconds it went back to `Not Ready`.
SNR now logs the error below, and the node never came back.
github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).isNodeRebootCapable
/app/self-node-remediation/controllers/selfnoderemediation_controller.go:611
github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).prepareReboot
/app/self-node-remediation/controllers/selfnoderemediation_controller.go:479
github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).handleFencingStartedPhase
/app/self-node-remediation/controllers/selfnoderemediation_controller.go:474
github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).remediateWithResourceRemoval
/app/self-node-remediation/controllers/selfnoderemediation_controller.go:458
github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).remediateWithOutOfServiceTaint
/app/self-node-remediation/controllers/selfnoderemediation_controller.go:415
github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).ReconcileManager
/app/self-node-remediation/controllers/selfnoderemediation_controller.go:306
github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).Reconcile
/app/self-node-remediation/controllers/selfnoderemediation_controller.go:151
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:340
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:300
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1
/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:202
2026-02-22T19:22:40.632525633Z ERROR Reconciler error {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "SelfNodeRemediation": {"name":"ip-10-0-89-20.us-west-1.compute.internal-784gf","namespace":"openshift-workload-availability"}, "namespace": "openshift-workload-availability", "name": "ip-10-0-89-20.us-west-1.compute.internal-784gf", "reconcileID": "eaa3d05c-8584-4a0d-bd4b-ec7a8f451461", "error": "Node is not capable to reboot itself", "errorVerbose": "Node is not capable to reboot itself\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).prepareReboot\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:481\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).handleFencingStartedPhase\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:474\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).remediateWithResourceRemoval\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:458\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).remediateWithOutOfServiceTaint\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:415\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).ReconcileManager\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:306\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).Reconcile\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:151\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:340\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:300\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1\n\t/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:202\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1693"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:353
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:300
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1
/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:202
2026-02-22T19:22:41.273247335Z INFO controllers.SelfNodeRemediation Remediating with OutOfServiceTaint Remediation strategy (auto-selected) {"pod": "manager", "selfnoderemediation": {"name":"ip-10-0-89-20.us-west-1.compute.internal-784gf","namespace":"openshift-workload-availability"}}
2026-02-22T19:22:41.273270157Z INFO controllers.SelfNodeRemediation pre-reboot not completed yet, prepare for rebooting {"pod": "manager", "selfnoderemediation": {"name":"ip-10-0-89-20.us-west-1.compute.internal-784gf","namespace":"openshift-workload-availability"}}
2026-02-22T19:22:41.273350691Z DEBUG events [remediation] Remediation started by SNR manager {"type": "Normal", "object": {"kind":"SelfNodeRemediation","namespace":"openshift-workload-availability","name":"ip-10-0-89-20.us-west-1.compute.internal-784gf","uid":"fd90dbe9-01e8-47ee-a81a-130670a9da4b","apiVersion":"self-node-remediation.medik8s.io/v1alpha1","resourceVersion":"40744"}, "reason": "RemediationStarted"}
2026-02-22T19:22:41.273660069Z ERROR controllers.SelfNodeRemediation failed to get self node remediation agent pod resource {"pod": "manager", "selfnoderemediation": {"name":"ip-10-0-89-20.us-west-1.compute.internal-784gf","namespace":"openshift-workload-availability"}, "error": "failed to find self node remediation pod matching the given node"}
github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).isNodeRebootCapable
/app/self-node-remediation/controllers/selfnoderemediation_controller.go:611
github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).prepareReboot
/app/self-node-remediation/controllers/selfnoderemediation_controller.go:479
Attaching the full logs for more details:
snr_nhc_4.21_connected_concil_error.text