Red Hat Workload Availability / RHWA-752

Reconcile Error In The NHC + SNR combination


      I set up a fresh 4.21 cluster and then installed the SNR and NHC CSVs.
      When I stop kubelet (`stop kubelet`) on one of the nodes (the SNR controller was not running on this node), the node comes back to the `Ready` state, but after a few seconds it goes back to `Not Ready`.
      SNR then logs the error below, and the node never comes back.

      github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).isNodeRebootCapable
      	/app/self-node-remediation/controllers/selfnoderemediation_controller.go:611
      github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).prepareReboot
      	/app/self-node-remediation/controllers/selfnoderemediation_controller.go:479
      github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).handleFencingStartedPhase
      	/app/self-node-remediation/controllers/selfnoderemediation_controller.go:474
      github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).remediateWithResourceRemoval
      	/app/self-node-remediation/controllers/selfnoderemediation_controller.go:458
      github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).remediateWithOutOfServiceTaint
      	/app/self-node-remediation/controllers/selfnoderemediation_controller.go:415
      github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).ReconcileManager
      	/app/self-node-remediation/controllers/selfnoderemediation_controller.go:306
      github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).Reconcile
      	/app/self-node-remediation/controllers/selfnoderemediation_controller.go:151
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
      	/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
      	/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:340
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
      	/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:300
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1
      	/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:202
      2026-02-22T19:22:40.632525633Z	ERROR	Reconciler error	{"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "SelfNodeRemediation": {"name":"ip-10-0-89-20.us-west-1.compute.internal-784gf","namespace":"openshift-workload-availability"}, "namespace": "openshift-workload-availability", "name": "ip-10-0-89-20.us-west-1.compute.internal-784gf", "reconcileID": "eaa3d05c-8584-4a0d-bd4b-ec7a8f451461", "error": "Node is not capable to reboot itself", "errorVerbose": "Node is not capable to reboot itself\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).prepareReboot\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:481\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).handleFencingStartedPhase\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:474\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).remediateWithResourceRemoval\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:458\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).remediateWithOutOfServiceTaint\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:415\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).ReconcileManager\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:306\ngithub.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).Reconcile\n\t/app/self-node-remediation/controllers/selfnoderemediation_controller.go:151\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:340\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:300\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1\n\t/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:202\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1693"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
      	/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:353
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
      	/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:300
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1
      	/app/self-node-remediation/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:202
      2026-02-22T19:22:41.273247335Z	INFO	controllers.SelfNodeRemediation	Remediating with OutOfServiceTaint Remediation strategy (auto-selected)	{"pod": "manager", "selfnoderemediation": {"name":"ip-10-0-89-20.us-west-1.compute.internal-784gf","namespace":"openshift-workload-availability"}}
      2026-02-22T19:22:41.273270157Z	INFO	controllers.SelfNodeRemediation	pre-reboot not completed yet, prepare for rebooting	{"pod": "manager", "selfnoderemediation": {"name":"ip-10-0-89-20.us-west-1.compute.internal-784gf","namespace":"openshift-workload-availability"}}
      2026-02-22T19:22:41.273350691Z	DEBUG	events	[remediation] Remediation started by SNR manager	{"type": "Normal", "object": {"kind":"SelfNodeRemediation","namespace":"openshift-workload-availability","name":"ip-10-0-89-20.us-west-1.compute.internal-784gf","uid":"fd90dbe9-01e8-47ee-a81a-130670a9da4b","apiVersion":"self-node-remediation.medik8s.io/v1alpha1","resourceVersion":"40744"}, "reason": "RemediationStarted"}
      2026-02-22T19:22:41.273660069Z	ERROR	controllers.SelfNodeRemediation	failed to get self node remediation agent pod resource	{"pod": "manager", "selfnoderemediation": {"name":"ip-10-0-89-20.us-west-1.compute.internal-784gf","namespace":"openshift-workload-availability"}, "error": "failed to find self node remediation pod matching the given node"}
      github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).isNodeRebootCapable
      	/app/self-node-remediation/controllers/selfnoderemediation_controller.go:611
      github.com/medik8s/self-node-remediation/controllers.(*SelfNodeRemediationReconciler).prepareReboot
      	/app/self-node-remediation/controllers/selfnoderemediation_controller.go:479
      

      Attaching the logs for more details:
      snr_nhc_4.21_connected_concil_error.text

              Assignee: Unassigned
              Reporter: vipin kumar (vipikuma)