2025-05-20T21:19:48.861870977+00:00 stderr F 2025-05-20T21:19:48.861661003Z INFO setup Go Version: go1.23.2 (Red Hat 1.23.2-1.el9) X:strictfipsruntime
2025-05-20T21:19:48.861870977+00:00 stderr F 2025-05-20T21:19:48.861789564Z INFO setup Go OS/Arch: linux/amd64
2025-05-20T21:19:48.861870977+00:00 stderr F 2025-05-20T21:19:48.861793211Z INFO setup Operator Version: v0.10.0
2025-05-20T21:19:48.861870977+00:00 stderr F 2025-05-20T21:19:48.861796006Z INFO setup Git Commit:
2025-05-20T21:19:48.861870977+00:00 stderr F 2025-05-20T21:19:48.861798721Z INFO setup Build Date: 2025-01-13T11:55:12+00:00
2025-05-20T21:19:48.861870977+00:00 stderr F 2025-05-20T21:19:48.861801196Z INFO setup HTTP/2 for metrics and webhook server disabled
2025-05-20T21:19:48.862032992+00:00 stderr F 2025-05-20T21:19:48.86201643Z INFO setup OLM injected certs for webhooks not found
2025-05-20T21:19:48.862296215+00:00 stderr F 2025-05-20T21:19:48.862265147Z INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}
2025-05-20T21:19:48.871891334+00:00 stderr F 2025-05-20T21:19:48.871826413Z INFO utils-taints out of service taint strategy {"isSupported": true, "k8sMajorVersion": 1, "k8sMinorVersion": 29}
2025-05-20T21:19:48.871891334+00:00 stderr F 2025-05-20T21:19:48.871857561Z INFO utils-taints out of service taint strategy {"isGA": true, "k8sMajorVersion": 1, "k8sMinorVersion": 29}
2025-05-20T21:19:48.871891334+00:00 stderr F 2025-05-20T21:19:48.871864214Z INFO setup Starting as a self node remediation agent that should run as part of the daemonset
2025-05-20T21:19:49.197493327+00:00 stderr F 2025-05-20T21:19:49.197362972Z INFO setup init grpc server
2025-05-20T21:19:49.197493327+00:00 stderr F 2025-05-20T21:19:49.197406564Z INFO setup starting manager
2025-05-20T21:19:49.197977736+00:00 stderr F 2025-05-20T21:19:49.197887938Z INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
2025-05-20T21:19:49.197977736+00:00 stderr F 2025-05-20T21:19:49.197954393Z INFO starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2025-05-20T21:19:49.198708139+00:00 stderr F 2025-05-20T21:19:49.198603583Z INFO watchdog watchdog started
2025-05-20T21:19:49.199068886+00:00 stderr F 2025-05-20T21:19:49.199022579Z INFO Starting EventSource {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "source": "kind source: *v1alpha1.SelfNodeRemediation"}
2025-05-20T21:19:49.199068886+00:00 stderr F 2025-05-20T21:19:49.199053888Z INFO Starting Controller {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation"}
2025-05-20T21:19:49.299871388+00:00 stderr F 2025-05-20T21:19:49.299780676Z INFO peers peer starting {"name": "openshift-worker-cygnus-1"}
2025-05-20T21:19:49.300235581+00:00 stderr F 2025-05-20T21:19:49.300207809Z INFO peerhealth.server peer health server started
2025-05-20T21:19:49.407299230+00:00 stderr F 2025-05-20T21:19:49.407188422Z INFO Starting workers {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "worker count": 1}
2025-05-20T21:21:25.014508279+00:00 stderr F 2025-05-20T21:21:25.014421206Z INFO api-check failed to check api server: api server readyz endpoint error: Get "https://172.30.0.1:443/readyz?exclude=shutdown": context deadline exceeded
2025-05-20T21:21:25.014508279+00:00 stderr F 2025-05-20T21:21:25.014450291Z INFO api-check Ignoring api-server error, error count below threshold {"current count": 1, "threshold": 3}
2025-05-20T21:21:25.014508279+00:00 stderr F 2025-05-20T21:21:25.014458196Z INFO api-check peers did not confirm that we are unhealthy, ignoring error
2025-05-20T21:29:00.184111170+00:00 stderr F 2025-05-20T21:29:00.183891248Z INFO api-check failed to check api server: api server readyz endpoint error: Get "https://172.30.0.1:443/readyz?exclude=shutdown": context deadline exceeded
2025-05-20T21:29:00.184111170+00:00 stderr F 2025-05-20T21:29:00.183948886Z INFO api-check Ignoring api-server error, error count below threshold {"current count": 1, "threshold": 3}
2025-05-20T21:29:00.184111170+00:00 stderr F 2025-05-20T21:29:00.183966068Z INFO api-check peers did not confirm that we are unhealthy, ignoring error
2025-05-20T21:29:20.185480876+00:00 stderr F 2025-05-20T21:29:20.18532333Z INFO api-check failed to check api server: api server readyz endpoint error: Get "https://172.30.0.1:443/readyz?exclude=shutdown": context deadline exceeded
2025-05-20T21:29:20.185480876+00:00 stderr F 2025-05-20T21:29:20.185365219Z INFO api-check Ignoring api-server error, error count below threshold {"current count": 2, "threshold": 3}
2025-05-20T21:29:20.185480876+00:00 stderr F 2025-05-20T21:29:20.18537687Z INFO api-check peers did not confirm that we are unhealthy, ignoring error
2025-05-20T21:29:25.184193004+00:00 stderr F W0520 21:29:25.184079 38817 reflector.go:456] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: watch of *v1alpha1.SelfNodeRemediation ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2025-05-20T21:29:25.184193004+00:00 stderr F W0520 21:29:25.184079 38817 reflector.go:456] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2025-05-20T21:29:25.184288693+00:00 stderr F W0520 21:29:25.184230 38817 reflector.go:456] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2025-05-20T21:29:25.184319802+00:00 stderr F W0520 21:29:25.184118 38817 reflector.go:456] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
2025-05-20T21:29:40.187022023+00:00 stderr F 2025-05-20T21:29:40.186919229Z INFO api-check failed to check api server: api server readyz endpoint error: Get "https://172.30.0.1:443/readyz?exclude=shutdown": context deadline exceeded
2025-05-20T21:29:40.187022023+00:00 stderr F 2025-05-20T21:29:40.186964314Z INFO api-check Error count exceeds threshold, trying to ask other nodes if I'm healthy
2025-05-20T21:29:40.187103346+00:00 stderr F 2025-05-20T21:29:40.18705753Z INFO api-check getting health status from peer {"IP": "10.129.0.5"}
2025-05-20T21:29:40.187441099+00:00 stderr F 2025-05-20T21:29:40.187352223Z INFO api-check getting health status from peer {"IP": "10.128.0.50"}
2025-05-20T21:29:40.187550695+00:00 stderr F 2025-05-20T21:29:40.187512113Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.129.0.5:30001"}
2025-05-20T21:29:40.187601000+00:00 stderr F 2025-05-20T21:29:40.187502555Z INFO api-check getting health status from peer {"IP": "10.130.0.4"}
2025-05-20T21:29:40.187673135+00:00 stderr F 2025-05-20T21:29:40.187647858Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.130.0.4:30001"}
2025-05-20T21:29:40.187738998+00:00 stderr F 2025-05-20T21:29:40.187599667Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.128.0.50:30001"}
2025-05-20T21:29:42.559635896+00:00 stderr F 2025-05-20T21:29:42.559538023Z INFO peerhealth.server checking health for peer {"node": "openshift-master-cygnus-1", "machine": "cygnus-bp7ps-master-1"}
2025-05-20T21:29:43.218141773+00:00 stderr F 2025-05-20T21:29:43.218040924Z INFO api-check got response from peer {"IP": "10.128.0.50", "status": 3}
2025-05-20T21:29:45.188289842+00:00 stderr F 2025-05-20T21:29:45.188138308Z ERROR api-check.peerhealth client failed to dial {"error": "context deadline exceeded"}
2025-05-20T21:29:45.188289842+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.NewClient
2025-05-20T21:29:45.188289842+00:00 stderr F /remote-source/app/pkg/peerhealth/client.go:37
2025-05-20T21:29:45.188289842+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:29:45.188289842+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:253
2025-05-20T21:29:45.188289842+00:00 stderr F 2025-05-20T21:29:45.188234849Z ERROR api-check failed to init grpc client {"IP": "10.129.0.5", "error": "context deadline exceeded"}
2025-05-20T21:29:45.188289842+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:29:45.188289842+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:255
2025-05-20T21:29:45.188289842+00:00 stderr F 2025-05-20T21:29:45.188223308Z ERROR api-check.peerhealth client failed to dial {"error": "context deadline exceeded"}
2025-05-20T21:29:45.188289842+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.NewClient
2025-05-20T21:29:45.188289842+00:00 stderr F /remote-source/app/pkg/peerhealth/client.go:37
2025-05-20T21:29:45.188289842+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:29:45.188289842+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:253
2025-05-20T21:29:45.188369071+00:00 stderr F 2025-05-20T21:29:45.188275636Z ERROR api-check failed to init grpc client {"IP": "10.130.0.4", "error": "context deadline exceeded"}
2025-05-20T21:29:45.188369071+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:29:45.188369071+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:255
2025-05-20T21:29:45.188369071+00:00 stderr F 2025-05-20T21:29:45.188293159Z INFO api-check Peer can't access the api-server
2025-05-20T21:29:45.188369071+00:00 stderr F 2025-05-20T21:29:45.188312956Z INFO api-check getting health status from peer {"IP": "10.128.2.11"}
2025-05-20T21:29:45.188369071+00:00 stderr F 2025-05-20T21:29:45.188324978Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.128.2.11:30001"}
2025-05-20T21:29:45.559842174+00:00 stderr F 2025-05-20T21:29:45.55968528Z ERROR peerhealth.server api error, failed to list snrs {"error": "Get \"https://172.30.0.1:443/apis/self-node-remediation.medik8s.io/v1alpha1/selfnoderemediations\": context deadline exceeded"}
2025-05-20T21:29:45.559842174+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.(*Server).IsHealthy
2025-05-20T21:29:45.559842174+00:00 stderr F /remote-source/app/pkg/peerhealth/server.go:119
2025-05-20T21:29:45.559842174+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth._PeerHealth_IsHealthy_Handler
2025-05-20T21:29:45.559842174+00:00 stderr F /remote-source/app/pkg/peerhealth/peerhealth_grpc.pb.go:76
2025-05-20T21:29:45.559842174+00:00 stderr F google.golang.org/grpc.(*Server).processUnaryRPC
2025-05-20T21:29:45.559842174+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1335
2025-05-20T21:29:45.559842174+00:00 stderr F google.golang.org/grpc.(*Server).handleStream
2025-05-20T21:29:45.559842174+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1712
2025-05-20T21:29:45.559842174+00:00 stderr F google.golang.org/grpc.(*Server).serveStreams.func1.1
2025-05-20T21:29:45.559842174+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:947
2025-05-20T21:29:46.996637936+00:00 stderr F 2025-05-20T21:29:46.996547596Z INFO peerhealth.server checking health for peer {"node": "openshift-worker-cygnus-0", "machine": "cygnus-bp7ps-worker-0-9p9s8"}
2025-05-20T21:29:48.212461940+00:00 stderr F 2025-05-20T21:29:48.212379565Z INFO api-check got response from peer {"IP": "10.128.2.11", "status": 3}
2025-05-20T21:29:48.212718041+00:00 stderr F 2025-05-20T21:29:48.212660162Z INFO api-check Peer can't access the api-server
2025-05-20T21:29:48.212831404+00:00 stderr F 2025-05-20T21:29:48.212775309Z INFO api-check Ignoring no peers response error, time is below threshold for no peers response {"time without peers response (seconds)": 0.000118142, "threshold (seconds)": 30}
2025-05-20T21:29:48.212915231+00:00 stderr F 2025-05-20T21:29:48.212901485Z INFO api-check peers did not confirm that we are unhealthy, ignoring error
2025-05-20T21:29:49.997390718+00:00 stderr F 2025-05-20T21:29:49.997249634Z ERROR peerhealth.server api error, failed to list snrs {"error": "Get \"https://172.30.0.1:443/apis/self-node-remediation.medik8s.io/v1alpha1/selfnoderemediations\": context deadline exceeded"}
2025-05-20T21:29:49.997390718+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.(*Server).IsHealthy
2025-05-20T21:29:49.997390718+00:00 stderr F /remote-source/app/pkg/peerhealth/server.go:119
2025-05-20T21:29:49.997390718+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth._PeerHealth_IsHealthy_Handler
2025-05-20T21:29:49.997390718+00:00 stderr F /remote-source/app/pkg/peerhealth/peerhealth_grpc.pb.go:76
2025-05-20T21:29:49.997390718+00:00 stderr F google.golang.org/grpc.(*Server).processUnaryRPC
2025-05-20T21:29:49.997390718+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1335
2025-05-20T21:29:49.997390718+00:00 stderr F google.golang.org/grpc.(*Server).handleStream
2025-05-20T21:29:49.997390718+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1712
2025-05-20T21:29:49.997390718+00:00 stderr F google.golang.org/grpc.(*Server).serveStreams.func1.1
2025-05-20T21:29:49.997390718+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:947
2025-05-20T21:30:03.218568022+00:00 stderr F 2025-05-20T21:30:03.218505103Z INFO api-check failed to check api server: api server readyz endpoint error: an error on the server ("[+]ping ok\n[+]log ok\n[-]etcd failed: reason withheld\n[-]etcd-readiness failed: reason withheld\n[+]api-openshift-apiserver-available ok\n[+]api-openshift-oauth-apiserver-available ok\n[+]informer-sync ok\n[+]poststarthook/openshift.io-oauth-apiserver-reachable ok\n[+]poststarthook/start-kube-apiserver-admission-initializer ok\n[+]poststarthook/quota.openshift.io-clusterquotamapping ok\n[+]poststarthook/openshift.io-api-request-count-filter ok\n[+]poststarthook/openshift.io-startkubeinformers ok\n[+]poststarthook/openshift.io-openshift-apiserver-reachable ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/priority-and-fairness-config-consumer ok\n[+]poststarthook/priority-and-fairness-filter ok\n[+]poststarthook/storage-object-count-tracker-hook ok\n[+]poststarthook/start-apiextensions-informers ok\n[+]poststarthook/start-apiextensions-controllers ok\n[+]poststarthook/crd-informer-synced ok\n[+]poststarthook/start-service-ip-repair-controllers ok\n[+]poststarthook/rbac/bootstrap-roles ok\n[+]poststarthook/scheduling/bootstrap-system-priority-classes ok\n[+]poststarthook/priority-and-fairness-config-producer ok\n[+]poststarthook/start-system-namespaces-controller ok\n[+]poststarthook/bootstrap-controller ok\n[+]poststarthook/start-cluster-authentication-info-controller ok\n[+]poststarthook/start-kube-apiserver-identity-lease-controller ok\n[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok\n[+]poststarthook/start-legacy-token-tracking-controller ok\n[+]poststarthook/aggregator-reload-proxy-client-cert ok\n[+]poststarthook/start-kube-aggregator-informers ok\n[+]poststarthook/apiservice-registration-controller ok\n[+]poststarthook/apiservice-status-available-controller ok\n[+]poststarthook/apiservice-wait-for-first-sync ok\n[+]poststarthook/kube-apiserver-autoregistration ok\n[+]autoregister-completion ok\n[+]poststarthook/apiservice-openapi-controller ok\n[+]poststarthook/apiservice-openapiv3-controller ok\n[+]poststarthook/apiservice-discovery-controller ok\n[+]shutdown excluded: ok\nreadyz check failed") has prevented the request from succeeding
2025-05-20T21:30:03.218643053+00:00 stderr F 2025-05-20T21:30:03.21863103Z INFO api-check Error count exceeds threshold, trying to ask other nodes if I'm healthy
2025-05-20T21:30:03.218709938+00:00 stderr F 2025-05-20T21:30:03.21869507Z INFO api-check getting health status from peer {"IP": "10.129.0.5"}
2025-05-20T21:30:03.218739193+00:00 stderr F 2025-05-20T21:30:03.218729224Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.129.0.5:30001"}
2025-05-20T21:30:03.218902008+00:00 stderr F 2025-05-20T21:30:03.218789748Z INFO api-check getting health status from peer {"IP": "10.128.0.50"}
2025-05-20T21:30:03.218993179+00:00 stderr F 2025-05-20T21:30:03.218978592Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.128.0.50:30001"}
2025-05-20T21:30:03.219145476+00:00 stderr F 2025-05-20T21:30:03.218786992Z INFO api-check getting health status from peer {"IP": "10.130.0.4"}
2025-05-20T21:30:03.219161326+00:00 stderr F 2025-05-20T21:30:03.219141288Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.130.0.4:30001"}
2025-05-20T21:30:06.240441404+00:00 stderr F 2025-05-20T21:30:06.240377975Z INFO api-check got response from peer {"IP": "10.128.0.50", "status": 3}
2025-05-20T21:30:08.219178448+00:00 stderr F 2025-05-20T21:30:08.219050838Z ERROR api-check.peerhealth client failed to dial {"error": "context deadline exceeded"}
2025-05-20T21:30:08.219178448+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.NewClient
2025-05-20T21:30:08.219178448+00:00 stderr F /remote-source/app/pkg/peerhealth/client.go:37
2025-05-20T21:30:08.219178448+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:30:08.219178448+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:253
2025-05-20T21:30:08.219272964+00:00 stderr F 2025-05-20T21:30:08.219249531Z ERROR api-check failed to init grpc client {"IP": "10.129.0.5", "error": "context deadline exceeded"}
2025-05-20T21:30:08.219272964+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:30:08.219272964+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:255
2025-05-20T21:30:08.219320804+00:00 stderr F 2025-05-20T21:30:08.219189608Z ERROR api-check.peerhealth client failed to dial {"error": "context deadline exceeded"}
2025-05-20T21:30:08.219320804+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.NewClient
2025-05-20T21:30:08.219320804+00:00 stderr F /remote-source/app/pkg/peerhealth/client.go:37
2025-05-20T21:30:08.219320804+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:30:08.219320804+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:253
2025-05-20T21:30:08.219359797+00:00 stderr F 2025-05-20T21:30:08.219341803Z ERROR api-check failed to init grpc client {"IP": "10.130.0.4", "error": "context deadline exceeded"}
2025-05-20T21:30:08.219359797+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:30:08.219359797+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:255
2025-05-20T21:30:08.219402928+00:00 stderr F 2025-05-20T21:30:08.219388942Z INFO api-check Peer can't access the api-server
2025-05-20T21:30:08.219457401+00:00 stderr F 2025-05-20T21:30:08.219444556Z INFO api-check getting health status from peer {"IP": "10.128.2.11"}
2025-05-20T21:30:08.219489060+00:00 stderr F 2025-05-20T21:30:08.219477609Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.128.2.11:30001"}
2025-05-20T21:30:10.584694226+00:00 stderr F 2025-05-20T21:30:10.584633643Z INFO peerhealth.server checking health for peer {"node": "openshift-master-cygnus-1", "machine": "cygnus-bp7ps-master-1"}
2025-05-20T21:30:11.242998523+00:00 stderr F 2025-05-20T21:30:11.24293288Z INFO api-check got response from peer {"IP": "10.128.2.11", "status": 3}
2025-05-20T21:30:11.243153756+00:00 stderr F 2025-05-20T21:30:11.243106858Z INFO api-check Peer can't access the api-server
2025-05-20T21:30:11.243168573+00:00 stderr F 2025-05-20T21:30:11.243139489Z INFO api-check Ignoring no peers response error, time is below threshold for no peers response {"time without peers response (seconds)": 0.000036228, "threshold (seconds)": 30}
2025-05-20T21:30:11.243168573+00:00 stderr F 2025-05-20T21:30:11.243156331Z INFO api-check peers did not confirm that we are unhealthy, ignoring error
2025-05-20T21:30:13.585634941+00:00 stderr F 2025-05-20T21:30:13.585513404Z ERROR peerhealth.server api error, failed to list snrs {"error": "Get \"https://172.30.0.1:443/apis/self-node-remediation.medik8s.io/v1alpha1/selfnoderemediations\": context deadline exceeded"}
2025-05-20T21:30:13.585634941+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.(*Server).IsHealthy
2025-05-20T21:30:13.585634941+00:00 stderr F /remote-source/app/pkg/peerhealth/server.go:119
2025-05-20T21:30:13.585634941+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth._PeerHealth_IsHealthy_Handler
2025-05-20T21:30:13.585634941+00:00 stderr F /remote-source/app/pkg/peerhealth/peerhealth_grpc.pb.go:76
2025-05-20T21:30:13.585634941+00:00 stderr F google.golang.org/grpc.(*Server).processUnaryRPC
2025-05-20T21:30:13.585634941+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1335
2025-05-20T21:30:13.585634941+00:00 stderr F google.golang.org/grpc.(*Server).handleStream
2025-05-20T21:30:13.585634941+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1712
2025-05-20T21:30:13.585634941+00:00 stderr F google.golang.org/grpc.(*Server).serveStreams.func1.1
2025-05-20T21:30:13.585634941+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:947
2025-05-20T21:30:15.020625295+00:00 stderr F 2025-05-20T21:30:15.020566294Z INFO peerhealth.server checking health for peer {"node": "openshift-worker-cygnus-0", "machine": "cygnus-bp7ps-worker-0-9p9s8"}
2025-05-20T21:30:18.021658845+00:00 stderr F 2025-05-20T21:30:18.02156082Z ERROR peerhealth.server api error, failed to list snrs {"error": "Get \"https://172.30.0.1:443/apis/self-node-remediation.medik8s.io/v1alpha1/selfnoderemediations\": context deadline exceeded"}
2025-05-20T21:30:18.021658845+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.(*Server).IsHealthy
2025-05-20T21:30:18.021658845+00:00 stderr F /remote-source/app/pkg/peerhealth/server.go:119
2025-05-20T21:30:18.021658845+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth._PeerHealth_IsHealthy_Handler
2025-05-20T21:30:18.021658845+00:00 stderr F /remote-source/app/pkg/peerhealth/peerhealth_grpc.pb.go:76
2025-05-20T21:30:18.021658845+00:00 stderr F google.golang.org/grpc.(*Server).processUnaryRPC
2025-05-20T21:30:18.021658845+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1335
2025-05-20T21:30:18.021658845+00:00 stderr F google.golang.org/grpc.(*Server).handleStream
2025-05-20T21:30:18.021658845+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1712
2025-05-20T21:30:18.021658845+00:00 stderr F google.golang.org/grpc.(*Server).serveStreams.func1.1
2025-05-20T21:30:18.021658845+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:947
2025-05-20T21:30:26.035012422+00:00 stderr F W0520 21:30:26.034953 38817 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.Pod: the server was unable to return a response in the time allotted, but may still be processing the request (get pods)
2025-05-20T21:30:26.035157515+00:00 stderr F I0520 21:30:26.035139 38817 trace.go:219] Trace[1873602390]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 (20-May-2025 21:29:25.986) (total time: 60048ms):
2025-05-20T21:30:26.035157515+00:00 stderr F Trace[1873602390]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get pods) 60048ms (21:30:26.034)
2025-05-20T21:30:26.035157515+00:00 stderr F Trace[1873602390]: [1m0.048705097s] [1m0.048705097s] END
2025-05-20T21:30:26.035298309+00:00 stderr F W0520 21:30:26.034952 38817 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.Node: the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)
2025-05-20T21:30:26.035417273+00:00 stderr F I0520 21:30:26.035401 38817 trace.go:219] Trace[1725416836]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 (20-May-2025 21:29:26.023) (total time: 60011ms):
2025-05-20T21:30:26.035417273+00:00 stderr F Trace[1725416836]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get nodes) 60011ms (21:30:26.034)
2025-05-20T21:30:26.035417273+00:00 stderr F Trace[1725416836]: [1m0.01160506s] [1m0.01160506s] END
2025-05-20T21:30:26.035460694+00:00 stderr F E0520 21:30:26.035450 38817 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.Node: failed to list *v1.Node: the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)
2025-05-20T21:30:26.035495990+00:00 stderr F E0520 21:30:26.035270 38817 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.Pod: failed to list *v1.Pod: the server was unable to return a response in the time allotted, but may still be processing the request (get pods)
2025-05-20T21:30:26.053969730+00:00 stderr F W0520 21:30:26.053944 38817 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1alpha1.SelfNodeRemediation: the server was unable to return a response in the time allotted, but may still be processing the request (get selfnoderemediations.self-node-remediation.medik8s.io)
2025-05-20T21:30:26.054037037+00:00 stderr F I0520 21:30:26.054025 38817 trace.go:219] Trace[1758090652]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 (20-May-2025 21:29:26.051) (total time: 60002ms):
2025-05-20T21:30:26.054037037+00:00 stderr F Trace[1758090652]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get selfnoderemediations.self-node-remediation.medik8s.io) 60002ms (21:30:26.053)
2025-05-20T21:30:26.054037037+00:00 stderr F Trace[1758090652]: [1m0.002556362s] [1m0.002556362s] END
2025-05-20T21:30:26.054069818+00:00 stderr F E0520 21:30:26.054059 38817 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1alpha1.SelfNodeRemediation: failed to list *v1alpha1.SelfNodeRemediation: the server was unable to return a response in the time allotted, but may still be processing the request (get selfnoderemediations.self-node-remediation.medik8s.io)
2025-05-20T21:30:26.248518104+00:00 stderr F 2025-05-20T21:30:26.248448083Z INFO api-check failed to check api server: api server readyz endpoint error: an error on the server ("[+]ping ok\n[+]log ok\n[-]etcd failed: reason withheld\n[-]etcd-readiness failed: reason withheld\n[+]api-openshift-apiserver-available ok\n[+]api-openshift-oauth-apiserver-available ok\n[+]informer-sync ok\n[+]poststarthook/openshift.io-oauth-apiserver-reachable ok\n[+]poststarthook/start-kube-apiserver-admission-initializer ok\n[+]poststarthook/quota.openshift.io-clusterquotamapping ok\n[+]poststarthook/openshift.io-api-request-count-filter ok\n[+]poststarthook/openshift.io-startkubeinformers ok\n[+]poststarthook/openshift.io-openshift-apiserver-reachable ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/priority-and-fairness-config-consumer ok\n[+]poststarthook/priority-and-fairness-filter ok\n[+]poststarthook/storage-object-count-tracker-hook ok\n[+]poststarthook/start-apiextensions-informers ok\n[+]poststarthook/start-apiextensions-controllers ok\n[+]poststarthook/crd-informer-synced ok\n[+]poststarthook/start-service-ip-repair-controllers ok\n[+]poststarthook/rbac/bootstrap-roles ok\n[+]poststarthook/scheduling/bootstrap-system-priority-classes ok\n[+]poststarthook/priority-and-fairness-config-producer ok\n[+]poststarthook/start-system-namespaces-controller ok\n[+]poststarthook/bootstrap-controller ok\n[+]poststarthook/start-cluster-authentication-info-controller ok\n[+]poststarthook/start-kube-apiserver-identity-lease-controller ok\n[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok\n[+]poststarthook/start-legacy-token-tracking-controller ok\n[+]poststarthook/aggregator-reload-proxy-client-cert ok\n[+]poststarthook/start-kube-aggregator-informers ok\n[+]poststarthook/apiservice-registration-controller ok\n[+]poststarthook/apiservice-status-available-controller ok\n[+]poststarthook/apiservice-wait-for-first-sync ok\n[+]poststarthook/kube-apiserver-autoregistration ok\n[+]autoregister-completion ok\n[+]poststarthook/apiservice-openapi-controller ok\n[+]poststarthook/apiservice-openapiv3-controller ok\n[+]poststarthook/apiservice-discovery-controller ok\n[+]shutdown excluded: ok\nreadyz check failed") has prevented the request from succeeding
2025-05-20T21:30:26.248625035+00:00 stderr F 2025-05-20T21:30:26.248608715Z INFO api-check Error count exceeds threshold, trying to ask other nodes if I'm healthy
2025-05-20T21:30:26.248701890+00:00 stderr F 2025-05-20T21:30:26.248685198Z INFO api-check getting health status from peer {"IP": "10.129.0.5"}
2025-05-20T21:30:26.248730273+00:00 stderr F 2025-05-20T21:30:26.248720775Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.129.0.5:30001"}
2025-05-20T21:30:26.248795305+00:00 stderr F 2025-05-20T21:30:26.248731786Z INFO api-check getting health status from peer {"IP": "10.128.0.50"}
2025-05-20T21:30:26.248922964+00:00 stderr F 2025-05-20T21:30:26.248908046Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.128.0.50:30001"}
2025-05-20T21:30:26.249070200+00:00 stderr F 2025-05-20T21:30:26.248725003Z INFO api-check getting health status from peer {"IP": "10.130.0.4"}
2025-05-20T21:30:26.249135043+00:00 stderr F 2025-05-20T21:30:26.249116448Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.130.0.4:30001"}
2025-05-20T21:30:26.650724082+00:00 stderr F W0520 21:30:26.650625 38817 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.Secret: the server was unable to return a response in the time allotted, but may still be processing the request (get secrets)
2025-05-20T21:30:26.650957069+00:00 stderr F I0520 21:30:26.650938 38817 trace.go:219] Trace[1031714297]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 (20-May-2025 21:29:26.648) (total time: 60002ms):
2025-05-20T21:30:26.650957069+00:00 stderr F Trace[1031714297]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get secrets) 60002ms (21:30:26.650)
2025-05-20T21:30:26.650957069+00:00 stderr F Trace[1031714297]: [1m0.00249602s] [1m0.00249602s] END
2025-05-20T21:30:26.651186630+00:00 stderr F E0520 21:30:26.651012 38817 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.Secret: failed to list *v1.Secret: the server was unable to return a response in the time allotted, but may still be processing the request (get secrets)
2025-05-20T21:30:29.272497159+00:00 stderr F 2025-05-20T21:30:29.272429791Z INFO api-check got response from peer {"IP": "10.128.0.50", "status": 3}
2025-05-20T21:30:31.249511128+00:00 stderr F 2025-05-20T21:30:31.249412223Z ERROR api-check.peerhealth client failed to dial {"error": "context deadline exceeded"}
2025-05-20T21:30:31.249511128+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.NewClient
2025-05-20T21:30:31.249511128+00:00 stderr F /remote-source/app/pkg/peerhealth/client.go:37
2025-05-20T21:30:31.249511128+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:30:31.249511128+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:253
2025-05-20T21:30:31.249615424+00:00 stderr F 2025-05-20T21:30:31.249595606Z ERROR api-check failed to init grpc client {"IP": "10.130.0.4", "error": "context deadline exceeded"}
2025-05-20T21:30:31.249615424+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:30:31.249615424+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:255
2025-05-20T21:30:31.249644979+00:00 stderr F 2025-05-20T21:30:31.249481823Z ERROR api-check.peerhealth client failed to dial {"error": "context deadline exceeded"}
2025-05-20T21:30:31.249644979+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.NewClient
2025-05-20T21:30:31.249644979+00:00 stderr F /remote-source/app/pkg/peerhealth/client.go:37
2025-05-20T21:30:31.249644979+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:30:31.249644979+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:253
2025-05-20T21:30:31.249675657+00:00 stderr F 2025-05-20T21:30:31.249660218Z ERROR api-check failed to init grpc client {"IP": "10.129.0.5", "error": "context deadline exceeded"}
2025-05-20T21:30:31.249675657+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer
2025-05-20T21:30:31.249675657+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:255
2025-05-20T21:30:31.249705663+00:00 stderr F 2025-05-20T21:30:31.249696185Z INFO api-check Peer can't access the api-server
2025-05-20T21:30:31.249754225+00:00 stderr F 2025-05-20T21:30:31.249743203Z INFO api-check getting health status from peer {"IP": "10.128.2.11"}
2025-05-20T21:30:31.249781947+00:00 stderr F 2025-05-20T21:30:31.24977264Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.128.2.11:30001"}
2025-05-20T21:30:34.271731181+00:00 stderr F 2025-05-20T21:30:34.271674315Z INFO api-check got response from peer {"IP": "10.128.2.11", "status": 3}
2025-05-20T21:30:34.271953299+00:00 stderr F 2025-05-20T21:30:34.271924745Z INFO api-check Peer can't access the api-server
2025-05-20T21:30:34.272019974+00:00 stderr F 2025-05-20T21:30:34.271995678Z INFO api-check Ignoring no peers response error, time is below threshold for no peers response {"time without peers response (seconds)": 0.000071544, "threshold (seconds)": 30}
2025-05-20T21:30:34.272048688+00:00 stderr F 2025-05-20T21:30:34.27203919Z INFO api-check peers did not confirm that we are unhealthy, ignoring error
2025-05-20T21:30:38.047146848+00:00 stderr F 2025-05-20T21:30:38.047069072Z INFO peerhealth.server checking health for peer {"node": "openshift-worker-cygnus-0", "machine": "cygnus-bp7ps-worker-0-9p9s8"}
2025-05-20T21:30:38.610064798+00:00 stderr F 2025-05-20T21:30:38.610007301Z INFO peerhealth.server checking health for peer {"node": "openshift-master-cygnus-1", "machine": "cygnus-bp7ps-master-1"}
2025-05-20T21:30:41.047643633+00:00 stderr F 2025-05-20T21:30:41.047505805Z ERROR peerhealth.server api error, failed to list snrs {"error": "Get \"https://172.30.0.1:443/apis/self-node-remediation.medik8s.io/v1alpha1/selfnoderemediations\": context deadline exceeded"}
2025-05-20T21:30:41.047643633+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.(*Server).IsHealthy
2025-05-20T21:30:41.047643633+00:00 stderr F /remote-source/app/pkg/peerhealth/server.go:119
2025-05-20T21:30:41.047643633+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth._PeerHealth_IsHealthy_Handler
2025-05-20T21:30:41.047643633+00:00 stderr F /remote-source/app/pkg/peerhealth/peerhealth_grpc.pb.go:76
2025-05-20T21:30:41.047643633+00:00 stderr F google.golang.org/grpc.(*Server).processUnaryRPC
2025-05-20T21:30:41.047643633+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1335
2025-05-20T21:30:41.047643633+00:00 stderr F google.golang.org/grpc.(*Server).handleStream
2025-05-20T21:30:41.047643633+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1712
2025-05-20T21:30:41.047643633+00:00 stderr F google.golang.org/grpc.(*Server).serveStreams.func1.1
2025-05-20T21:30:41.047643633+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:947
2025-05-20T21:30:43.611403102+00:00 stderr F 2025-05-20T21:30:43.611303426Z ERROR peerhealth.server api error, failed to list snrs {"error": "client rate limiter Wait returned an error: context deadline exceeded"}
2025-05-20T21:30:43.611403102+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.(*Server).IsHealthy
2025-05-20T21:30:43.611403102+00:00 stderr F /remote-source/app/pkg/peerhealth/server.go:119
2025-05-20T21:30:43.611403102+00:00 stderr F
github.com/medik8s/self-node-remediation/pkg/peerhealth._PeerHealth_IsHealthy_Handler 2025-05-20T21:30:43.611403102+00:00 stderr F /remote-source/app/pkg/peerhealth/peerhealth_grpc.pb.go:76 2025-05-20T21:30:43.611403102+00:00 stderr F google.golang.org/grpc.(*Server).processUnaryRPC 2025-05-20T21:30:43.611403102+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1335 2025-05-20T21:30:43.611403102+00:00 stderr F google.golang.org/grpc.(*Server).handleStream 2025-05-20T21:30:43.611403102+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1712 2025-05-20T21:30:43.611403102+00:00 stderr F google.golang.org/grpc.(*Server).serveStreams.func1.1 2025-05-20T21:30:43.611403102+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:947 2025-05-20T21:30:54.272378794+00:00 stderr F 2025-05-20T21:30:54.272326776Z INFO api-check failed to check api server: api server readyz endpoint error: Get "https://172.30.0.1:443/readyz?exclude=shutdown": context deadline exceeded 2025-05-20T21:30:54.272452823+00:00 stderr F 2025-05-20T21:30:54.272440531Z INFO api-check Error count exceeds threshold, trying to ask other nodes if I'm healthy 2025-05-20T21:30:54.272541811+00:00 stderr F 2025-05-20T21:30:54.272526792Z INFO api-check getting health status from peer {"IP": "10.129.0.5"} 2025-05-20T21:30:54.272577507+00:00 stderr F 2025-05-20T21:30:54.272567318Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.129.0.5:30001"} 2025-05-20T21:30:54.272707232+00:00 stderr F 2025-05-20T21:30:54.272543684Z INFO api-check getting health status from peer {"IP": "10.128.0.50"} 2025-05-20T21:30:54.272767775+00:00 stderr F 2025-05-20T21:30:54.272751024Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.128.0.50:30001"} 2025-05-20T21:30:54.272835262+00:00 stderr F 2025-05-20T21:30:54.272527814Z INFO api-check getting health status from peer {"IP": "10.130.0.4"} 2025-05-20T21:30:54.272973831+00:00 stderr F 
2025-05-20T21:30:54.272953042Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.130.0.4:30001"} 2025-05-20T21:30:59.273689658+00:00 stderr F 2025-05-20T21:30:59.273616702Z ERROR api-check.peerhealth client failed to dial {"error": "context deadline exceeded"} 2025-05-20T21:30:59.273689658+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.NewClient 2025-05-20T21:30:59.273689658+00:00 stderr F /remote-source/app/pkg/peerhealth/client.go:37 2025-05-20T21:30:59.273689658+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:30:59.273689658+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:253 2025-05-20T21:30:59.273772413+00:00 stderr F 2025-05-20T21:30:59.273755221Z ERROR api-check failed to init grpc client {"IP": "10.129.0.5", "error": "context deadline exceeded"} 2025-05-20T21:30:59.273772413+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:30:59.273772413+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:255 2025-05-20T21:30:59.273837476+00:00 stderr F 2025-05-20T21:30:59.27365322Z ERROR api-check.peerhealth client failed to dial {"error": "context deadline exceeded"} 2025-05-20T21:30:59.273837476+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.NewClient 2025-05-20T21:30:59.273837476+00:00 stderr F /remote-source/app/pkg/peerhealth/client.go:37 2025-05-20T21:30:59.273837476+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:30:59.273837476+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:253 2025-05-20T21:30:59.273946121+00:00 stderr F 2025-05-20T21:30:59.273915352Z ERROR api-check failed to init grpc client {"IP": "10.130.0.4", "error": "context deadline exceeded"} 2025-05-20T21:30:59.273946121+00:00 stderr F 
github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:30:59.273946121+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:255 2025-05-20T21:30:59.292018320+00:00 stderr F 2025-05-20T21:30:59.291962476Z ERROR api-check failed to read health response from peer {"IP": "10.128.0.50", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded"} 2025-05-20T21:30:59.292018320+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:30:59.292018320+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:269 2025-05-20T21:30:59.292162942+00:00 stderr F 2025-05-20T21:30:59.292138145Z INFO api-check getting health status from peer {"IP": "10.128.2.11"} 2025-05-20T21:30:59.292207997+00:00 stderr F 2025-05-20T21:30:59.29219372Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.128.2.11:30001"} 2025-05-20T21:31:01.097712373+00:00 stderr F 2025-05-20T21:31:01.097650206Z INFO peerhealth.server checking health for peer {"node": "openshift-worker-cygnus-0", "machine": "cygnus-bp7ps-worker-0-9p9s8"} 2025-05-20T21:31:04.315138024+00:00 stderr F 2025-05-20T21:31:04.315058284Z ERROR api-check failed to read health response from peer {"IP": "10.128.2.11", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded"} 2025-05-20T21:31:04.315138024+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:31:04.315138024+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:269 2025-05-20T21:31:04.315335404+00:00 stderr F 2025-05-20T21:31:04.315237069Z ERROR api-check Failed to get health status peers. 
Assuming unhealthy {"error": "failed health check"} 2025-05-20T21:31:04.315335404+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getWorkerPeersResponse 2025-05-20T21:31:04.315335404+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:178 2025-05-20T21:31:04.315335404+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).isConsideredHealthy 2025-05-20T21:31:04.315335404+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:113 2025-05-20T21:31:04.315335404+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).Start.func1 2025-05-20T21:31:04.315335404+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:90 2025-05-20T21:31:04.315335404+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1 2025-05-20T21:31:04.315335404+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:259 2025-05-20T21:31:04.315335404+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1 2025-05-20T21:31:04.315335404+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 2025-05-20T21:31:04.315335404+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.BackoffUntil 2025-05-20T21:31:04.315335404+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 2025-05-20T21:31:04.315335404+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.JitterUntil 2025-05-20T21:31:04.315335404+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 2025-05-20T21:31:04.315335404+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext 2025-05-20T21:31:04.315335404+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:259 2025-05-20T21:31:04.315335404+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.UntilWithContext 2025-05-20T21:31:04.315335404+00:00 stderr F 
/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:170 2025-05-20T21:31:04.315335404+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).Start 2025-05-20T21:31:04.315335404+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:72 2025-05-20T21:31:04.315335404+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1 2025-05-20T21:31:04.315335404+00:00 stderr F /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:219 2025-05-20T21:31:04.315444018+00:00 stderr F 2025-05-20T21:31:04.315390898Z ERROR api-check we are unhealthy, triggering a reboot 2025-05-20T21:31:04.315444018+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).Start.func1 2025-05-20T21:31:04.315444018+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:92 2025-05-20T21:31:04.315444018+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1 2025-05-20T21:31:04.315444018+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:259 2025-05-20T21:31:04.315444018+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1 2025-05-20T21:31:04.315444018+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 2025-05-20T21:31:04.315444018+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.BackoffUntil 2025-05-20T21:31:04.315444018+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 2025-05-20T21:31:04.315444018+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.JitterUntil 2025-05-20T21:31:04.315444018+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 2025-05-20T21:31:04.315444018+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext 2025-05-20T21:31:04.315444018+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:259 
2025-05-20T21:31:04.315444018+00:00 stderr F k8s.io/apimachinery/pkg/util/wait.UntilWithContext 2025-05-20T21:31:04.315444018+00:00 stderr F /remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:170 2025-05-20T21:31:04.315444018+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).Start 2025-05-20T21:31:04.315444018+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:72 2025-05-20T21:31:04.315444018+00:00 stderr F sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1 2025-05-20T21:31:04.315444018+00:00 stderr F /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:219 2025-05-20T21:31:04.315494753+00:00 stderr F 2025-05-20T21:31:04.315481639Z INFO rebooter watchdog feeding has stopped, waiting for reboot to commence 2025-05-20T21:31:06.098702024+00:00 stderr F 2025-05-20T21:31:06.098609449Z ERROR peerhealth.server api error, failed to list snrs {"error": "client rate limiter Wait returned an error: context deadline exceeded"} 2025-05-20T21:31:06.098702024+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.(*Server).IsHealthy 2025-05-20T21:31:06.098702024+00:00 stderr F /remote-source/app/pkg/peerhealth/server.go:119 2025-05-20T21:31:06.098702024+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth._PeerHealth_IsHealthy_Handler 2025-05-20T21:31:06.098702024+00:00 stderr F /remote-source/app/pkg/peerhealth/peerhealth_grpc.pb.go:76 2025-05-20T21:31:06.098702024+00:00 stderr F google.golang.org/grpc.(*Server).processUnaryRPC 2025-05-20T21:31:06.098702024+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1335 2025-05-20T21:31:06.098702024+00:00 stderr F google.golang.org/grpc.(*Server).handleStream 2025-05-20T21:31:06.098702024+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1712 2025-05-20T21:31:06.098702024+00:00 stderr F 
google.golang.org/grpc.(*Server).serveStreams.func1.1 2025-05-20T21:31:06.098702024+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:947 2025-05-20T21:31:13.633054997+00:00 stderr F 2025-05-20T21:31:13.632992901Z INFO peerhealth.server checking health for peer {"node": "openshift-master-cygnus-1", "machine": "cygnus-bp7ps-master-1"} 2025-05-20T21:31:16.634061050+00:00 stderr F 2025-05-20T21:31:16.633959369Z ERROR peerhealth.server api error, failed to list snrs {"error": "Get \"https://172.30.0.1:443/apis/self-node-remediation.medik8s.io/v1alpha1/selfnoderemediations\": context deadline exceeded"} 2025-05-20T21:31:16.634061050+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.(*Server).IsHealthy 2025-05-20T21:31:16.634061050+00:00 stderr F /remote-source/app/pkg/peerhealth/server.go:119 2025-05-20T21:31:16.634061050+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth._PeerHealth_IsHealthy_Handler 2025-05-20T21:31:16.634061050+00:00 stderr F /remote-source/app/pkg/peerhealth/peerhealth_grpc.pb.go:76 2025-05-20T21:31:16.634061050+00:00 stderr F google.golang.org/grpc.(*Server).processUnaryRPC 2025-05-20T21:31:16.634061050+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1335 2025-05-20T21:31:16.634061050+00:00 stderr F google.golang.org/grpc.(*Server).handleStream 2025-05-20T21:31:16.634061050+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:1712 2025-05-20T21:31:16.634061050+00:00 stderr F google.golang.org/grpc.(*Server).serveStreams.func1.1 2025-05-20T21:31:16.634061050+00:00 stderr F /remote-source/app/vendor/google.golang.org/grpc/server.go:947 2025-05-20T21:31:24.316454772+00:00 stderr F 2025-05-20T21:31:24.316382306Z INFO api-check failed to check api server: api server readyz endpoint error: Get "https://172.30.0.1:443/readyz?exclude=shutdown": context deadline exceeded 2025-05-20T21:31:24.316540503+00:00 stderr F 2025-05-20T21:31:24.316528852Z 
INFO api-check Error count exceeds threshold, trying to ask other nodes if I'm healthy 2025-05-20T21:31:24.316604684+00:00 stderr F 2025-05-20T21:31:24.316591971Z INFO api-check getting health status from peer {"IP": "10.129.0.5"} 2025-05-20T21:31:24.316631064+00:00 stderr F 2025-05-20T21:31:24.316620795Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.129.0.5:30001"} 2025-05-20T21:31:24.316763192+00:00 stderr F 2025-05-20T21:31:24.316682961Z INFO api-check getting health status from peer {"IP": "10.130.0.4"} 2025-05-20T21:31:24.316890050+00:00 stderr F 2025-05-20T21:31:24.316872888Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.130.0.4:30001"} 2025-05-20T21:31:24.317053788+00:00 stderr F 2025-05-20T21:31:24.316729178Z INFO api-check getting health status from peer {"IP": "10.128.0.50"} 2025-05-20T21:31:24.317161249+00:00 stderr F 2025-05-20T21:31:24.317143797Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.128.0.50:30001"} 2025-05-20T21:31:28.371184808+00:00 stderr F W0520 21:31:28.371140 38817 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.Node: the server was unable to return a response in the time allotted, but may still be processing the request (get nodes) 2025-05-20T21:31:28.371340220+00:00 stderr F I0520 21:31:28.371311 38817 trace.go:219] Trace[273690126]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 (20-May-2025 21:30:28.369) (total time: 60002ms): 2025-05-20T21:31:28.371340220+00:00 stderr F Trace[273690126]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get nodes) 60002ms (21:31:28.371) 2025-05-20T21:31:28.371340220+00:00 stderr F Trace[273690126]: [1m0.002220573s] [1m0.002220573s] END 2025-05-20T21:31:28.371414399+00:00 stderr F E0520 21:31:28.371403 38817 reflector.go:148] 
sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.Node: failed to list *v1.Node: the server was unable to return a response in the time allotted, but may still be processing the request (get nodes) 2025-05-20T21:31:28.693642852+00:00 stderr F W0520 21:31:28.693578 38817 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1alpha1.SelfNodeRemediation: the server was unable to return a response in the time allotted, but may still be processing the request (get selfnoderemediations.self-node-remediation.medik8s.io) 2025-05-20T21:31:28.693783015+00:00 stderr F I0520 21:31:28.693768 38817 trace.go:219] Trace[1992137895]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 (20-May-2025 21:30:28.692) (total time: 60001ms): 2025-05-20T21:31:28.693783015+00:00 stderr F Trace[1992137895]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get selfnoderemediations.self-node-remediation.medik8s.io) 60001ms (21:31:28.693) 2025-05-20T21:31:28.693783015+00:00 stderr F Trace[1992137895]: [1m0.001238891s] [1m0.001238891s] END 2025-05-20T21:31:28.693856012+00:00 stderr F E0520 21:31:28.693837 38817 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1alpha1.SelfNodeRemediation: failed to list *v1alpha1.SelfNodeRemediation: the server was unable to return a response in the time allotted, but may still be processing the request (get selfnoderemediations.self-node-remediation.medik8s.io) 2025-05-20T21:31:28.967067470+00:00 stderr F W0520 21:31:28.967020 38817 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.Pod: the server was unable to return a response in the time allotted, but may still be processing the request (get pods) 2025-05-20T21:31:28.967248800+00:00 stderr F 
I0520 21:31:28.967228 38817 trace.go:219] Trace[1428147334]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 (20-May-2025 21:30:28.965) (total time: 60001ms): 2025-05-20T21:31:28.967248800+00:00 stderr F Trace[1428147334]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get pods) 60001ms (21:31:28.967) 2025-05-20T21:31:28.967248800+00:00 stderr F Trace[1428147334]: [1m0.00127529s] [1m0.00127529s] END 2025-05-20T21:31:28.967328469+00:00 stderr F E0520 21:31:28.967314 38817 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.Pod: failed to list *v1.Pod: the server was unable to return a response in the time allotted, but may still be processing the request (get pods) 2025-05-20T21:31:29.317842006+00:00 stderr F 2025-05-20T21:31:29.317735596Z ERROR api-check.peerhealth client failed to dial {"error": "context deadline exceeded"} 2025-05-20T21:31:29.317842006+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.NewClient 2025-05-20T21:31:29.317842006+00:00 stderr F /remote-source/app/pkg/peerhealth/client.go:37 2025-05-20T21:31:29.317842006+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:31:29.317842006+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:253 2025-05-20T21:31:29.317842006+00:00 stderr F 2025-05-20T21:31:29.317794055Z ERROR api-check failed to init grpc client {"IP": "10.130.0.4", "error": "context deadline exceeded"} 2025-05-20T21:31:29.317842006+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:31:29.317842006+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:255 2025-05-20T21:31:29.317916134+00:00 stderr F 2025-05-20T21:31:29.317791982Z ERROR api-check.peerhealth client failed to 
dial {"error": "context deadline exceeded"} 2025-05-20T21:31:29.317916134+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/peerhealth.NewClient 2025-05-20T21:31:29.317916134+00:00 stderr F /remote-source/app/pkg/peerhealth/client.go:37 2025-05-20T21:31:29.317916134+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:31:29.317916134+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:253 2025-05-20T21:31:29.317916134+00:00 stderr F 2025-05-20T21:31:29.317848898Z ERROR api-check failed to init grpc client {"IP": "10.129.0.5", "error": "context deadline exceeded"} 2025-05-20T21:31:29.317916134+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:31:29.317916134+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:255 2025-05-20T21:31:29.337276474+00:00 stderr F 2025-05-20T21:31:29.337182488Z ERROR api-check failed to read health response from peer {"IP": "10.128.0.50", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded"} 2025-05-20T21:31:29.337276474+00:00 stderr F github.com/medik8s/self-node-remediation/pkg/apicheck.(*ApiConnectivityCheck).getHealthStatusFromPeer 2025-05-20T21:31:29.337276474+00:00 stderr F /remote-source/app/pkg/apicheck/check.go:269 2025-05-20T21:31:29.337331458+00:00 stderr F 2025-05-20T21:31:29.33728527Z INFO api-check getting health status from peer {"IP": "10.128.2.11"} 2025-05-20T21:31:29.337331458+00:00 stderr F 2025-05-20T21:31:29.337294067Z INFO api-check.peerhealth client new peer client {"serveraddr": "10.128.2.11:30001"} 2025-05-20T21:31:29.513132122+00:00 stderr F W0520 21:31:29.513058 38817 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.Secret: the server was unable to return a response in the time allotted, but may still be processing the request (get secrets) 
2025-05-20T21:31:29.513132122+00:00 stderr F I0520 21:31:29.513109 38817 trace.go:219] Trace[1587077360]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233 (20-May-2025 21:30:29.511) (total time: 60001ms): 2025-05-20T21:31:29.513132122+00:00 stderr F Trace[1587077360]: ---"Objects listed" error:the server was unable to return a response in the time allotted, but may still be processing the request (get secrets) 60001ms (21:31:29.513) 2025-05-20T21:31:29.513132122+00:00 stderr F Trace[1587077360]: [1m0.00128477s] [1m0.00128477s] END 2025-05-20T21:31:29.513132122+00:00 stderr F E0520 21:31:29.513124 38817 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.Secret: failed to list *v1.Secret: the server was unable to return a response in the time allotted, but may still be processing the request (get secrets)