- Bug
- Resolution: Done-Errata
- Normal
- 4.12.z
- Quality / Stability / Reliability
- False
- 3
- Moderate
- No
- None
- None
- OCP VE Sprint 239, OCP VE Sprint 240, OCP VE Sprint 241, OCP VE Sprint 242, OCP VE Sprint 243, OCP VE Sprint 244
- 6
- In Progress
- Release Note Not Required
- None
- None
- None
- None
- None
Description of problem:
In an SNO (single-node OpenShift) scenario, if the kube-apiserver is restarted, the topolvm-controller and lvm-operator Pods crash because leader election fails.
Version-Release number of selected component (if applicable):
OCP 4.12
How reproducible:
100%
Steps to Reproduce:
1. Kill the kube-apiserver in SNO:
$ oc exec -it -n openshift-kube-apiserver kube-apiserver-XXXXXX -c kube-apiserver -- /bin/sh -c "kill 1"
2. Watch the topolvm-controller and lvm-operator Pods
3. Check the logs before the crash:
2023-04-06T17:45:52.566409204+08:00 stderr F E0406 09:45:52.566357 1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": dial tcp 172.30.0.1:443: connect: connection refused
2023-04-06T17:45:54.566987764+08:00 stderr F E0406 09:45:54.566945 1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": dial tcp 172.30.0.1:443: connect: connection refused
2023-04-06T17:45:56.567006728+08:00 stderr F E0406 09:45:56.566959 1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": dial tcp 172.30.0.1:443: connect: connection refused
2023-04-06T17:46:02.570057660+08:00 stdout F leader election lost
2023-04-06T17:46:02.570087514+08:00 stderr F E0406 09:46:02.565764 1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": context deadline exceeded
2023-04-06T17:46:02.570087514+08:00 stderr F I0406 09:46:02.565823 1 leaderelection.go:283] failed to renew lease openshift-storage/topolvm: timed out waiting for the condition
2023-04-06T17:46:02.570087514+08:00 stderr F {"level":"error","ts":1680774362.5658588,"logger":"setup","msg":"problem running manager","error":"leader election lost","stacktrace":"github.com/topolvm/topolvm/pkg/topolvm-controller/cmd.subMain\n\t/remote-source/app/pkg/topolvm-controller/cmd/run.go:145\ngithub.com/topolvm/topolvm/pkg/topolvm-controller/cmd.glob..func1\n\t/remote-source/app/pkg/topolvm-controller/cmd/root.go:34\ngithub.com/spf13/cobra.(*Command).execute\n\t/remote-source/deps/gomod/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/remote-source/deps/gomod/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/remote-source/deps/gomod/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902\ngithub.com/topolvm/topolvm/pkg/topolvm-controller/cmd.Execute\n\t/remote-source/app/pkg/topolvm-controller/cmd/root.go:41\nmain.main\n\t/remote-source/app/pkg/hypertopolvm/main.go:44\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
2023-04-06T17:46:02.570087514+08:00 stderr F Error: leader election lost
Actual results:
The affected Pods crash and restart due to the short leaseDurationSeconds.
Expected results:
The affected Pods should survive a kube-apiserver restart without crashing.
Additional info:
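The crash path in the log above is controller-runtime's leader election: once the lease on openshift-storage/topolvm cannot be renewed within the renew deadline, the manager exits with "leader election lost". Below is a minimal sketch of the knobs that govern this behaviour, assuming a controller-runtime based manager; the option names come from sigs.k8s.io/controller-runtime, while the durations shown are illustrative and not necessarily what the shipped LVMS fix uses.

package main

import (
	"os"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	ctrl.SetLogger(zap.New())

	// Illustrative values: longer windows let the leader ride out a brief
	// kube-apiserver restart on SNO instead of losing the lease and exiting.
	leaseDuration := 137 * time.Second
	renewDeadline := 107 * time.Second
	retryPeriod := 26 * time.Second

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "topolvm", // the lease name seen in the errors above
		LeaderElectionNamespace: "openshift-storage",
		LeaseDuration:           &leaseDuration,
		RenewDeadline:           &renewDeadline,
		RetryPeriod:             &retryPeriod,
	})
	if err != nil {
		os.Exit(1)
	}

	// Start blocks until the manager stops; losing the lease ends up here,
	// which is the "Error: leader election lost" exit seen in the logs.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}

RenewDeadline must be shorter than LeaseDuration (and RetryPeriod shorter still); the longer these windows are, the longer a kube-apiserver outage the leader can tolerate before giving up the lease and exiting.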
- links to: RHBA-2024:126443 (LVMS 4.15 Bug Fix and Enhancement update)