- Bug
- Resolution: Done-Errata
- Normal
- 4.12.z
- Quality / Stability / Reliability
- False
- 3
- Moderate
- No
- None
- None
- OCP VE Sprint 239, OCP VE Sprint 240, OCP VE Sprint 241, OCP VE Sprint 242, OCP VE Sprint 243, OCP VE Sprint 244
- 6
- In Progress
- Release Note Not Required
- None
- None
- None
- None
- None
Description of problem:
In an SNO (single-node OpenShift) scenario, if the kube-apiserver is restarted, the topolvm-controller and lvm-operator Pods crash because leader election fails.
Version-Release number of selected component (if applicable):
OCP 4.12
How reproducible:
100%
Steps to Reproduce:
1. Kill the kube-apiserver in SNO:
$ oc exec -it -n openshift-kube-apiserver kube-apiserver-XXXXXX -c kube-apiserver -- /bin/sh -c "kill 1"
2. Watch the topolvm-controller and lvm-operator Pods
3. Check the logs before the crash:
2023-04-06T17:45:52.566409204+08:00 stderr F E0406 09:45:52.566357 1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": dial tcp 172.30.0.1:443: connect: connection refused
2023-04-06T17:45:54.566987764+08:00 stderr F E0406 09:45:54.566945 1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": dial tcp 172.30.0.1:443: connect: connection refused
2023-04-06T17:45:56.567006728+08:00 stderr F E0406 09:45:56.566959 1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": dial tcp 172.30.0.1:443: connect: connection refused
2023-04-06T17:46:02.570057660+08:00 stdout F leader election lost
2023-04-06T17:46:02.570087514+08:00 stderr F E0406 09:46:02.565764 1 leaderelection.go:330] error retrieving resource lock openshift-storage/topolvm: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-storage/leases/topolvm": context deadline exceeded
2023-04-06T17:46:02.570087514+08:00 stderr F I0406 09:46:02.565823 1 leaderelection.go:283] failed to renew lease openshift-storage/topolvm: timed out waiting for the condition
2023-04-06T17:46:02.570087514+08:00 stderr F {"level":"error","ts":1680774362.5658588,"logger":"setup","msg":"problem running manager","error":"leader election lost","stacktrace":"github.com/topolvm/topolvm/pkg/topolvm-controller/cmd.subMain\n\t/remote-source/app/pkg/topolvm-controller/cmd/run.go:145\ngithub.com/topolvm/topolvm/pkg/topolvm-controller/cmd.glob..func1\n\t/remote-source/app/pkg/topolvm-controller/cmd/root.go:34\ngithub.com/spf13/cobra.(*Command).execute\n\t/remote-source/deps/gomod/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/remote-source/deps/gomod/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/remote-source/deps/gomod/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902\ngithub.com/topolvm/topolvm/pkg/topolvm-controller/cmd.Execute\n\t/remote-source/app/pkg/topolvm-controller/cmd/root.go:41\nmain.main\n\t/remote-source/app/pkg/hypertopolvm/main.go:44\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
2023-04-06T17:46:02.570087514+08:00 stderr F Error: leader election lost
Actual results:
The affected Pods crash and restart due to the short leaseDurationSeconds.
Expected results:
The affected Pods should survive a kube-apiserver restart without crashing.
Additional info:
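The crash path in the log above is controller-runtime's leader election: once the lease on openshift-storage/topolvm cannot be renewed within the renew deadline, the manager exits with "leader election lost". Below is a minimal sketch of the knobs that govern this behaviour, assuming a controller-runtime based manager; the option names come from sigs.k8s.io/controller-runtime, while the durations shown are illustrative and not necessarily what the shipped LVMS fix uses.

package main

import (
	"os"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	ctrl.SetLogger(zap.New())

	// Illustrative values: longer windows let the leader ride out a brief
	// kube-apiserver restart on SNO instead of losing the lease and exiting.
	leaseDuration := 137 * time.Second
	renewDeadline := 107 * time.Second
	retryPeriod := 26 * time.Second

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "topolvm", // the lease name seen in the errors above
		LeaderElectionNamespace: "openshift-storage",
		LeaseDuration:           &leaseDuration,
		RenewDeadline:           &renewDeadline,
		RetryPeriod:             &retryPeriod,
	})
	if err != nil {
		os.Exit(1)
	}

	// Start blocks until the manager stops; losing the lease ends up here,
	// which is the "Error: leader election lost" exit seen in the logs.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}

RenewDeadline must be shorter than LeaseDuration (and RetryPeriod shorter still); the longer these windows are, the longer a kube-apiserver outage the leader can tolerate before giving up the lease and exiting.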
- links to: RHBA-2024:126443 (LVMS 4.15 Bug Fix and Enhancement update)