Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

From -> https://issues.redhat.com/browse/RHODS-1619

Since we implemented High Availability on JupyterHub, we have moved from 1 to 3 containers using a Leader Election strategy.

With this implementation, we could run into inconsistencies: one pod might still think it is the elected leader after the others have already replaced it, for example because of network problems.

Implement a check that detects whether a running pod is still the elected leader and, if it is not, deletes it.

The following diagram (attached as leader election issue-1.png) illustrates the error:

If the pods run into network problems, a new election might be triggered while the old container still thinks it is the leader. Once the network issues are resolved, there will be two leaders, because there is no mechanism to probe the leader election and restart the pod.
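To make the sequence concrete, the toy simulation below replays it in memory. It is only a sketch: the lease record (holder, last renewal time, validity window) is loosely modeled on Kubernetes Lease objects, and the pod names, timings, partition window, and the split between a sidecar's "elected" view and the hub's "acting leader" view are illustrative assumptions, not how the ODH deployment is actually wired.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lease:
    """Toy lease: who holds leadership and when it was last renewed."""
    holder: Optional[str] = None
    renew_time: float = 0.0
    duration: float = 15.0  # assumed validity window, in seconds

    def expired(self, now: float) -> bool:
        return self.holder is None or now > self.renew_time + self.duration

    def try_acquire_or_renew(self, candidate: str, now: float) -> bool:
        """Grant the lease if the candidate already holds it or it has expired."""
        if self.holder == candidate or self.expired(now):
            self.holder = candidate
            self.renew_time = now
            return True
        return False


class Pod:
    """Toy pod: an election sidecar plus a hub container that caches leadership."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.elected = False        # what the election sidecar last saw
        self.acting_leader = False  # what the hub container believes

    def tick(self, lease: Lease, now: float, reachable: bool) -> None:
        if reachable:
            self.elected = lease.try_acquire_or_renew(self.name, now)
        # While partitioned, the sidecar keeps its last (possibly stale) view.
        if self.elected:
            # The hub starts acting as leader, and nothing ever tells it to
            # stop: this is the missing mechanism described in the issue.
            self.acting_leader = True


def simulate() -> Tuple[List[Tuple[int, List[str]]], List[str]]:
    lease = Lease()
    pods = [Pod("hub-a"), Pod("hub-b")]
    history = []
    for now in range(0, 40, 5):
        partitioned = 10 <= now < 30  # hub-a loses the network from t=10 to t=30
        pods[0].tick(lease, now, reachable=not partitioned)
        pods[1].tick(lease, now, reachable=True)
        history.append((now, [p.name for p in pods if p.acting_leader]))
    # The check the issue asks for: a pod acting as leader without holding the lease.
    stale = [p.name for p in pods if p.acting_leader and lease.holder != p.name]
    return history, stale
```

Running `simulate()` shows a single acting leader (`hub-a`) up to t=20, then two acting leaders from t=25 onward, including after `hub-a` regains the network at t=30, because its hub never re-checks the election. The final comparison against the lease holder flags only `hub-a`.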

[SPIKE] We have already thought about using a liveness probe script -> https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

Working branch -> https://github.com/lucferbux/odh-manifests/tree/buxfix-jupyterhub-leader-election
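A liveness probe of type `exec` treats exit code 0 as healthy and any non-zero exit as a failure, after which the kubelet restarts the container. The sketch below shows what such a probe script could look like. Several parts are assumptions, not taken from the working branch: the sidecar endpoint `http://localhost:4040/` and its JSON shape `{"name": ...}`, the marker file `/tmp/hub-is-leader` that the hub would write when it starts acting as leader, and all function names.

```python
import json
import os
import socket
import urllib.request
from typing import Callable

# Hypothetical endpoint: assumes the election sidecar serves the current
# leader's name as JSON such as {"name": "<pod name>"}.
LEADER_URL = "http://localhost:4040/"

# Hypothetical marker file the hub container would create when it starts
# acting as leader; JupyterHub does not provide this out of the box.
LEADER_MARKER = "/tmp/hub-is-leader"


def parse_leader(payload: bytes) -> str:
    """Extract the leader's name from the sidecar's JSON response."""
    return json.loads(payload.decode("utf-8"))["name"]


def fetch_leader(url: str = LEADER_URL, timeout: float = 2.0) -> str:
    """Ask the local election sidecar who the current leader is."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_leader(resp.read())


def probe_exit_code(own_name: str, acting_leader: bool,
                    get_leader: Callable[[], str] = fetch_leader) -> int:
    """Return 0 (healthy) or 1 (unhealthy), following exec-probe exit-code rules."""
    try:
        leader = get_leader()
    except (OSError, ValueError, KeyError, TypeError):
        # Sidecar unreachable or response unparsable: report unhealthy.
        return 1
    if acting_leader and leader != own_name:
        # This pod still acts as leader, but the election picked another pod.
        return 1
    return 0


def main() -> int:
    # By default a pod's hostname is its pod name.
    own_name = socket.gethostname()
    return probe_exit_code(own_name, os.path.exists(LEADER_MARKER))
```

The probe command would run this file with a final `sys.exit(main())`. Treating an unreachable sidecar as unhealthy is a debatable choice: it avoids silently keeping a possibly stale leader, at the cost of restarts whenever the sidecar is merely slow.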

The main issue found in this exploration is that, although we could isolate the pod with a readinessProbe (so it stops receiving traffic) and restart the container with a livenessProbe, both mechanisms act at the container level, not the pod level. As a result, they would not address the problem in the pod's sidecar container.
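One pod-level alternative is for the check to delete its own pod, so that the pod's controller recreates it together with the sidecar, which matches the "delete it" remedy above. This is a minimal sketch using only the standard library against the Kubernetes core/v1 REST API (`DELETE /api/v1/namespaces/{namespace}/pods/{name}`) with the standard in-cluster service account mount. It assumes the service account has RBAC permission to delete pods in its namespace (not shown) and that the pod is owned by a controller that recreates it; the function names are hypothetical.

```python
import ssl
import urllib.request
from pathlib import Path

# Standard in-cluster service account mount and API server address.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")
API_SERVER = "https://kubernetes.default.svc"


def build_delete_request(namespace: str, pod_name: str, token: str,
                         api_server: str = API_SERVER) -> urllib.request.Request:
    """Build a DELETE request for the core/v1 pod resource."""
    url = f"{api_server}/api/v1/namespaces/{namespace}/pods/{pod_name}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


def delete_own_pod(pod_name: str) -> int:
    """Delete this pod; its controller (if any) then recreates all its containers."""
    token = (SA_DIR / "token").read_text().strip()
    namespace = (SA_DIR / "namespace").read_text().strip()
    # Trust the cluster CA mounted alongside the token.
    context = ssl.create_default_context(cafile=str(SA_DIR / "ca.crt"))
    req = build_delete_request(namespace, pod_name, token)
    with urllib.request.urlopen(req, context=context, timeout=5) as resp:
        return resp.status
```

Deleting the pod rather than restarting one container means the sidecar's election state is discarded as well, which is exactly what the probes alone cannot do.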

Attachment: leader election issue-1.png (235 kB, 2021/08/26 2:23 PM)

Assignee: Lucas Fernandez Aragon
Reporter: Lucas Fernandez Aragon
Votes: 0
Watchers: 1
Created: 2021/08/26 2:22 PM
Updated: 2021/09/20 9:51 AM
Resolved: 2021/09/20 9:51 AM
