OpenShift Request For Enhancement
RFE-6134

Allow more control of the leader placement in MetalLB for L2 resiliency


    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Major
    • Version: openshift-4.14
    • Component: SDN

      1. Proposed title of this feature request
      Allow more control of the leader placement in MetalLB for L2 resiliency
      2. What is the nature and description of the request?

      Having a single ("unicorn") leader is causing resiliency problems, so we want to look at the following:

      • Is it possible to have the speaker leader on multiple nodes, or on all nodes? (This should make L2 resilient to leader swaps; see the sketch after this list.)
      • If the above is not possible: when the initial leader node goes down and another node takes over serving requests, can the new node be made the leader permanently until it in turn goes into a bad state? (Every leader election or swap is an outage in L2, so this would mean just one swap instead of two.)
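      For context, the current MetalLB API can already constrain which nodes are eligible to announce a pool via the nodeSelectors field of an L2Advertisement. In L2 mode only one matching node announces a given IP at a time, so this narrows where the leader can land rather than creating multiple simultaneous leaders. A minimal sketch, with placeholder pool and host names:

          apiVersion: metallb.io/v1beta1
          kind: L2Advertisement
          metadata:
            name: l2adv-constrained
            namespace: metallb-system
          spec:
            ipAddressPools:
              - example-pool                   # placeholder IPAddressPool name
            nodeSelectors:
              # Only these nodes are candidates for announcing IPs from the
              # pool; MetalLB still elects a single announcer at any time.
              - matchLabels:
                  kubernetes.io/hostname: worker-0
              - matchLabels:
                  kubernetes.io/hostname: worker-1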

      We have heard that Red Hat is already working on a fix for the following, but want to record it here for tracking:

      • Because the speakers are DaemonSet pods, a speaker pod still tries to come up on a node even when that node is cordoned (scheduling disabled), and as soon as it is up, if that node is the single leader, it starts taking calls.
      • What if there is an actual hardware issue and the team intentionally kept the node cordoned, and that node is the single leader? If the pod comes up anyway because the node somehow allows it, the result is a 100% outage: the node is not fully functional, so calls will fail. (A possible mitigation is sketched below.)
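      A partial mitigation that exists today, assuming the MetalLB Operator is in use, is to control speaker placement through the MetalLB custom resource: speaker pods are then scheduled only onto nodes matching a selector, so dropping the label from a faulty node keeps its speaker down even though a DaemonSet would otherwise tolerate the cordon. A sketch, with an illustrative selector:

          apiVersion: metallb.io/v1beta1
          kind: MetalLB
          metadata:
            name: metallb
            namespace: metallb-system
          spec:
            # Speaker pods run only on nodes carrying this label; removing
            # the label from a known-bad node (even a cordoned one, which a
            # DaemonSet would still schedule onto) keeps its speaker down.
            nodeSelector:
              node-role.kubernetes.io/worker: ""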

      Is there any way to force a leader-node change manually? In cases where the automatically elected leader node starts showing issues, this would come in handy in real-world incident scenarios. (A conceivable workaround is sketched below.)
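      There is no supported command today to reassign the L2 leader directly. One conceivable workaround, building on the nodeSelectors field shown above, is to exclude the misbehaving node from the advertisement so that the remaining speakers re-elect; reverting the change later is then a second, planned swap. The advertisement and node names below are hypothetical:

          apiVersion: metallb.io/v1beta1
          kind: L2Advertisement
          metadata:
            name: l2adv-exclude-faulty
            namespace: metallb-system
          spec:
            ipAddressPools:
              - example-pool                   # placeholder IPAddressPool name
            nodeSelectors:
              # Excluding the current leader forces the remaining eligible
              # speakers to take over the announcements for this pool.
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: NotIn
                    values:
                      - worker-1               # hypothetical faulty leader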
      3. Why does the customer need this? (List the business requirements here)
      The customer is concerned about the technical debt of managing BGP across their fleet of 955+ clusters. They would rather leverage the L2 features in MetalLB.
      4. List any affected packages or components.
      MetalLB (the MetalLB Operator and its speaker component).
