  1. OpenShift Request For Enhancement
  2. RFE-7805

Allow to configure lease parameters for the kueue manager controller


    • Feature Request
    • Resolution: Duplicate
    • AI/ML Workloads, Node
    • Product / Portfolio Work

      Konflux needs to tune the leader election parameters for the Kueue controller. Specifically, leaseDuration and renewDeadline need to be increased: on the Konflux clusters fragmentation takes about 30 seconds, and with the current configuration the leader crashes because it fails to renew the lease within the time allotted to it.
      Upstream Kueue already supports configuring these parameters.

      Update: OCP recommends default values that operator authors should set for the lease configuration.

      Taken from https://github.com/openshift/enhancements/blob/0f916a52af1a6fbdab0c5b80ae0e66c7a27efb6a/CONVENTIONS.md#handling-kube-apiserver-disruption 

      The recommendations are: LeaseDuration=137s, RenewDeadline=107s, RetryPeriod=26s.

      Those values are sufficient for Konflux, so if they are configured, users won't need to customize the lease parameters.

      There was also a suggestion to configure `ReleaseOnCancel` if the Kueue controller supports it.

      Slack thread for context - https://redhat-internal.slack.com/archives/C084N2C6P9U/p1751457510022409

              rhn-support-dhardie Duncan Hardie
              gbenhaim Gal Ben Haim