OpenShift Bugs / OCPBUGS-74210

[LWS][JobSet] - validation is deferred to post-resource creation

    • OCP Node Kueue Sprint 283, OCP Node Kueue Sprint 284, OCP Kueue Sprint 285
    • 3
    • In Progress
    • Enhancement
    • The Kueue Operator will emit a condition reporting that a dependency is not installed when JobSet or LeaderWorkerSet is specified as an integration but its API is not found.

      LeaderWorkerSet (LWS) validation currently happens late in the lifecycle. The system verifies that LWS is installed only after the user has configured the Kueue hierarchy (ResourceFlavor, ClusterQueue, Namespace, LocalQueue, etc.) and attempts to apply a template. As a result, the user gets failure feedback only at the last step.

       

      Steps to Reproduce:

      • Install the Kueue Operator
      • On the Kueue operand CR, add LeaderWorkerSet as an integration
        • Optional: check that the ConfigMap "kueue-manager-config" is updated with the LWS info
      • Create a ResourceFlavor and a ClusterQueue
      • Create a Namespace and add a LocalQueue to it
      • Apply an LWS template
      • The following error is shown:
      error: resource mapping not found for name: "lws-and-workers" namespace: "test-lws" from "leaderworkerset.yaml": no matches for kind "LeaderWorkerSet" in version "leaderworkerset.x-k8s.io/v1"
      ensure CRDs are installed first
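
      Until validation moves earlier, the missing CRD behind the error above can be detected before the queuing resources are created. The following is a minimal sketch, not part of the operator: it assumes the output of `kubectl api-resources -o name` is available as text, and the resource names in `REQUIRED_RESOURCES` as well as the helper names are assumptions made for illustration.

```python
import subprocess

# Assumed CRD resource names (plural.group) for the two integrations in this issue.
REQUIRED_RESOURCES = {
    "LeaderWorkerSet": "leaderworkersets.leaderworkerset.x-k8s.io",
    "JobSet": "jobsets.jobset.x-k8s.io",
}


def missing_kinds(api_resources_output: str, kinds: list[str]) -> list[str]:
    """Return the kinds whose resource name is absent from the given
    `kubectl api-resources -o name` output (one resource name per line)."""
    available = {
        line.strip() for line in api_resources_output.splitlines() if line.strip()
    }
    return [kind for kind in kinds if REQUIRED_RESOURCES[kind] not in available]


def list_api_resources() -> str:
    """Query the cluster for served resources.
    Requires kubectl on PATH and a working kubeconfig."""
    result = subprocess.run(
        ["kubectl", "api-resources", "-o", "name"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

      Calling `missing_kinds(list_api_resources(), ["LeaderWorkerSet"])` against a cluster without LWS installed would return `["LeaderWorkerSet"]`, which flags the problem before any ResourceFlavor, ClusterQueue, or LocalQueue is created.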

       

      Current Behavior: Validation is deferred. LWS is validated only after the Operand CR is updated and the full Kueue hierarchy (ResourceFlavor, ClusterQueue, and LocalQueue) is established. Users discover configuration errors only at the final step, when applying an LWS template.

       

      Proposed Change: Shift validation earlier in the process for a better user experience. Validation can occur as soon as LWS is added to the Operand CR, which confirms that the environment is ready before the user invests time in configuring the queuing resources.
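
      The proposed early validation can be sketched as a pure function that runs when the Operand CR changes. This is an illustration under stated assumptions, not the operator's actual implementation: the `Condition` fields, the condition type `DependenciesAvailable`, the reason strings, and the integration-to-API-group mapping are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from integration name to the API group it depends on.
# Integrations not listed here are treated as having no external dependency.
INTEGRATION_API_GROUPS = {
    "LeaderWorkerSet": "leaderworkerset.x-k8s.io",
    "JobSet": "jobset.x-k8s.io",
}


@dataclass
class Condition:
    type: str
    status: str
    reason: str
    message: str


def dependency_condition(integrations: list[str], served_groups: set[str]) -> Condition:
    """Build a status condition from the integrations enabled on the Operand CR
    and the API groups currently served by the cluster."""
    missing = [
        name
        for name in integrations
        if name in INTEGRATION_API_GROUPS
        and INTEGRATION_API_GROUPS[name] not in served_groups
    ]
    if missing:
        details = ", ".join(
            f"{name} (API group {INTEGRATION_API_GROUPS[name]} not found)"
            for name in missing
        )
        return Condition(
            type="DependenciesAvailable",
            status="False",
            reason="DependencyNotInstalled",
            message=f"missing dependencies: {details}",
        )
    return Condition(
        type="DependenciesAvailable",
        status="True",
        reason="AllDependenciesInstalled",
        message="all configured integrations have their APIs installed",
    )
```

      With this shape, enabling LeaderWorkerSet on a cluster that serves only the JobSet API group would yield a condition with status `False` that names LeaderWorkerSet, so the user sees the problem on the Operand CR rather than when applying a template.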

       

        1. jobset_after.png (36 kB)
        2. jobset_before.png (39 kB)
        3. lws_after.png (39 kB)
        4. lws_before.png (42 kB)
        5. lws-jobset_after.png (37 kB)
        6. lws-jobset_before.png (40 kB)

              mdemaced Maysa De Macedo Souza
              rh-ee-anahas Alice Nahas
              Cameron Meadors Cameron Meadors