OpenShift Bugs / OCPBUGS-61308

JobSet should create events when it fails to create a Job


    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.20.0
    • JobSet
    • Quality / Stability / Reliability
    • False
    • Yes
    • Rejected

      Description of problem:

      When JobSet fails to create a Job, it should create an event that reports the failure.

      Version-Release number of selected component (if applicable):

      4.20 

      How reproducible:

          Always

      Steps to Reproduce:

      1) Deploy the JobSet operator.
      2) Create a JobSet like the following:
      apiVersion: jobset.x-k8s.io/v1alpha2
      kind: JobSet
      metadata:
        name: success-policy
      spec:
        # We want to declare our JobSet successful if workers finish.
        # If workers finish we should clean up the remaining replicatedJobs.
        successPolicy:
          operator: All
          targetReplicatedJobs:
          - workers
        replicatedJobs:
        - name: leader
          replicas: 1
          template:
            spec:
              # Set backoff limit to 0 so job will immediately fail if any pod fails.
              backoffLimit: 0 
              completions: 1
              parallelism: 1
              template:
                spec:
                  containers:
                  - name: leader
                    image: quay.io/openshifttest/hello-openshift:1.2.0
                    command:
                    - bash
                    - -xc
                    - |
                      sleep 100
        - name: workers
          replicas: 1
          template:
            spec:
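              backoffLimit: 0
              # completions is not set here; the controller log under "Actual results"
              # shows the Job being rejected for this reason.
              parallelism: 5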
              template:
                spec:
                  containers:
                  - name: worker
                    image: quay.io/openshifttest/hello-openshift:1.2.0
                    command:
                    - bash
                    - -xc
                    - |
                      if [[ "$JOB_COMPLETION_INDEX" == "0" ]]; then
                        for i in $(seq 10 -1 1)
                        do
                          echo "Sleeping in $i"
                          sleep 1
                        done
                        exit $(rand 0 1)
                      fi     

      Actual results:

       After step 2, creation of the workers Job fails, but no events are recorded. The failure is only visible in the jobset-controller-manager logs:
      
      2025-09-05T11:27:55Z	ERROR	Reconciler error	{"controller": "jobset", "controllerGroup": "jobset.x-k8s.io", "controllerKind": "JobSet", "JobSet": {"name":"success-policy","namespace":"zhouy"}, "namespace": "zhouy", "name": "success-policy", "reconcileID": "0b09ae51-1cd2-4c31-a908-87d0bee067a8", "error": "job \"success-policy-workers-0\" creation failed with error: Job.batch \"success-policy-workers-0\" is invalid: spec.completions: Required value: when completion mode is Indexed"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
      	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:353
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
      	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:300
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1
      	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:202
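
For context on the error in the log: the message implies the child Job was submitted with completion mode Indexed and no spec.completions, even though the manifest sets no completion mode on the workers template. The rule can be sketched roughly as below. This is only an illustration: jobSpec and validateCompletions are hypothetical names, and the real API server validation covers far more than this one check.

```go
package main

import (
	"errors"
	"fmt"
)

// jobSpec is a hypothetical, trimmed-down stand-in for the batch/v1 JobSpec
// fields that matter for this error.
type jobSpec struct {
	CompletionMode string // "Indexed" or "NonIndexed"
	Completions    *int32 // nil when the manifest omits spec.completions
}

// validateCompletions approximates only the rule quoted in the log above;
// the real API server validation covers much more.
func validateCompletions(s jobSpec) error {
	if s.CompletionMode == "Indexed" && s.Completions == nil {
		return errors.New("spec.completions: Required value: when completion mode is Indexed")
	}
	return nil
}

func main() {
	five := int32(5)
	// Mirrors the "workers" template: no completions, Indexed mode assumed.
	fmt.Println(validateCompletions(jobSpec{CompletionMode: "Indexed"}))
	// Setting completions satisfies this particular rule.
	fmt.Println(validateCompletions(jobSpec{CompletionMode: "Indexed", Completions: &five}))
}
```

The leader template in the reproducer sets completions: 1, which would explain why only the workers Job is rejected.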

      Expected results:

         Events that report the Job creation failure should be created.
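
A minimal sketch of the requested behavior, assuming the controller has access to an event recorder: when Job creation fails, record a Warning event on the JobSet before returning the error to the reconciler. Everything here is hypothetical and for illustration only. EventRecorder is a local stand-in loosely modeled on the Eventf method of client-go's record.EventRecorder (it takes a plain string instead of a runtime.Object), and the reason string "JobCreationFailed", createJobWithEvent, and failWith are made-up names, not the JobSet controller's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// EventRecorder is a hypothetical, minimal stand-in for an event recorder,
// defined locally so the sketch has no external dependencies.
type EventRecorder interface {
	Eventf(object, eventType, reason, messageFmt string, args ...any)
}

// memoryRecorder collects formatted events in memory for demonstration.
type memoryRecorder struct{ events []string }

func (r *memoryRecorder) Eventf(object, eventType, reason, messageFmt string, args ...any) {
	msg := fmt.Sprintf(messageFmt, args...)
	r.events = append(r.events, fmt.Sprintf("%s %s %s: %s", object, eventType, reason, msg))
}

// createJob is a hypothetical placeholder for the API call that creates a Job.
type createJob func(name string) error

// failWith returns a createJob that always fails with the given message.
func failWith(msg string) createJob {
	return func(string) error { return errors.New(msg) }
}

// createJobWithEvent tries to create the Job. On failure it records a Warning
// event on the owning JobSet and returns the wrapped error.
func createJobWithEvent(rec EventRecorder, jobSet, jobName string, create createJob) error {
	if err := create(jobName); err != nil {
		rec.Eventf(jobSet, "Warning", "JobCreationFailed",
			"job %q creation failed with error: %v", jobName, err)
		return fmt.Errorf("job %q creation failed with error: %w", jobName, err)
	}
	return nil
}

func main() {
	rec := &memoryRecorder{}
	create := failWith("spec.completions: Required value: when completion mode is Indexed")
	err := createJobWithEvent(rec, "zhouy/success-policy", "success-policy-workers-0", create)
	fmt.Println("error:", err)
	for _, e := range rec.events {
		fmt.Println("event:", e)
	}
}
```

With something along these lines, the failure would be visible in the JobSet's events without having to read the controller-manager logs.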

      Additional info:

          

              Unassigned
              Ying Zhou (yinzhou@redhat.com)
              Votes: 0
              Watchers: 2