Bug
Resolution: Unresolved
Normal
4.20.0
Quality / Stability / Reliability
Rejected
Description of problem:
When JobSet fails to create a child Job, it should emit events describing the failure.
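For illustration, the kind of change being requested might look like the sketch below: a controller-runtime reconciler that records a Warning event on the parent JobSet when a child Job cannot be created. This is a minimal sketch under assumptions, not the actual JobSet controller code; the reconciler type, its field names, and the createJob helper are hypothetical.

package controller

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
	"sigs.k8s.io/controller-runtime/pkg/client"
	jobsetv1alpha2 "sigs.k8s.io/jobset/api/jobset/v1alpha2"
)

// reconciler is a hypothetical stand-in for the JobSet reconciler.
type reconciler struct {
	client client.Client
	// record is assumed to be wired up from the controller manager,
	// e.g. via mgr.GetEventRecorderFor("jobset").
	record record.EventRecorder
}

// createJob creates one child Job and, on failure, surfaces the error
// as a Warning event on the parent JobSet in addition to returning it.
func (r *reconciler) createJob(ctx context.Context, js *jobsetv1alpha2.JobSet, job *batchv1.Job) error {
	if err := r.client.Create(ctx, job); err != nil {
		// Emit an event so the failure is visible via
		// `oc describe jobset` / `oc get events`, not just in the
		// controller-manager logs.
		r.record.Eventf(js, corev1.EventTypeWarning, "JobCreationFailed",
			"failed to create Job %q: %v", job.Name, err)
		return fmt.Errorf("job %q creation failed with error: %w", job.Name, err)
	}
	return nil
}

With something along these lines in place, the failure described below would show up under `oc describe jobset success-policy` and in `oc get events`, instead of only in the controller-manager logs.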
Version-Release number of selected component (if applicable):
4.20
How reproducible:
Always
Steps to Reproduce:
1) Deploy the JobSet operator.
2) Create a JobSet like the following (note that the workers replicated job sets parallelism but omits spec.completions):

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: success-policy
spec:
  # We want to declare our JobSet successful if workers finish.
  # If workers finish we should clean up the remaining replicatedJobs.
  successPolicy:
    operator: All
    targetReplicatedJobs:
    - workers
  replicatedJobs:
  - name: leader
    replicas: 1
    template:
      spec:
        # Set backoff limit to 0 so job will immediately fail if any pod fails.
        backoffLimit: 0
        completions: 1
        parallelism: 1
        template:
          spec:
            containers:
            - name: leader
              image: quay.io/openshifttest/hello-openshift:1.2.0
              command:
              - bash
              - -xc
              - |
                sleep 100
  - name: workers
    replicas: 1
    template:
      spec:
        backoffLimit: 0
        parallelism: 5
        template:
          spec:
            containers:
            - name: worker
              image: quay.io/openshifttest/hello-openshift:1.2.0
              command:
              - bash
              - -xc
              - |
                if [[ "$JOB_COMPLETION_INDEX" == "0" ]]; then
                  for i in $(seq 10 -1 1)
                  do
                    echo "Sleeping in $i"
                    sleep 1
                  done
                  exit $(rand 0 1)
                fi
Actual results:
After step 2, creation of the worker Job fails, but no events are recorded on the JobSet. The jobset-controller-manager logs show:

2025-09-05T11:27:55Z ERROR Reconciler error {"controller": "jobset", "controllerGroup": "jobset.x-k8s.io", "controllerKind": "JobSet", "JobSet": {"name":"success-policy","namespace":"zhouy"}, "namespace": "zhouy", "name": "success-policy", "reconcileID": "0b09ae51-1cd2-4c31-a908-87d0bee067a8", "error": "job \"success-policy-workers-0\" creation failed with error: Job.batch \"success-policy-workers-0\" is invalid: spec.completions: Required value: when completion mode is Indexed"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:353
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:300
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:202
Expected results:
Events describing the Job creation failure should be recorded on the JobSet.
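As a rough way to verify the expected behavior once implemented, a check along the lines below could list Warning events recorded on the JobSet. This is a hedged sketch using client-go; the namespace and JobSet name come from the reproduction steps above, while the "JobCreationFailed" reason is an assumption carried over from the earlier sketch, not a reason defined by JobSet.

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Select events whose involved object is the JobSet from the
	// reproduction steps (namespace "zhouy", name "success-policy").
	events, err := cs.CoreV1().Events("zhouy").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "involvedObject.kind=JobSet,involvedObject.name=success-policy",
	})
	if err != nil {
		panic(err)
	}

	// Print any Warning events, e.g. the assumed "JobCreationFailed".
	for _, e := range events.Items {
		if e.Type == corev1.EventTypeWarning {
			fmt.Printf("%s: %s\n", e.Reason, e.Message)
		}
	}
}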
Additional info:
- clones: OCPBUGS-61301 kube-scheduler operator's some containers don't have ROFS (Closed)
- is cloned by: OCPBUGS-61335 Give good example define for jobset instance (Closed)
- is cloned by: OCPBUGS-61400 There are logs overflow when create jobset with RestartJobSetAndIgnoreMaxRestarts (Closed)