Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: 4.20.0
Impact: Quality / Stability / Reliability
Description of problem:
When JobSet fails to create a Job, it should create events for the failure so the error is visible to users.
Version-Release number of selected component (if applicable):
4.20
How reproducible:
Always
Steps to Reproduce:
1) Deploy the jobset operator.
2) Create a JobSet like the following:
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: success-policy
spec:
  # We want to declare our JobSet successful if workers finish.
  # If workers finish we should clean up the remaining replicatedJobs.
  successPolicy:
    operator: All
    targetReplicatedJobs:
      - workers
  replicatedJobs:
    - name: leader
      replicas: 1
      template:
        spec:
          # Set backoff limit to 0 so job will immediately fail if any pod fails.
          backoffLimit: 0
          completions: 1
          parallelism: 1
          template:
            spec:
              containers:
                - name: leader
                  image: quay.io/openshifttest/hello-openshift:1.2.0
                  command:
                    - bash
                    - -xc
                    - |
                      sleep 100
    - name: workers
      replicas: 1
      template:
        spec:
          backoffLimit: 0
          # completions is intentionally omitted here; JobSet creates its
          # child Jobs in Indexed completion mode, which requires
          # completions, so creation of the worker Job fails. This is what
          # triggers the bug reported below.
          parallelism: 5
          template:
            spec:
              containers:
                - name: worker
                  image: quay.io/openshifttest/hello-openshift:1.2.0
                  command:
                    - bash
                    - -xc
                    - |
                      if [[ "$JOB_COMPLETION_INDEX" == "0" ]]; then
                        for i in $(seq 10 -1 1)
                        do
                          echo "Sleeping in $i"
                          sleep 1
                        done
                        exit $(rand 0 1)
                      fi
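
For reference, the manifest can be applied and the resulting objects inspected with standard commands (the file name success-policy.yaml is illustrative):

oc apply -f success-policy.yaml
oc get jobset,jobs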
Actual results:
Creation of the worker Job failed, but no events were emitted. Checking the jobset-controller-manager logs shows:
2025-09-05T11:27:55Z ERROR Reconciler error {"controller": "jobset", "controllerGroup": "jobset.x-k8s.io", "controllerKind": "JobSet", "JobSet": {"name":"success-policy","namespace":"zhouy"}, "namespace": "zhouy", "name": "success-policy", "reconcileID": "0b09ae51-1cd2-4c31-a908-87d0bee067a8", "error": "job \"success-policy-workers-0\" creation failed with error: Job.batch \"success-policy-workers-0\" is invalid: spec.completions: Required value: when completion mode is Indexed"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:353
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:300
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1
    /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:202
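
The absence of events can be confirmed against the JobSet itself (namespace zhouy taken from the log above):

oc -n zhouy describe jobset success-policy
oc -n zhouy get events --field-selector involvedObject.kind=JobSet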
Expected results:
The JobSet controller should emit events describing the job creation failure, so it is visible in oc describe jobset and oc get events rather than only in controller logs.
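
Below is a minimal sketch of the expected behavior, assuming a controller-runtime reconciler wired with an EventRecorder. The JobSetReconciler type, createJob helper, and JobCreationFailed reason are illustrative names, not JobSet's actual code:

package controllers

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
	"sigs.k8s.io/controller-runtime/pkg/client"
	jobsetv1alpha2 "sigs.k8s.io/jobset/api/jobset/v1alpha2"
)

// JobSetReconciler is a hypothetical reconciler; the real one lives in
// sigs.k8s.io/jobset. Record would be injected with
// mgr.GetEventRecorderFor("jobset").
type JobSetReconciler struct {
	client.Client
	Record record.EventRecorder
}

// createJob creates a child Job and, on failure, surfaces the error as a
// Warning event on the parent JobSet so it shows up in `oc describe jobset`
// and `oc get events`, not only in controller logs.
func (r *JobSetReconciler) createJob(ctx context.Context, js *jobsetv1alpha2.JobSet, job *batchv1.Job) error {
	if err := r.Create(ctx, job); err != nil {
		r.Record.Eventf(js, corev1.EventTypeWarning, "JobCreationFailed",
			"failed to create job %q: %v", job.Name, err)
		return fmt.Errorf("job %q creation failed with error: %w", job.Name, err)
	}
	return nil
}

With a recorder wired in this way, the spec.completions validation error from the log above would also appear as a Warning event on the JobSet.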
Additional info:
Issue links:
- clones OCPBUGS-61301: kube-scheduler operator's some containers don't have ROFS (Closed)
- is cloned by OCPBUGS-61335: Give good example define for jobset instance (Closed)
- is cloned by OCPBUGS-61400: There are logs overflow when create jobset with RestartJobSetAndIgnoreMaxRestarts (Closed)
- links to