-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.19.z, 4.21
-
None
-
None
-
False
-
-
None
-
Moderate
-
None
-
None
-
None
-
None
-
In Progress
-
Bug Fix
-
Clear the `openshift.io/node-selector` annotation to disable the defaultNodeSelector if it is configured in the cluster, because the `oc adm restart-kubelet` and `oc adm copy-to-node` commands need to run on any node type.
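For context, "clearing" the annotation means setting it to an empty value on the namespace, which overrides the cluster-wide defaultNodeSelector so pods in that namespace can schedule on any node type. A minimal sketch with a placeholder namespace name (not taken from this report):
apiVersion: v1
kind: Namespace
metadata:
  name: example-namespace              # placeholder namespace
  annotations:
    # An empty openshift.io/node-selector overrides the cluster defaultNodeSelector
    openshift.io/node-selector: ""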
Description of problem:
The jobset-controller-manager pod cannot become ready and fails with the following readiness probe error:
Events:
  Type     Reason          Age                     From               Message
  ----     ------          ----                    ----               -------
  Normal   Scheduled       29m                     default-scheduler  Successfully assigned openshift-jobset-operator/jobset-controller-manager-59b8f68c49-mdgnj to ip-10-0-73-53.us-east-2.compute.internal
  Normal   AddedInterface  29m                     multus             Add eth0 [10.128.8.18/23] from ovn-kubernetes
  Normal   Pulling         29m                     kubelet            Pulling image "quay.io/zhouying7780/jobset:js01"
  Normal   Pulled          29m                     kubelet            Successfully pulled image "quay.io/zhouying7780/jobset:js01" in 16.398s (16.398s including waiting). Image size: 76074147 bytes.
  Normal   Created         29m                     kubelet            Created container: manager
  Normal   Started         29m                     kubelet            Started container manager
  Warning  Unhealthy       27m (x12 over 29m)      kubelet            Readiness probe failed: Get "http://10.128.8.18:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  ProbeError      3m58s (x160 over 29m)   kubelet            Readiness probe error: Get "http://10.128.8.18:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
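The events above were collected from the pod and can be re-gathered with a standard describe command like:
oc describe pod jobset-controller-manager-59b8f68c49-mdgnj -n openshift-jobset-operator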
Version-Release number of selected component (if applicable):
- main branch
How reproducible:
Always
Steps to Reproduce:
Step 1. Build the jobset operator from the main branch.
Step 2. Build the operand image.
Step 3. Update the .spec.install.spec.deployments[0].spec.template.spec.containers[0].image field in the JobSet CSV under manifests/jobset-operator.clusterserviceversion.yaml to point to the newly built image.
Step 4. Build the bundle and index images.
Step 5. Use the index image to create a CatalogSource (see the command sketch after this list).
Step 6. Install cert-manager and the jobset operator from the console.
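As a rough illustration of steps 4 and 5, one possible workflow uses podman and the opm index command; the image and catalog names below are placeholders, not the ones used in this report:
# Build and push the bundle and index images (image names are placeholders)
podman build -f bundle.Dockerfile -t quay.io/example/jobset-bundle:latest .
podman push quay.io/example/jobset-bundle:latest
opm index add --bundles quay.io/example/jobset-bundle:latest \
    --tag quay.io/example/jobset-index:latest
podman push quay.io/example/jobset-index:latest
# Point a CatalogSource at the index image so the operator appears in the console
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: jobset-operator-catalog          # placeholder name
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/example/jobset-index:latest
  displayName: JobSet Operator Catalog (test)
EOF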
Actual results:
The jobset-controller-manager pod is not ready; it reports the same readiness probe failure events shown in the description above.
Expected results:
The jobset-controller-manager pod becomes ready and runs with no errors.
Additional information:
See the following logs from the jobset-controller-manager pod:
2026-02-04T03:31:49Z ERROR controller-runtime.cache.UnhandledError Failed to watch {"reflector": "sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:114", "type": "*v1.Pod", "error": "failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-jobset-operator:jobset-controller-manager\" cannot list resource \"pods\" in API group \"\" at the cluster scope"}
k8s.io/apimachinery/pkg/util/runtime.logError
/workspace/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:221
k8s.io/apimachinery/pkg/util/runtime.handleError
/workspace/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:212
k8s.io/apimachinery/pkg/util/runtime.HandleErrorWithContext
/workspace/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:198
k8s.io/client-go/tools/cache.DefaultWatchErrorHandler
/workspace/vendor/k8s.io/client-go/tools/cache/reflector.go:204
k8s.io/client-go/tools/cache.(*Reflector).RunWithContext.func1
/workspace/vendor/k8s.io/client-go/tools/cache/reflector.go:370
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:233
k8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext.func1
/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:255
k8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext
/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:256
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:233
k8s.io/client-go/tools/cache.(*Reflector).RunWithContext
/workspace/vendor/k8s.io/client-go/tools/cache/reflector.go:368
k8s.io/client-go/tools/cache.(*controller).RunWithContext.(*Group).StartWithContext.func3
/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:63
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1
/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72
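The forbidden error indicates that the jobset-controller-manager service account lacks cluster-scope permission to list/watch pods. This can be confirmed before touching RBAC with a standard impersonated access check (a minimal sketch; on an affected cluster the output is expected to be "no"):
oc auth can-i list pods --all-namespaces \
  --as=system:serviceaccount:openshift-jobset-operator:jobset-controller-manager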
After applying the following additional ClusterRole and ClusterRoleBinding (suggested by an AI assistant), the issue was fixed:
cat <<EOF | oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: jobset-controller-manager-full-access
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["jobset.x-k8s.io"]
  resources: ["jobsets", "jobsets/status", "jobsets/finalizers"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods", "services", "events", "configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: jobset-controller-manager-full-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: jobset-controller-manager-full-access
subjects:
- kind: ServiceAccount
  name: jobset-controller-manager
  namespace: openshift-jobset-operator
EOF
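After the manifests are applied, the grant and the pod recovery can be checked with standard commands (a quick verification sketch):
# The impersonated access check should now return "yes"
oc auth can-i list pods --all-namespaces \
  --as=system:serviceaccount:openshift-jobset-operator:jobset-controller-manager
# Watch the controller pod become Ready
oc get pods -n openshift-jobset-operator -w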
- clones
-
OCPBUGS-74978 Jobset operator failed to launch the jobset controller with error : undefined field 'namespace'
-
- New
-