OpenShift Bugs / OCPBUGS-75881

jobset-controller-manager can't be ready with ProbeError and Unhealthy

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: 4.19.z, 4.21
    • Component/s: JobSet
    • Severity: Moderate
    • Status: In Progress
    • Release Note Type: Bug Fix
    • Clearing the `openshift.io/node-selector` annotation disables the defaultNodeSelector if it is configured in the cluster, because the `oc adm restart-kubelet` and `oc adm copy-to-node` commands need to run on any node type.

      Description of problem:

      The jobset-controller-manager pod never becomes ready; the kubelet reports Unhealthy and ProbeError events:

      Events:
        Type     Reason          Age                    From               Message
        ----     ------          ----                   ----               -------
        Normal   Scheduled       29m                    default-scheduler  Successfully assigned openshift-jobset-operator/jobset-controller-manager-59b8f68c49-mdgnj to ip-10-0-73-53.us-east-2.compute.internal
        Normal   AddedInterface  29m                    multus             Add eth0 [10.128.8.18/23] from ovn-kubernetes
        Normal   Pulling         29m                    kubelet            Pulling image "quay.io/zhouying7780/jobset:js01"
        Normal   Pulled          29m                    kubelet            Successfully pulled image "quay.io/zhouying7780/jobset:js01" in 16.398s (16.398s including waiting). Image size: 76074147 bytes.
        Normal   Created         29m                    kubelet            Created container: manager
        Normal   Started         29m                    kubelet            Started container manager
        Warning  Unhealthy       27m (x12 over 29m)     kubelet            Readiness probe failed: Get "http://10.128.8.18:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        Warning  ProbeError      3m58s (x160 over 29m)  kubelet            Readiness probe error: Get "http://10.128.8.18:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
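
      The failing readiness endpoint can also be probed manually; this is a diagnostic sketch that assumes cluster access, with the node name and pod IP taken from the events above:

      ```shell
      # Probe the controller's /readyz endpoint from the node that runs the pod;
      # a 5s client timeout reproduces the kubelet's probe failure condition.
      oc debug node/ip-10-0-73-53.us-east-2.compute.internal -- \
        chroot /host curl -sS -m 5 http://10.128.8.18:8081/readyz
      ```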


      Version-Release number of selected component (if applicable):

      •  main branch

      How reproducible:

      Always

      Steps to Reproduce:

      Step1. Build the jobset operator from the main branch.

      Step2. Build the operand image.

      Step3. Update the `.spec.install.spec.deployments[0].spec.template.spec.containers[0].image` field in the JobSet CSV under `manifests/jobset-operator.clusterserviceversion.yaml` to point to the newly built image.

      Step4. Build the bundle and index image.

      Step5. Use the index image to create a CatalogSource.

      Step6. From the console, install the cert-manager and JobSet operators.
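
      Step5 can be sketched as a CatalogSource manifest; the catalog name, display name, and index image reference below are hypothetical placeholders, not values from this report:

      ```yaml
      # Hypothetical example: substitute the index image built in Step4.
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      metadata:
        name: jobset-test-catalog           # placeholder name
        namespace: openshift-marketplace
      spec:
        sourceType: grpc
        image: quay.io/example/jobset-index:latest   # placeholder index image
        displayName: JobSet Test Catalog
      ```

      Once applied with `oc apply -f`, OLM serves the catalog and the operator appears in the console's OperatorHub for Step6.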

      Actual results:

      The jobset-controller-manager pod is not ready:

      (Same Unhealthy and ProbeError readiness-probe events as shown in the description above.)

      Expected results:

      The jobset-controller-manager pod becomes ready and runs with no errors.

      Additional information:

      Logs from the jobset-controller-manager pod show an RBAC "forbidden" error:
      2026-02-04T03:31:49Z ERROR controller-runtime.cache.UnhandledError Failed to watch {"reflector": "sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:114", "type": "*v1.Pod", "error": "failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-jobset-operator:jobset-controller-manager\" cannot list resource \"pods\" in API group \"\" at the cluster scope"}
      k8s.io/apimachinery/pkg/util/runtime.logError
      /workspace/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:221
      k8s.io/apimachinery/pkg/util/runtime.handleError
      /workspace/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:212
      k8s.io/apimachinery/pkg/util/runtime.HandleErrorWithContext
      /workspace/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:198
      k8s.io/client-go/tools/cache.DefaultWatchErrorHandler
      /workspace/vendor/k8s.io/client-go/tools/cache/reflector.go:204
      k8s.io/client-go/tools/cache.(*Reflector).RunWithContext.func1
      /workspace/vendor/k8s.io/client-go/tools/cache/reflector.go:370
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
      /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:233
      k8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext.func1
      /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:255
      k8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext
      /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:256
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil
      /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:233
      k8s.io/client-go/tools/cache.(*Reflector).RunWithContext
      /workspace/vendor/k8s.io/client-go/tools/cache/reflector.go:368
      k8s.io/client-go/tools/cache.(*controller).RunWithContext.(*Group).StartWithContext.func3
      /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:63
      k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1
      /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72
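
      The forbidden error above can be confirmed independently of the controller by impersonating its service account (a diagnostic sketch; requires cluster access):

      ```shell
      # Check whether the jobset-controller-manager service account may list pods
      # cluster-wide; "no" confirms the missing RBAC rule reported in the log.
      oc auth can-i list pods \
        --as=system:serviceaccount:openshift-jobset-operator:jobset-controller-manager
      ```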
      Workaround (suggested by an AI assistant): applying the following additional ClusterRole and ClusterRoleBinding fixes the issue:
      cat <<EOF | oc apply -f -
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: jobset-controller-manager-full-access
      rules:
      - apiGroups: ["batch"]
        resources: ["jobs"]
        verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
      - apiGroups: ["jobset.x-k8s.io"]
        resources: ["jobsets", "jobsets/status", "jobsets/finalizers"]
        verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
      - apiGroups: [""]
        resources: ["pods", "services", "events", "configmaps"]
        verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: jobset-controller-manager-full-binding
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: jobset-controller-manager-full-access
      subjects:
      - kind: ServiceAccount
        name: jobset-controller-manager
        namespace: openshift-jobset-operator
      EOF
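
      After applying the workaround, readiness can be verified (a sketch assuming the namespace and deployment name from this report):

      ```shell
      # Wait for the deployment to report Ready once the RBAC fix is in place,
      # then list the pods to confirm the READY column.
      oc rollout status deployment/jobset-controller-manager \
        -n openshift-jobset-operator --timeout=120s
      oc get pods -n openshift-jobset-operator
      ```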

              Assignee: Unassigned
              Reporter: Ying Zhou (yinzhou@redhat.com)