Bug
Resolution: Done-Errata
4.14.z, 4.15.z
Description of problem:
The kube-scheduler can panic with the error "Observed a panic: "integer divide by zero" (runtime error: integer divide by zero)" when a Pod specifies a node selector that matches no nodes. The issue appears to be present only in OCP 4.14, likely because the bug was introduced and later fixed upstream [1]. More specifically, I could not reproduce the issue on 4.12, 4.13, or 4.15; it seems to occur only on 4.14, which is likely post-regression and pre-bugfix.
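The failure mode can be illustrated with a minimal Go sketch. This is not the kube-scheduler source: `nextStartIndex` and `unguardedNextStartIndex` are hypothetical helpers, and the assumption that the panic comes from a modulo by the number of candidate nodes is made only for illustration; see the upstream issue [1] for the actual cause. The sketch shows that an integer modulo by zero panics at runtime in Go with the same message, and how a guard on an empty candidate list avoids it.

```go
package main

import "fmt"

// nextStartIndex is a hypothetical helper, not the actual kube-scheduler code.
// It advances a round-robin start index over numNodes candidate nodes.
// The guard avoids "% numNodes" when numNodes is 0.
func nextStartIndex(current, processed, numNodes int) int {
	if numNodes == 0 {
		// No candidate nodes: nothing to rotate over, so reset to 0.
		return 0
	}
	return (current + processed) % numNodes
}

// unguardedNextStartIndex (also hypothetical) shows the failure mode.
// It recovers the runtime panic and returns it as an error so that
// the program can keep running.
func unguardedNextStartIndex(current, processed, numNodes int) (idx int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return (current + processed) % numNodes, nil
}

func main() {
	_, err := unguardedNextStartIndex(3, 2, 0)
	fmt.Println(err)                     // recovered: runtime error: integer divide by zero
	fmt.Println(nextStartIndex(3, 2, 0)) // 0
	fmt.Println(nextStartIndex(3, 2, 4)) // 1
}
```

In a real scheduler an unrecovered panic of this kind aborts the current scheduling attempt, which is consistent with the Pod remaining unscheduled as described under "Actual results".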
Version-Release number of selected component (if applicable):
OCP 4.14
How reproducible:
Always
Steps to Reproduce:
1. Add a Pod whose nodeAffinity selector matches no nodes to the cluster:
apiVersion: v1
kind: Pod
metadata:
  name: break-kube-scheduler
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - invalid-node # a node that doesn't exist
  containers:
  - name: main
    image: image-registry.openshift-image-registry.svc:5000/openshift/ubi:latest
    command: ["cat"]
    stdin: true
2. Observe kube-scheduler leader logs
Actual results:
The kube-scheduler leader panics, and this Pod and all others after it (alphanumerically?) remain unscheduled.
Expected results:
Kube-scheduler logs an error indicating no matching node:
E0701 15:35:25.419149 1 schedule_one.go:158] "Error selecting node for pod" err="nodeinfo not found for node name \"invalid-node\"" pod="openshift-kube-scheduler/break-kube-scheduler"
E0701 15:35:25.419196 1 schedule_one.go:891] "Error scheduling pod; retrying" err="nodeinfo not found for node name \"invalid-node\"" pod="openshift-kube-scheduler/break-kube-scheduler"
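To tell the two outcomes apart when reading the leader's logs, a small filter can help. The following is a minimal sketch, assuming the logs are supplied on standard input; the `classify` function and its labels are hypothetical, and the matched substrings are taken only from the messages quoted in this report.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// classify labels a single scheduler log line. The substrings come from the
// messages quoted in this report; the labels themselves are hypothetical.
func classify(line string) string {
	switch {
	case strings.Contains(line, "integer divide by zero"):
		return "panic"
	case strings.Contains(line, "nodeinfo not found for node name"):
		return "expected-error"
	default:
		return "other"
	}
}

func main() {
	// Count how many lines on standard input fall into each category.
	counts := map[string]int{}
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		counts[classify(scanner.Text())]++
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading logs:", err)
		os.Exit(1)
	}
	fmt.Printf("panic=%d expected-error=%d other=%d\n",
		counts["panic"], counts["expected-error"], counts["other"])
}
```

A non-zero `panic` count would indicate the behavior under "Actual results", while only `expected-error` lines would match the behavior under "Expected results".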
Additional info:
[1] Upstream bug on kube-scheduler panic (Kubernetes #124930)
[2] Similar issue in OCP 4.17 (OCPBUGS-34593)
is incorporated by:
- OCPBUGS-35551 Bump to kubernetes 1.29.6 (Closed)
- OCPBUGS-35552 Bump to kubernetes 1.28.11 (Closed)
- OCPBUGS-35553 Bump to kubernetes 1.27.15 (Closed)
is related to:
- OCPBUGS-34593 panic: "integer divide by zero" (runtime error: integer divide by zero), kube-scheduler (Closed)
links to:
- RHBA-2024:4960 OpenShift Container Platform 4.14.z bug fix update