Bug
Resolution: Done-Errata
4.14.z, 4.15.z
Quality / Stability / Reliability
Moderate
In Progress
Bug Fix
Description of problem:
The kube-scheduler can panic with the error "Observed a panic: "integer divide by zero" (runtime error: integer divide by zero)" when a Pod specifies a node selector that matches no nodes. The issue appears to affect only OCP 4.14, likely because the bug was introduced and later fixed upstream [1]. Specifically, I could not reproduce the issue on 4.12, 4.13, or 4.15; it only occurs on 4.14, which is likely post-regression and pre-bugfix.
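As a rough illustration of how this class of panic can arise, the sketch below shows a round-robin start index computed with a modulo over the number of candidate nodes. This is not the kube-scheduler's actual code; the function names `nextStartUnsafe`, `nextStartGuarded`, and `panics` are hypothetical, and the assumption is only that some index or average is divided by a node count that becomes zero when no node matches.

```go
package main

import "fmt"

// nextStartUnsafe advances a round-robin start index over numNodes candidates.
// Hypothetical helper: in Go, an integer modulo by zero panics at run time
// with "integer divide by zero".
func nextStartUnsafe(start, processed, numNodes int) int {
	return (start + processed) % numNodes
}

// nextStartGuarded returns 0 when there are no candidates instead of
// dividing by zero. This is one possible guard, assumed for illustration.
func nextStartGuarded(start, processed, numNodes int) int {
	if numNodes == 0 {
		return 0
	}
	return (start + processed) % numNodes
}

// panics reports whether calling f triggers a run-time panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	// Zero candidate nodes: the unguarded version panics.
	fmt.Println(panics(func() { nextStartUnsafe(2, 1, 0) }))
	// The guarded version handles the empty candidate list.
	fmt.Println(nextStartGuarded(2, 1, 0))
	fmt.Println(nextStartGuarded(1, 1, 3))
}
```

With a guard of this kind, an empty candidate list would surface as an ordinary scheduling error rather than a crash of the scheduling loop.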
Version-Release number of selected component (if applicable):
OCP 4.14
How reproducible:
Always
Steps to Reproduce:
1. Add a Pod whose nodeAffinity selector matches no nodes to the cluster:
apiVersion: v1
kind: Pod
metadata:
  name: break-kube-scheduler
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - invalid-node # a node that doesn't exist
  containers:
  - name: main
    image: image-registry.openshift-image-registry.svc:5000/openshift/ubi:latest
    command: ["cat"]
    stdin: true
2. Observe kube-scheduler leader logs
Actual results:
A panic is observed in the kube-scheduler leader, causing this Pod and all Pods after it (alphanumerically?) to remain unscheduled.
Expected results:
Kube-scheduler logs an error indicating no matching node:
E0701 15:35:25.419149 1 schedule_one.go:158] "Error selecting node for pod" err="nodeinfo not found for node name \"invalid-node\"" pod="openshift-kube-scheduler/break-kube-scheduler"
E0701 15:35:25.419196 1 schedule_one.go:891] "Error scheduling pod; retrying" err="nodeinfo not found for node name \"invalid-node\"" pod="openshift-kube-scheduler/break-kube-scheduler"
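When checking the leader logs, it can help to separate the panic from the expected error mechanically. The following is a minimal sketch that classifies log lines by the substrings quoted in this report; the function name `classify` and the labels it returns are hypothetical, and real log lines may be formatted differently.

```go
package main

import (
	"fmt"
	"strings"
)

// classify labels one scheduler log line using substrings quoted in this report.
func classify(line string) string {
	switch {
	case strings.Contains(line, "integer divide by zero"):
		return "panic" // actual (buggy) behaviour
	case strings.Contains(line, "nodeinfo not found for node name"):
		return "expected-error" // expected behaviour
	default:
		return "other"
	}
}

func main() {
	// Sample lines taken from the text of this report.
	lines := []string{
		`Observed a panic: "integer divide by zero" (runtime error: integer divide by zero)`,
		`E0701 15:35:25.419149 1 schedule_one.go:158] "Error selecting node for pod" err="nodeinfo not found for node name \"invalid-node\""`,
	}
	for _, l := range lines {
		fmt.Printf("%s: %s\n", classify(l), l)
	}
}
```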
Additional info:
[1] Upstream bug on kube-scheduler panic (Kubernetes #124930)
[2] Similar issue in OCP 4.17 (OCPBUGS-34593)
is incorporated by
- OCPBUGS-35551 Bump to kubernetes 1.29.6 (Closed)
- OCPBUGS-35552 Bump to kubernetes 1.28.11 (Closed)
- OCPBUGS-35553 Bump to kubernetes 1.27.15 (Closed)
is related to
- OCPBUGS-34593 panic: "integer divide by zero" (runtime error: integer divide by zero), kube-scheduler (Closed)
links to
- RHBA-2024:4960 OpenShift Container Platform 4.14.z bug fix update