Type: Bug
Resolution: Done-Errata
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.14.z, 4.15.z
Component/s: kube-scheduler
Labels:
None

Severity:
Moderate
Regression:
None
Blocked:
False
Blocked Reason:
None
Release Note Text:
* Previously, when a pod specified a node selector with no matching nodes, the kube-scheduler panicked with the following error: _Observed a panic: integer divide by zero_. With this release, the issue in the kube-scheduler code base is resolved, and the kube-scheduler no longer panics when a pod specifies a node selector with no matching nodes. (link:https://issues.redhat.com/browse/OCPBUGS-36397[*OCPBUGS-36397*])
Release Note Type:
Bug Fix
Release Note Status:
In Progress
Target Version:

4.14.z


Description of problem:

The kube-scheduler can panic with the error 'Observed a panic: "integer divide by zero" (runtime error: integer divide by zero)' when a Pod specifies a node selector that matches no nodes.

This issue appears to be present only in OCP 4.14, likely because the bug was introduced upstream and later fixed there [1].

More specifically, I could not reproduce the issue on 4.12, 4.13, or 4.15; it seems to occur only on 4.14, which likely falls after the regression and before the bug fix.
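
For readers unfamiliar with this class of panic, here is a minimal Go sketch of how it can arise. It is not the kube-scheduler source: the function names and the round-robin index update are hypothetical, chosen only to illustrate the assumption that an index is reduced modulo the number of candidate nodes and that the candidate list is empty when the selector matches nothing.

```go
package main

import "fmt"

// nextStartIndexUnsafe is a hypothetical helper (not a kube-scheduler function).
// It advances a round-robin start index and wraps it by the number of nodes.
// In Go, integer division or modulo by zero panics at run time.
func nextStartIndexUnsafe(current, processed, numNodes int) int {
	return (current + processed) % numNodes
}

// nextStartIndex is the same hypothetical helper with a guard for the
// empty case, so an empty candidate list no longer triggers a panic.
func nextStartIndex(current, processed, numNodes int) int {
	if numNodes == 0 {
		return 0
	}
	return (current + processed) % numNodes
}

// recoverMessage runs f and returns the panic message, or "" if f did not panic.
func recoverMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	var candidates []string // assumption: no node matched the selector

	msg := recoverMessage(func() {
		nextStartIndexUnsafe(0, 0, len(candidates))
	})
	fmt.Println("unguarded:", msg)
	fmt.Println("guarded:", nextStartIndex(0, 0, len(candidates)))
}
```

The guard shown here is only one way to avoid the panic; the actual upstream fix may differ.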

Version-Release number of selected component (if applicable):

OCP 4.14

How reproducible:

Always

Steps to Reproduce:
1. Add a Pod whose nodeAffinity selector matches no nodes to the cluster:

apiVersion: v1
kind: Pod
metadata:
  name: break-kube-scheduler
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - invalid-node # a node that doesn't exist
  containers:
  - name: main
    image: image-registry.openshift-image-registry.svc:5000/openshift/ubi:latest
    command: ["cat"]
    stdin: true

2. Observe the kube-scheduler leader logs.

Actual results:
A panic is observed in the kube-scheduler leader, causing this Pod and all others after it (possibly in alphanumeric order) to remain unscheduled.

Expected results:
The kube-scheduler logs an error indicating that no node matches:

E0701 15:35:25.419149 1 schedule_one.go:158] "Error selecting node for pod" err="nodeinfo not found for node name \"invalid-node\"" pod="openshift-kube-scheduler/break-kube-scheduler"
E0701 15:35:25.419196 1 schedule_one.go:891] "Error scheduling pod; retrying" err="nodeinfo not found for node name \"invalid-node\"" pod="openshift-kube-scheduler/break-kube-scheduler"

Additional info:

[1] Upstream bug on kube-scheduler panic (Kubernetes #124930)

[2] Similar issue in OCP 4.17 (OCPBUGS-34593)

is incorporated by:
OCPBUGS-35551 Bump to kubernetes 1.29.6 (Closed)
OCPBUGS-35552 Bump to kubernetes 1.28.11 (Closed)
OCPBUGS-35553 Bump to kubernetes 1.27.15 (Closed)

is related to:
OCPBUGS-34593 panic: "integer divide by zero" (runtime error: integer divide by zero), kube-scheduler (Closed)

links to:
RHBA-2024:4960 OpenShift Container Platform 4.14.z bug fix update
Solution for OCP 4.14 kube-scheduler crash

Assignee: Jan Chaloupka
Reporter: Jordan Bell
QA Contact: Rama Kasturi Narra
Votes: 0
Watchers: 11
Created: 2024/07/01 4:01 PM
Updated: 2024/10/09 5:31 PM
Resolved: 2024/08/07 10:52 AM
