Bug
Resolution: Unresolved
Minor
None
4.20.z
None
Description of problem:
We noticed that some of the nmstate pods are stuck in the Pending state.
$ oc get pods -n openshift-nmstate -o wide
NAME                                    READY   STATUS    RESTARTS   AGE     IP              NODE      NOMINATED NODE   READINESS GATES
nmstate-console-plugin-8b88fbd7-qz7hd   1/1     Running   0          4m20s   10.128.56.60    worker0   <none>           <none>
nmstate-handler-5crh9                   0/1     Pending   0          4m22s   <none>          <none>    <none>           <none>
nmstate-handler-6zndl                   0/1     Pending   0          4m22s   <none>          <none>    <none>           <none>
nmstate-handler-brq44                   1/1     Running   0          4m21s   172.19.90.140   worker2   <none>           <none>
nmstate-handler-hl9ls                   0/1     Pending   0          4m22s   <none>          <none>    <none>           <none>
nmstate-handler-lgxq5                   0/1     Pending   0          4m22s   <none>          <none>    <none>           <none>
nmstate-handler-mwr55                   0/1     Pending   0          4m22s   <none>          <none>    <none>           <none>
nmstate-handler-rdkhs                   1/1     Running   0          4m22s   172.19.90.139   worker1   <none>           <none>
nmstate-handler-sv5zm                   1/1     Running   0          4m21s   172.19.90.138   worker0   <none>           <none>
nmstate-handler-v85qg                   0/1     Pending   0          4m22s   <none>          <none>    <none>           <none>
nmstate-metrics-75c64559db-xm9kx        2/2     Running   0          4m22s   10.128.64.58    worker1   <none>           <none>
nmstate-operator-7844c5895f-cc6jx       1/1     Running   0          13m     10.128.0.132    master0   <none>           <none>
nmstate-webhook-6dccbdf6bd-54ggc        1/1     Running   0          4m22s   10.128.64.59    worker1   <none>           <none>
nmstate-webhook-6dccbdf6bd-jc2nm        1/1     Running   0          4m22s   10.128.56.59    worker0   <none>           <none>
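To pick the stuck pods out of a listing like the one above, this minimal Python sketch filters rows by the STATUS column. It assumes the whitespace-separated table that `oc get pods` prints and that the NAME, READY and STATUS columns never contain spaces. The function name `pending_pods` and the sample text are hypothetical.

```python
def pending_pods(listing: str) -> list[str]:
    """Return the names of pods whose STATUS column reads "Pending"."""
    lines = [line for line in listing.splitlines() if line.strip()]
    # Locate STATUS from the header; earlier columns contain no spaces,
    # so the index is the same in every data row.
    status_idx = lines[0].split().index("STATUS")
    names = []
    for line in lines[1:]:
        fields = line.split()
        if fields[status_idx] == "Pending":
            names.append(fields[0])
    return names


sample = """NAME                    READY   STATUS    RESTARTS   AGE
nmstate-handler-5crh9   0/1     Pending   0          4m22s
nmstate-handler-brq44   1/1     Running   0          4m21s
"""
print(pending_pods(sample))
```

On a live cluster, `oc get pods -n openshift-nmstate --field-selector=status.phase=Pending` returns the same set directly, without any text parsing.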
This issue is similar to OCPBUGS-58038, CNV-71397, OCPBUGS-9767, OCPBUGS-22305, etc.
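The root cause is that the namespace does not have the `openshift.io/node-selector: ""` annotation. Without it, the cluster-wide defaultNodeSelector is applied to the pods in the namespace, so nmstate-handler pods that the DaemonSet places on non-worker nodes cannot be scheduled.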
What makes this troublesome for customers is that the nmstate operator removes this annotation even when they add it themselves.
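The scheduling effect of the missing annotation can be illustrated with a simplified model. This Python sketch assumes the behavior described in this report: a namespace without `openshift.io/node-selector` inherits the cluster-wide `defaultNodeSelector`, and the annotation overrides it even when its value is empty. It is not OpenShift's admission code. The helper names and node label sets are hypothetical, and the model ignores the pod's own nodeSelector and affinity.

```python
ANNOTATION = "openshift.io/node-selector"


def parse_selector(selector: str) -> dict[str, str]:
    """Parse a comma-separated "key=value" selector; "" means no constraint."""
    result: dict[str, str] = {}
    for term in (t.strip() for t in selector.split(",")):
        if term:
            key, _, value = term.partition("=")
            result[key] = value
    return result


def effective_selector(ns_annotations: dict[str, str], default_node_selector: str) -> dict[str, str]:
    """Simplified rule: the namespace annotation, if present (even ""), wins;
    otherwise the cluster-wide defaultNodeSelector applies to the pod."""
    if ANNOTATION in ns_annotations:
        return parse_selector(ns_annotations[ANNOTATION])
    return parse_selector(default_node_selector)


def schedulable(node_labels: dict[str, str], selector: dict[str, str]) -> bool:
    """A node fits only if it carries every selector label with the same value."""
    return all(node_labels.get(key) == value for key, value in selector.items())


# Hypothetical label sets for a worker node and a control-plane node.
worker0 = {"node-role.kubernetes.io/worker": ""}
master0 = {"node-role.kubernetes.io/master": ""}
default = "node-role.kubernetes.io/worker="

missing = effective_selector({}, default)                # annotation stripped
present = effective_selector({ANNOTATION: ""}, default)  # annotation kept

print(schedulable(worker0, missing))  # True
print(schedulable(master0, missing))  # False: a handler pod bound here stays Pending
print(schedulable(master0, present))  # True
```

In this model, a handler pod bound to a non-worker node such as master0 can never match while only the default selector is in effect. That is consistent with the output above, where six handlers are Pending and those on worker0, worker1 and worker2 are Running.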
Version-Release number of selected component (if applicable):
- kubernetes-nmstate-operator.4.20.0-202511181524
How reproducible:
Always
Steps to Reproduce:
Step 1. Configure defaultNodeSelector so that normal pods are placed only on worker nodes.
$ oc edit scheduler
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  ...
spec:
  defaultNodeSelector: node-role.kubernetes.io/worker=
  ...
Step 2. Install the nmstate operator. Be sure to add the `openshift.io/node-selector: ""` annotation to the namespace YAML.
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: ""
  labels:
    kubernetes.io/metadata.name: openshift-nmstate
    name: openshift-nmstate
  name: openshift-nmstate
spec:
  finalizers:
  - kubernetes
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  annotations:
    olm.providedAPIs: NMState.v1.nmstate.io
  name: openshift-nmstate
  namespace: openshift-nmstate
spec:
  targetNamespaces:
  - openshift-nmstate
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/kubernetes-nmstate-operator.openshift-nmstate: ""
  name: kubernetes-nmstate-operator
  namespace: openshift-nmstate
spec:
  channel: stable
  installPlanApproval: Automatic
  name: kubernetes-nmstate-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
Step 3. Create the NMState object after the operator installation has completed.
apiVersion: nmstate.io/v1
kind: NMState
metadata:
  name: nmstate
Step 4. Run `oc get pods -n openshift-nmstate -o wide`.
Actual results:
Some of the pods are stuck in the Pending state.
You will also notice that the operator has removed the `openshift.io/node-selector: ""` annotation:
$ oc get namespace -o yaml openshift-nmstate | grep nodeSelector || echo "Not Found"
Not Found
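A more targeted check reads the Namespace as JSON, for example from `oc get namespace openshift-nmstate -o json`, and looks for the exact annotation key under `metadata.annotations`. The key is spelled `openshift.io/node-selector` (hyphenated), so a plain text search for `nodeSelector` does not match it. This Python sketch assumes that JSON shape. The function name and the sample documents are hypothetical and were not captured from the affected cluster.

```python
import json

ANNOTATION = "openshift.io/node-selector"


def has_node_selector_annotation(namespace_json: str) -> bool:
    """Return True if the Namespace JSON carries the openshift.io/node-selector
    annotation. An empty value ("") still counts as present."""
    namespace = json.loads(namespace_json)
    # "annotations" may be absent or null, so fall back to an empty dict.
    annotations = namespace.get("metadata", {}).get("annotations") or {}
    return ANNOTATION in annotations


kept = '{"metadata": {"name": "openshift-nmstate", "annotations": {"openshift.io/node-selector": ""}}}'
stripped = '{"metadata": {"name": "openshift-nmstate", "annotations": {}}}'

print(has_node_selector_annotation(kept))      # True
print(has_node_selector_annotation(stripped))  # False
```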
Expected results:
All pods are running.
Additional information:
The only workaround is to add the `openshift.io/node-selector: ""` annotation manually and then delete all of the pods.
However, the operator removes the annotation again after a few minutes, so the pods return to Pending whenever they are recreated for any reason, e.g., during an OpenShift upgrade.
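The workaround can be scripted so it is easy to re-run after the operator strips the annotation again. This minimal Python sketch wraps the standard `oc annotate` and `oc delete` commands. The function names are hypothetical. By default it performs a dry run that only prints the commands. Passing `--overwrite` keeps the annotate step safe to repeat.

```python
import shlex
import subprocess

NAMESPACE = "openshift-nmstate"


def workaround_commands(namespace: str = NAMESPACE) -> list[list[str]]:
    """Return the oc invocations for the manual workaround, in order."""
    return [
        # Re-add the annotation with an empty value ("key=" sets it to "").
        ["oc", "annotate", "namespace", namespace,
         "openshift.io/node-selector=", "--overwrite"],
        # Delete all pods so their controllers recreate them without the default selector.
        ["oc", "delete", "pods", "--all", "-n", namespace],
    ]


def apply_workaround(namespace: str = NAMESPACE, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for argv in workaround_commands(namespace):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


apply_workaround()
```

Because the operator removes the annotation again, this only restores scheduling until the pods are next recreated. It does not address the underlying operator behavior.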
This is a significant defect for users who rely on defaultNodeSelector.
All of our customers use defaultNodeSelector to prevent their pods from being scheduled on non-worker nodes.