- Bug
- Resolution: Done-Errata
- Normal
- 4.14, 4.14.z
- None
- No
- Sprint 238, Sprint 239, Sprint 240, Sprint 241
- 4
- Rejected
- False
Description of problem:
After configuring a custom toleration on the DNS pods (so that they no longer tolerate the master node taint), the newly created DNS pod is stuck in the Pending state.
Version-Release number of selected component (if applicable):
How reproducible:
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-41050
Steps to Reproduce:
1. Check the cluster version:

melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-05-03-163151   True        False         4h5m    Cluster version is 4.14.0-0.nightly-2023-05-03-163151

2. Check the default dns pod placement:

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide
NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE                                                       NOMINATED NODE   READINESS GATES
dns-default-6cv9k     2/2     Running   0          4h12m   10.131.0.8    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
dns-default-8g2w8     2/2     Running   0          4h12m   10.129.2.5    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
dns-default-df7zj     2/2     Running   0          4h18m   10.128.0.40   shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
dns-default-kmv4c     2/2     Running   0          4h18m   10.130.0.9    shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
dns-default-lxxkt     2/2     Running   0          4h18m   10.129.0.11   shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>
dns-default-mjrnx     2/2     Running   0          4h11m   10.128.2.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
node-resolver-5bnjv   1/1     Running   0          4h12m   10.0.128.3    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
node-resolver-7ns8b   1/1     Running   0          4h18m   10.0.0.4      shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
node-resolver-bz7k5   1/1     Running   0          4h12m   10.0.128.2    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
node-resolver-c67mw   1/1     Running   0          4h18m   10.0.0.3      shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
node-resolver-d8h65   1/1     Running   0          4h12m   10.0.128.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
node-resolver-rgb92   1/1     Running   0          4h18m   10.0.0.5      shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>

3. Check the default tolerations on the daemonset and the DNS operator config:

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get ds/dns-default -oyaml
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists

melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  creationTimestamp: "2023-05-08T00:39:00Z"
  finalizers:
  - dns.operator.openshift.io/dns-controller
  generation: 1
  name: default
  resourceVersion: "22893"
  uid: ae53e756-42a3-4c9d-8284-524df006382d
spec:
  cache:
    negativeTTL: 0s
    positiveTTL: 0s
  logLevel: Normal
  nodePlacement: {}
  operatorLogLevel: Normal
  upstreamResolvers:
    policy: Sequential
    transportConfig: {}
    upstreams:
    - port: 53
      type: SystemResolvConf
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
  - lastTransitionTime: "2023-05-08T00:46:20Z"
    message: Enough DNS pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-05-08T00:46:20Z"
    message: All DNS and node-resolver pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2023-05-08T00:39:25Z"
    message: The DNS daemonset has available pods, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2023-05-08T00:39:01Z"
    message: DNS Operator can be upgraded
    reason: AsExpected
    status: "True"
    type: Upgradeable

4. Configure custom tolerations for the dns pods (so that they do not tolerate the master node taints):

$ oc edit dns.operator default
spec:
  nodePlacement:
    tolerations:
    - effect: NoExecute
      key: my-dns-test
      operators: Equal
      value: abc
      tolerationSeconds: 3600

melvinjoseph@mjoseph-mac Downloads % oc edit dns.operator default
Warning: unknown field "spec.nodePlacement.tolerations[0].operators"
dns.operator.openshift.io/default edited

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide
NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE                                                       NOMINATED NODE   READINESS GATES
dns-default-6cv9k     2/2     Running   0          5h16m   10.131.0.8    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
dns-default-8g2w8     2/2     Running   0          5h16m   10.129.2.5    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
dns-default-df7zj     2/2     Running   0          5h22m   10.128.0.40   shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
dns-default-kmv4c     2/2     Running   0          5h22m   10.130.0.9    shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
dns-default-lxxkt     2/2     Running   0          5h22m   10.129.0.11   shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>
dns-default-mjrnx     2/2     Running   0          5h16m   10.128.2.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
dns-default-xqxr9     0/2     Pending   0          7s      <none>        <none>                                                     <none>           <none>
node-resolver-5bnjv   1/1     Running   0          5h17m   10.0.128.3    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
node-resolver-7ns8b   1/1     Running   0          5h22m   10.0.0.4      shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
node-resolver-bz7k5   1/1     Running   0          5h16m   10.0.128.2    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
node-resolver-c67mw   1/1     Running   0          5h22m   10.0.0.3      shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
node-resolver-d8h65   1/1     Running   0          5h16m   10.0.128.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
node-resolver-rgb92   1/1     Running   0          5h22m   10.0.0.5      shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>

The dns pod is stuck in the Pending state:

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get ds/dns-default -oyaml
<-----snip--->
      tolerations:
      - effect: NoExecute
        key: my-dns-test
        tolerationSeconds: 3600
        value: abc
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: dns-default
        name: config-volume
      - name: metrics-tls
        secret:
          defaultMode: 420
          secretName: dns-default-metrics-tls
  updateStrategy:
    rollingUpdate:
      maxSurge: 10%
      maxUnavailable: 0
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberAvailable: 3
  numberMisscheduled: 3
  numberReady: 3
  observedGeneration: 2

melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  creationTimestamp: "2023-05-08T00:39:00Z"
  finalizers:
  - dns.operator.openshift.io/dns-controller
  generation: 2
  name: default
  resourceVersion: "125435"
  uid: ae53e756-42a3-4c9d-8284-524df006382d
spec:
  cache:
    negativeTTL: 0s
    positiveTTL: 0s
  logLevel: Normal
  nodePlacement:
    tolerations:
    - effect: NoExecute
      key: my-dns-test
      tolerationSeconds: 3600
      value: abc
  operatorLogLevel: Normal
  upstreamResolvers:
    policy: Sequential
    transportConfig: {}
    upstreams:
    - port: 53
      type: SystemResolvConf
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
  - lastTransitionTime: "2023-05-08T00:46:20Z"
    message: Enough DNS pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-05-08T06:01:52Z"
    message: Have 0 up-to-date DNS pods, want 3.
    reason: Reconciling
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-05-08T00:39:25Z"
    message: The DNS daemonset has available pods, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2023-05-08T00:39:01Z"
    message: DNS Operator can be upgraded
    reason: AsExpected
    status: "True"
    type: Upgradeable

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod
NAME                  READY   STATUS    RESTARTS   AGE
dns-default-6cv9k     2/2     Running   0          5h18m
dns-default-8g2w8     2/2     Running   0          5h18m
dns-default-df7zj     2/2     Running   0          5h25m
dns-default-kmv4c     2/2     Running   0          5h25m
dns-default-lxxkt     2/2     Running   0          5h25m
dns-default-mjrnx     2/2     Running   0          5h18m
dns-default-xqxr9     0/2     Pending   0          2m12s
node-resolver-5bnjv   1/1     Running   0          5h19m
node-resolver-7ns8b   1/1     Running   0          5h25m
node-resolver-bz7k5   1/1     Running   0          5h19m
node-resolver-c67mw   1/1     Running   0          5h25m
node-resolver-d8h65   1/1     Running   0          5h19m
node-resolver-rgb92   1/1     Running   0          5h25m
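Note that the reproducer's edit misspells the Kubernetes Toleration field `operator` as `operators`, which is exactly what the API server's warning ("unknown field ... tolerations[0].operators") flags; the unknown field is dropped, so the applied toleration has no operator set. For comparison, a corrected version of the same nodePlacement stanza (the key and value are just the example values from this report) would be:

```yaml
spec:
  nodePlacement:
    tolerations:
    - effect: NoExecute
      key: my-dns-test        # example taint key from this report
      operator: Equal         # the field is "operator", not "operators"
      value: abc
      tolerationSeconds: 3600
```

Even with the spelling corrected, this toleration list still replaces the daemonset's default master toleration, so the new pod would still not schedule onto the tainted master nodes.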
Actual results:
The DNS pod dns-default-xqxr9 is stuck in the Pending state.
Expected results:
The DNS pods should be rolled out again with the new tolerations applied.
Additional info:
melvinjoseph@mjoseph-mac Downloads % oc describe po/dns-default-xqxr9 -n openshift-dns
Name:                 dns-default-xqxr9
Namespace:            openshift-dns
Priority:             2000001000
<----snip--->
Node-Selectors:       kubernetes.io/os=linux
Tolerations:          my-dns-test=abc:NoExecute for 3600s
                      node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                      node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                      node.kubernetes.io/not-ready:NoExecute op=Exists
                      node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                      node.kubernetes.io/unreachable:NoExecute op=Exists
                      node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3m45s  default-scheduler  0/6 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 Preemption is not helpful for scheduling, 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) didn't match Pod's node affinity/selector..
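The FailedScheduling event is consistent with the pod's toleration list: setting spec.nodePlacement.tolerations replaces the daemonset's default node-role.kubernetes.io/master toleration shown in step 3, leaving the master nodes' taint untolerated. A sketch of a nodePlacement that keeps the custom toleration without losing master scheduling, reusing the example key/value from this report, could look like:

```yaml
spec:
  nodePlacement:
    tolerations:
    # Re-add the default master toleration, which a custom
    # tolerations list otherwise replaces.
    - key: node-role.kubernetes.io/master
      operator: Exists
    # The custom toleration from the reproducer, with the
    # field name "operator" spelled correctly.
    - effect: NoExecute
      key: my-dns-test
      operator: Equal
      value: abc
      tolerationSeconds: 3600
```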
- is blocked by: OCPBUGS-15531 [4.14] DaemonSet fails to scale down during the rolling update when maxUnavailable=0 (Closed)
- is related to: OCPBUGS-5943 Excessive TopologyAwareHintsDisabled events due to service/dns-default with topology aware hints activated. (Closed)
- links to: RHEA-2023:5006 rpm