Bug
Resolution: Done-Errata
Normal
4.14, 4.14.z
Quality / Stability / Reliability
False
No
Rejected
Sprint 238, Sprint 239, Sprint 240, Sprint 241
4
Description of problem:
After configuring a custom toleration on the DNS operator (one that does not tolerate the master node taint), the new DNS pod is stuck in the Pending state.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-05-03-163151
How reproducible:
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-41050
Steps to Reproduce:
1. Check the cluster version:
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.14.0-0.nightly-2023-05-03-163151 True False 4h5m Cluster version is 4.14.0-0.nightly-2023-05-03-163151
2. Check the default DNS pod placement:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dns-default-6cv9k 2/2 Running 0 4h12m 10.131.0.8 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none>
dns-default-8g2w8 2/2 Running 0 4h12m 10.129.2.5 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none>
dns-default-df7zj 2/2 Running 0 4h18m 10.128.0.40 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none>
dns-default-kmv4c 2/2 Running 0 4h18m 10.130.0.9 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none>
dns-default-lxxkt 2/2 Running 0 4h18m 10.129.0.11 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none>
dns-default-mjrnx 2/2 Running 0 4h11m 10.128.2.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none>
node-resolver-5bnjv 1/1 Running 0 4h12m 10.0.128.3 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none>
node-resolver-7ns8b 1/1 Running 0 4h18m 10.0.0.4 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none>
node-resolver-bz7k5 1/1 Running 0 4h12m 10.0.128.2 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none>
node-resolver-c67mw 1/1 Running 0 4h18m 10.0.0.3 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none>
node-resolver-d8h65 1/1 Running 0 4h12m 10.0.128.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none>
node-resolver-rgb92 1/1 Running 0 4h18m 10.0.0.5 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none>
3. Check the tolerations in the dns-default daemonset:
oc -n openshift-dns get ds/dns-default -oyaml
tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
creationTimestamp: "2023-05-08T00:39:00Z"
finalizers:
- dns.operator.openshift.io/dns-controller
generation: 1
name: default
resourceVersion: "22893"
uid: ae53e756-42a3-4c9d-8284-524df006382d
spec:
cache:
negativeTTL: 0s
positiveTTL: 0s
logLevel: Normal
nodePlacement: {}
operatorLogLevel: Normal
upstreamResolvers:
policy: Sequential
transportConfig: {}
upstreams:
- port: 53
type: SystemResolvConf
status:
clusterDomain: cluster.local
clusterIP: 172.30.0.10
conditions:
- lastTransitionTime: "2023-05-08T00:46:20Z"
message: Enough DNS pods are available, and the DNS service has a cluster IP address.
reason: AsExpected
status: "False"
type: Degraded
- lastTransitionTime: "2023-05-08T00:46:20Z"
message: All DNS and node-resolver pods are available, and the DNS service has
a cluster IP address.
reason: AsExpected
status: "False"
type: Progressing
- lastTransitionTime: "2023-05-08T00:39:25Z"
message: The DNS daemonset has available pods, and the DNS service has a cluster
IP address.
reason: AsExpected
status: "True"
type: Available
- lastTransitionTime: "2023-05-08T00:39:01Z"
message: DNS Operator can be upgraded
reason: AsExpected
status: "True"
type: Upgradeable
4. Configure a custom toleration for the DNS pods (one that does not tolerate the master node taint):
$ oc edit dns.operator default
spec:
  nodePlacement:
    tolerations:
    - effect: NoExecute
      key: my-dns-test
      operators: Equal
      value: abc
      tolerationSeconds: 3600
melvinjoseph@mjoseph-mac Downloads % oc edit dns.operator default
Warning: unknown field "spec.nodePlacement.tolerations[0].operators"
dns.operator.openshift.io/default edited
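Note: the warning above is triggered by the misspelled field `operators`; the correct Toleration field name is `operator`. The unknown field is dropped and `operator` defaults to `Equal`, so the applied toleration behaves the same either way. The intended spec would look like:

```yaml
spec:
  nodePlacement:
    tolerations:
    - effect: NoExecute
      key: my-dns-test
      operator: Equal
      value: abc
      tolerationSeconds: 3600
```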
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dns-default-6cv9k 2/2 Running 0 5h16m 10.131.0.8 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none>
dns-default-8g2w8 2/2 Running 0 5h16m 10.129.2.5 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none>
dns-default-df7zj 2/2 Running 0 5h22m 10.128.0.40 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none>
dns-default-kmv4c 2/2 Running 0 5h22m 10.130.0.9 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none>
dns-default-lxxkt 2/2 Running 0 5h22m 10.129.0.11 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none>
dns-default-mjrnx 2/2 Running 0 5h16m 10.128.2.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none>
dns-default-xqxr9 0/2 Pending 0 7s <none> <none> <none> <none>
node-resolver-5bnjv 1/1 Running 0 5h17m 10.0.128.3 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none>
node-resolver-7ns8b 1/1 Running 0 5h22m 10.0.0.4 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none>
node-resolver-bz7k5 1/1 Running 0 5h16m 10.0.128.2 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none>
node-resolver-c67mw 1/1 Running 0 5h22m 10.0.0.3 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none>
node-resolver-d8h65 1/1 Running 0 5h16m 10.0.128.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none>
node-resolver-rgb92 1/1 Running 0 5h22m 10.0.0.5 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none>
The new DNS pod is stuck in the Pending state:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get ds/dns-default -oyaml
<-----snip--->
tolerations:
- effect: NoExecute
key: my-dns-test
tolerationSeconds: 3600
value: abc
volumes:
- configMap:
defaultMode: 420
items:
- key: Corefile
path: Corefile
name: dns-default
name: config-volume
- name: metrics-tls
secret:
defaultMode: 420
secretName: dns-default-metrics-tls
updateStrategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
type: RollingUpdate
status:
currentNumberScheduled: 3
desiredNumberScheduled: 3
numberAvailable: 3
numberMisscheduled: 3
numberReady: 3
observedGeneration: 2
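The updateStrategy above (maxSurge: 10%, maxUnavailable: 0) explains why exactly one extra pod appears and nothing else moves. A minimal sketch of the surge arithmetic, assuming percentage values round up as with Kubernetes intstr conventions (illustrative, not the controller source):

```python
import math

def surge_pods(desired, max_surge):
    """Extra pods the rollout may create above the desired count."""
    if isinstance(max_surge, str) and max_surge.endswith("%"):
        # Percentage of desired pods, rounded up.
        return math.ceil(desired * int(max_surge[:-1]) / 100)
    return int(max_surge)

# 3 desired master-node DNS pods, maxSurge: 10% -> one surge pod
# (dns-default-xqxr9 in the transcript). With maxUnavailable: 0, no old
# pod may be deleted until its replacement is Ready; the surge pod never
# becomes Ready (it is Pending), so the old pods are never scaled down.
print(surge_pods(3, "10%"))  # 1
```

This is the scale-down stall tracked separately in OCPBUGS-15531.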
melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
creationTimestamp: "2023-05-08T00:39:00Z"
finalizers:
- dns.operator.openshift.io/dns-controller
generation: 2
name: default
resourceVersion: "125435"
uid: ae53e756-42a3-4c9d-8284-524df006382d
spec:
cache:
negativeTTL: 0s
positiveTTL: 0s
logLevel: Normal
nodePlacement:
tolerations:
- effect: NoExecute
key: my-dns-test
tolerationSeconds: 3600
value: abc
operatorLogLevel: Normal
upstreamResolvers:
policy: Sequential
transportConfig: {}
upstreams:
- port: 53
type: SystemResolvConf
status:
clusterDomain: cluster.local
clusterIP: 172.30.0.10
conditions:
- lastTransitionTime: "2023-05-08T00:46:20Z"
message: Enough DNS pods are available, and the DNS service has a cluster IP address.
reason: AsExpected
status: "False"
type: Degraded
- lastTransitionTime: "2023-05-08T06:01:52Z"
message: Have 0 up-to-date DNS pods, want 3.
reason: Reconciling
status: "True"
type: Progressing
- lastTransitionTime: "2023-05-08T00:39:25Z"
message: The DNS daemonset has available pods, and the DNS service has a cluster
IP address.
reason: AsExpected
status: "True"
type: Available
- lastTransitionTime: "2023-05-08T00:39:01Z"
message: DNS Operator can be upgraded
reason: AsExpected
status: "True"
type: Upgradeable
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod
NAME READY STATUS RESTARTS AGE
dns-default-6cv9k 2/2 Running 0 5h18m
dns-default-8g2w8 2/2 Running 0 5h18m
dns-default-df7zj 2/2 Running 0 5h25m
dns-default-kmv4c 2/2 Running 0 5h25m
dns-default-lxxkt 2/2 Running 0 5h25m
dns-default-mjrnx 2/2 Running 0 5h18m
dns-default-xqxr9 0/2 Pending 0 2m12s
node-resolver-5bnjv 1/1 Running 0 5h19m
node-resolver-7ns8b 1/1 Running 0 5h25m
node-resolver-bz7k5 1/1 Running 0 5h19m
node-resolver-c67mw 1/1 Running 0 5h25m
node-resolver-d8h65 1/1 Running 0 5h19m
node-resolver-rgb92 1/1 Running 0 5h25m
Actual results:
The DNS pod dns-default-xqxr9 is stuck in the Pending state.
Expected results:
The DNS pods are rolled out with the updated tolerations.
Additional info:
melvinjoseph@mjoseph-mac Downloads % oc describe po/dns-default-xqxr9 -n openshift-dns
Name: dns-default-xqxr9
Namespace: openshift-dns
Priority: 2000001000
<----snip--->
Node-Selectors: kubernetes.io/os=linux
Tolerations: my-dns-test=abc:NoExecute for 3600s
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m45s default-scheduler 0/6 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 Preemption is not helpful for scheduling, 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) didn't match Pod's node affinity/selector..
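The FailedScheduling event follows directly from taint/toleration matching: the new pod's only custom toleration (my-dns-test=abc:NoExecute) does not match the master taint, and the default master toleration is gone. A minimal sketch of the matching rule (illustrative, not the actual scheduler code):

```python
def tolerates(toleration, taint):
    """True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    op = toleration.get("operator", "Equal")
    if op == "Exists":
        # Exists ignores value; an empty key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal requires both key and value to match.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))

def schedulable(tolerations, taints):
    """A pod fits only if every node taint is tolerated."""
    return all(any(tolerates(tol, t) for tol in tolerations) for t in taints)

master_taint = {"key": "node-role.kubernetes.io/master",
                "value": "", "effect": "NoSchedule"}

# Default dns-default toleration: tolerates the master taint.
default_tols = [{"key": "node-role.kubernetes.io/master", "operator": "Exists"}]
# Custom toleration from the reproducer: does not tolerate it.
custom_tols = [{"key": "my-dns-test", "value": "abc", "effect": "NoExecute"}]

print(schedulable(default_tols, [master_taint]))  # True
print(schedulable(custom_tols, [master_taint]))   # False
```

So the scheduler's message is expected for the new toleration set; the bug is that the rollout cannot make progress from it (the old pods stay put while the surge pod stays Pending).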
is blocked by:
OCPBUGS-15531 [4.14] DaemonSet fails to scale down during the rolling update when maxUnavailable=0 (Closed)

is related to:
OCPBUGS-5943 Excessive TopologyAwareHintsDisabled events due to service/dns-default with topology aware hints activated (Closed)

links to:
RHEA-2023:5006 (rpm)