Bug
Resolution: Done-Errata
Normal
4.14, 4.14.z
Quality / Stability / Reliability
False
No
Rejected
Sprint 238, Sprint 239, Sprint 240, Sprint 241
4
Description of problem:
After configuring a custom toleration on the DNS operator (one that does not tolerate the master node taint), the new DNS pod is stuck in the Pending state.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-05-03-163151
How reproducible:
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-41050
Steps to Reproduce:
1. Check the cluster version:
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.14.0-0.nightly-2023-05-03-163151 True False 4h5m Cluster version is 4.14.0-0.nightly-2023-05-03-163151
2. Check the default DNS pod placement:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dns-default-6cv9k 2/2 Running 0 4h12m 10.131.0.8 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none>
dns-default-8g2w8 2/2 Running 0 4h12m 10.129.2.5 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none>
dns-default-df7zj 2/2 Running 0 4h18m 10.128.0.40 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none>
dns-default-kmv4c 2/2 Running 0 4h18m 10.130.0.9 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none>
dns-default-lxxkt 2/2 Running 0 4h18m 10.129.0.11 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none>
dns-default-mjrnx 2/2 Running 0 4h11m 10.128.2.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none>
node-resolver-5bnjv 1/1 Running 0 4h12m 10.0.128.3 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none>
node-resolver-7ns8b 1/1 Running 0 4h18m 10.0.0.4 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none>
node-resolver-bz7k5 1/1 Running 0 4h12m 10.0.128.2 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none>
node-resolver-c67mw 1/1 Running 0 4h18m 10.0.0.3 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none>
node-resolver-d8h65 1/1 Running 0 4h12m 10.0.128.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none>
node-resolver-rgb92 1/1 Running 0 4h18m 10.0.0.5 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none>
3. Check the tolerations in the dns-default daemonset:
oc -n openshift-dns get ds/dns-default -oyaml
tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
creationTimestamp: "2023-05-08T00:39:00Z"
finalizers:
- dns.operator.openshift.io/dns-controller
generation: 1
name: default
resourceVersion: "22893"
uid: ae53e756-42a3-4c9d-8284-524df006382d
spec:
cache:
negativeTTL: 0s
positiveTTL: 0s
logLevel: Normal
nodePlacement: {}
operatorLogLevel: Normal
upstreamResolvers:
policy: Sequential
transportConfig: {}
upstreams:
- port: 53
type: SystemResolvConf
status:
clusterDomain: cluster.local
clusterIP: 172.30.0.10
conditions:
- lastTransitionTime: "2023-05-08T00:46:20Z"
message: Enough DNS pods are available, and the DNS service has a cluster IP address.
reason: AsExpected
status: "False"
type: Degraded
- lastTransitionTime: "2023-05-08T00:46:20Z"
message: All DNS and node-resolver pods are available, and the DNS service has
a cluster IP address.
reason: AsExpected
status: "False"
type: Progressing
- lastTransitionTime: "2023-05-08T00:39:25Z"
message: The DNS daemonset has available pods, and the DNS service has a cluster
IP address.
reason: AsExpected
status: "True"
type: Available
- lastTransitionTime: "2023-05-08T00:39:01Z"
message: DNS Operator can be upgraded
reason: AsExpected
status: "True"
type: Upgradeable
4. Configure a custom toleration for the DNS pods (one that does not tolerate the master node taint):
$ oc edit dns.operator default
spec:
  nodePlacement:
    tolerations:
    - effect: NoExecute
      key: my-dns-test
      operators: Equal
      value: abc
      tolerationSeconds: 3600
melvinjoseph@mjoseph-mac Downloads % oc edit dns.operator default
Warning: unknown field "spec.nodePlacement.tolerations[0].operators"
dns.operator.openshift.io/default edited
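Note: the warning above is triggered by the misspelled field `operators`; the correct Toleration field name is `operator`. The unknown field is dropped and `operator` defaults to `Equal`, so the applied toleration behaves the same either way. The intended spec would look like:

```yaml
spec:
  nodePlacement:
    tolerations:
    - effect: NoExecute
      key: my-dns-test
      operator: Equal
      value: abc
      tolerationSeconds: 3600
```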
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dns-default-6cv9k 2/2 Running 0 5h16m 10.131.0.8 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none>
dns-default-8g2w8 2/2 Running 0 5h16m 10.129.2.5 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none>
dns-default-df7zj 2/2 Running 0 5h22m 10.128.0.40 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none>
dns-default-kmv4c 2/2 Running 0 5h22m 10.130.0.9 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none>
dns-default-lxxkt 2/2 Running 0 5h22m 10.129.0.11 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none>
dns-default-mjrnx 2/2 Running 0 5h16m 10.128.2.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none>
dns-default-xqxr9 0/2 Pending 0 7s <none> <none> <none> <none>
node-resolver-5bnjv 1/1 Running 0 5h17m 10.0.128.3 shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal <none> <none>
node-resolver-7ns8b 1/1 Running 0 5h22m 10.0.0.4 shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal <none> <none>
node-resolver-bz7k5 1/1 Running 0 5h16m 10.0.128.2 shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal <none> <none>
node-resolver-c67mw 1/1 Running 0 5h22m 10.0.0.3 shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal <none> <none>
node-resolver-d8h65 1/1 Running 0 5h16m 10.0.128.4 shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal <none> <none>
node-resolver-rgb92 1/1 Running 0 5h22m 10.0.0.5 shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal <none> <none>
The new DNS pod is stuck in the Pending state:
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get ds/dns-default -oyaml
<-----snip--->
tolerations:
- effect: NoExecute
key: my-dns-test
tolerationSeconds: 3600
value: abc
volumes:
- configMap:
defaultMode: 420
items:
- key: Corefile
path: Corefile
name: dns-default
name: config-volume
- name: metrics-tls
secret:
defaultMode: 420
secretName: dns-default-metrics-tls
updateStrategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
type: RollingUpdate
status:
currentNumberScheduled: 3
desiredNumberScheduled: 3
numberAvailable: 3
numberMisscheduled: 3
numberReady: 3
observedGeneration: 2
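The updateStrategy above (maxSurge: 10%, maxUnavailable: 0) explains why exactly one extra pod appears and nothing else moves. A minimal sketch of the surge arithmetic, assuming percentage values round up as with Kubernetes intstr conventions (illustrative, not the controller source):

```python
import math

def surge_pods(desired, max_surge):
    """Extra pods the rollout may create above the desired count."""
    if isinstance(max_surge, str) and max_surge.endswith("%"):
        # Percentage of desired pods, rounded up.
        return math.ceil(desired * int(max_surge[:-1]) / 100)
    return int(max_surge)

# 3 desired master-node DNS pods, maxSurge: 10% -> one surge pod
# (dns-default-xqxr9 in the transcript). With maxUnavailable: 0, no old
# pod may be deleted until its replacement is Ready; the surge pod never
# becomes Ready (it is Pending), so the old pods are never scaled down.
print(surge_pods(3, "10%"))  # 1
```

This is the scale-down stall tracked separately in OCPBUGS-15531.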
melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
creationTimestamp: "2023-05-08T00:39:00Z"
finalizers:
- dns.operator.openshift.io/dns-controller
generation: 2
name: default
resourceVersion: "125435"
uid: ae53e756-42a3-4c9d-8284-524df006382d
spec:
cache:
negativeTTL: 0s
positiveTTL: 0s
logLevel: Normal
nodePlacement:
tolerations:
- effect: NoExecute
key: my-dns-test
tolerationSeconds: 3600
value: abc
operatorLogLevel: Normal
upstreamResolvers:
policy: Sequential
transportConfig: {}
upstreams:
- port: 53
type: SystemResolvConf
status:
clusterDomain: cluster.local
clusterIP: 172.30.0.10
conditions:
- lastTransitionTime: "2023-05-08T00:46:20Z"
message: Enough DNS pods are available, and the DNS service has a cluster IP address.
reason: AsExpected
status: "False"
type: Degraded
- lastTransitionTime: "2023-05-08T06:01:52Z"
message: Have 0 up-to-date DNS pods, want 3.
reason: Reconciling
status: "True"
type: Progressing
- lastTransitionTime: "2023-05-08T00:39:25Z"
message: The DNS daemonset has available pods, and the DNS service has a cluster
IP address.
reason: AsExpected
status: "True"
type: Available
- lastTransitionTime: "2023-05-08T00:39:01Z"
message: DNS Operator can be upgraded
reason: AsExpected
status: "True"
type: Upgradeable
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod
NAME READY STATUS RESTARTS AGE
dns-default-6cv9k 2/2 Running 0 5h18m
dns-default-8g2w8 2/2 Running 0 5h18m
dns-default-df7zj 2/2 Running 0 5h25m
dns-default-kmv4c 2/2 Running 0 5h25m
dns-default-lxxkt 2/2 Running 0 5h25m
dns-default-mjrnx 2/2 Running 0 5h18m
dns-default-xqxr9 0/2 Pending 0 2m12s
node-resolver-5bnjv 1/1 Running 0 5h19m
node-resolver-7ns8b 1/1 Running 0 5h25m
node-resolver-bz7k5 1/1 Running 0 5h19m
node-resolver-c67mw 1/1 Running 0 5h25m
node-resolver-d8h65 1/1 Running 0 5h19m
node-resolver-rgb92 1/1 Running 0 5h25m
Actual results:
The DNS pod dns-default-xqxr9 is stuck in the Pending state.
Expected results:
The DNS pods are rolled out with the updated tolerations.
Additional info:
melvinjoseph@mjoseph-mac Downloads % oc describe po/dns-default-xqxr9 -n openshift-dns
Name: dns-default-xqxr9
Namespace: openshift-dns
Priority: 2000001000
<----snip--->
Node-Selectors: kubernetes.io/os=linux
Tolerations: my-dns-test=abc:NoExecute for 3600s
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m45s default-scheduler 0/6 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 Preemption is not helpful for scheduling, 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) didn't match Pod's node affinity/selector..
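The FailedScheduling event follows directly from taint/toleration matching: the new pod's only custom toleration (my-dns-test=abc:NoExecute) does not match the master taint, and the default master toleration is gone. A minimal sketch of the matching rule (illustrative, not the actual scheduler code):

```python
def tolerates(toleration, taint):
    """True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    op = toleration.get("operator", "Equal")
    if op == "Exists":
        # Exists ignores value; an empty key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal requires both key and value to match.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))

def schedulable(tolerations, taints):
    """A pod fits only if every node taint is tolerated."""
    return all(any(tolerates(tol, t) for tol in tolerations) for t in taints)

master_taint = {"key": "node-role.kubernetes.io/master",
                "value": "", "effect": "NoSchedule"}

# Default dns-default toleration: tolerates the master taint.
default_tols = [{"key": "node-role.kubernetes.io/master", "operator": "Exists"}]
# Custom toleration from the reproducer: does not tolerate it.
custom_tols = [{"key": "my-dns-test", "value": "abc", "effect": "NoExecute"}]

print(schedulable(default_tols, [master_taint]))  # True
print(schedulable(custom_tols, [master_taint]))   # False
```

So the scheduler's message is expected for the new toleration set; the bug is that the rollout cannot make progress from it (the old pods stay put while the surge pod stays Pending).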
is blocked by:
OCPBUGS-15531 [4.14] DaemonSet fails to scale down during the rolling update when maxUnavailable=0 (Closed)

is related to:
OCPBUGS-5943 Excessive TopologyAwareHintsDisabled events due to service/dns-default with topology aware hints activated (Closed)

links to:
RHEA-2023:5006 (rpm)