  1. OpenShift Bugs
  2. OCPBUGS-13209

After configuring custom tolerations for the DNS pods, the new pod is stuck in Pending state


    • Bug
    • Resolution: Done-Errata
    • Normal
    • 4.14.0
    • 4.14, 4.14.z
    • Networking / DNS
    • None
    • No
    • Sprint 238, Sprint 239, Sprint 240, Sprint 241
    • 4
    • Rejected
    • False

      Description of problem:

       After custom tolerations are configured for the DNS pods (so that they no longer tolerate the master node taint), the new DNS pod is stuck in Pending state.

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-41050

      Steps to Reproduce:

      1. Check the cluster version
      melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.14.0-0.nightly-2023-05-03-163151   True        False         4h5m    Cluster version is 4.14.0-0.nightly-2023-05-03-163151
      2. Check the default DNS pod placement
      melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide
      NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE                                                       NOMINATED NODE   READINESS GATES
      dns-default-6cv9k     2/2     Running   0          4h12m   10.131.0.8    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
      dns-default-8g2w8     2/2     Running   0          4h12m   10.129.2.5    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
      dns-default-df7zj     2/2     Running   0          4h18m   10.128.0.40   shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
      dns-default-kmv4c     2/2     Running   0          4h18m   10.130.0.9    shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
      dns-default-lxxkt     2/2     Running   0          4h18m   10.129.0.11   shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>
      dns-default-mjrnx     2/2     Running   0          4h11m   10.128.2.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
      node-resolver-5bnjv   1/1     Running   0          4h12m   10.0.128.3    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
      node-resolver-7ns8b   1/1     Running   0          4h18m   10.0.0.4      shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
      node-resolver-bz7k5   1/1     Running   0          4h12m   10.0.128.2    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
      node-resolver-c67mw   1/1     Running   0          4h18m   10.0.0.3      shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
      node-resolver-d8h65   1/1     Running   0          4h12m   10.0.128.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
      node-resolver-rgb92   1/1     Running   0          4h18m   10.0.0.5      shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>
      
      3. Check the default tolerations of the dns-default daemonset and the default DNS operator config
       $ oc -n openshift-dns get ds/dns-default -oyaml
            tolerations:
            - key: node-role.kubernetes.io/master
              operator: Exists
      melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml
      apiVersion: operator.openshift.io/v1
      kind: DNS
      metadata:
        creationTimestamp: "2023-05-08T00:39:00Z"
        finalizers:
        - dns.operator.openshift.io/dns-controller
        generation: 1
        name: default
        resourceVersion: "22893"
        uid: ae53e756-42a3-4c9d-8284-524df006382d
      spec:
        cache:
          negativeTTL: 0s
          positiveTTL: 0s
        logLevel: Normal
        nodePlacement: {}
        operatorLogLevel: Normal
        upstreamResolvers:
          policy: Sequential
          transportConfig: {}
          upstreams:
          - port: 53
            type: SystemResolvConf
      status:
        clusterDomain: cluster.local
        clusterIP: 172.30.0.10
        conditions:
        - lastTransitionTime: "2023-05-08T00:46:20Z"
          message: Enough DNS pods are available, and the DNS service has a cluster IP address.
          reason: AsExpected
          status: "False"
          type: Degraded
        - lastTransitionTime: "2023-05-08T00:46:20Z"
          message: All DNS and node-resolver pods are available, and the DNS service has
            a cluster IP address.
          reason: AsExpected
          status: "False"
          type: Progressing
        - lastTransitionTime: "2023-05-08T00:39:25Z"
          message: The DNS daemonset has available pods, and the DNS service has a cluster
            IP address.
          reason: AsExpected
          status: "True"
          type: Available
        - lastTransitionTime: "2023-05-08T00:39:01Z"
          message: DNS Operator can be upgraded
          reason: AsExpected
          status: "True"
          type: Upgradeable
      
      
      4. Configure custom tolerations for the DNS pods (so that they no longer tolerate the master node taints)
       $ oc edit dns.operator default
       spec:
         nodePlacement:
           tolerations:
           - effect: NoExecute
             key: my-dns-test
             operators: Equal
             value: abc
             tolerationSeconds: 3600 
      melvinjoseph@mjoseph-mac Downloads % oc edit dns.operator default
      Warning: unknown field "spec.nodePlacement.tolerations[0].operators"
      dns.operator.openshift.io/default edited
      melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide
      NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE                                                       NOMINATED NODE   READINESS GATES
      dns-default-6cv9k     2/2     Running   0          5h16m   10.131.0.8    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
      dns-default-8g2w8     2/2     Running   0          5h16m   10.129.2.5    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
      dns-default-df7zj     2/2     Running   0          5h22m   10.128.0.40   shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
      dns-default-kmv4c     2/2     Running   0          5h22m   10.130.0.9    shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
      dns-default-lxxkt     2/2     Running   0          5h22m   10.129.0.11   shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>
      dns-default-mjrnx     2/2     Running   0          5h16m   10.128.2.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
      dns-default-xqxr9     0/2     Pending   0          7s      <none>        <none>                                                     <none>           <none>
      node-resolver-5bnjv   1/1     Running   0          5h17m   10.0.128.3    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
      node-resolver-7ns8b   1/1     Running   0          5h22m   10.0.0.4      shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
      node-resolver-bz7k5   1/1     Running   0          5h16m   10.0.128.2    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
      node-resolver-c67mw   1/1     Running   0          5h22m   10.0.0.3      shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
      node-resolver-d8h65   1/1     Running   0          5h16m   10.0.128.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
      node-resolver-rgb92   1/1     Running   0          5h22m   10.0.0.5      shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>
      
      
      The new DNS pod dns-default-xqxr9 is stuck in Pending state.
      
      melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get ds/dns-default -oyaml
      <-----snip--->
            tolerations:
            - effect: NoExecute
              key: my-dns-test
              tolerationSeconds: 3600
              value: abc
            volumes:
            - configMap:
                defaultMode: 420
                items:
                - key: Corefile
                  path: Corefile
                name: dns-default
              name: config-volume
            - name: metrics-tls
              secret:
                defaultMode: 420
                secretName: dns-default-metrics-tls
        updateStrategy:
          rollingUpdate:
            maxSurge: 10%
            maxUnavailable: 0
          type: RollingUpdate
      status:
        currentNumberScheduled: 3
        desiredNumberScheduled: 3
        numberAvailable: 3
        numberMisscheduled: 3
        numberReady: 3
        observedGeneration: 2
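
The update strategy and status above can be read as a rollout budget. The following is a minimal Python sketch, assuming the Kubernetes rule that a percentage `maxSurge` or `maxUnavailable` on a DaemonSet is rounded up against the desired pod count. The helper names `resolve` and `rollout_budget` are hypothetical and are not part of any OpenShift or Kubernetes tooling.

```python
from typing import Tuple, Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, desired: int) -> int:
    """Turn an absolute number or a percentage string such as "10%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        # Integer ceiling division; assumes percentages are rounded up.
        return -(-desired * percent // 100)
    return int(value)


def rollout_budget(
    desired: int, max_surge: IntOrPercent, max_unavailable: IntOrPercent
) -> Tuple[int, int]:
    """Return (extra pods allowed during the update, old pods that may be unavailable)."""
    return resolve(max_surge, desired), resolve(max_unavailable, desired)


# Values taken from the dns-default daemonset dump in this report.
surge, unavailable = rollout_budget(desired=3, max_surge="10%", max_unavailable=0)
print(f"surge pods allowed: {surge}, old pods that may go unavailable: {unavailable}")
```

With `desiredNumberScheduled: 3`, `maxSurge: 10%` and `maxUnavailable: 0`, the sketch yields one surge pod and no old pod that may be removed early. That would be consistent with a single new pod (dns-default-xqxr9) being created while all six old pods keep running.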
      
      
      melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml
      apiVersion: operator.openshift.io/v1
      kind: DNS
      metadata:
        creationTimestamp: "2023-05-08T00:39:00Z"
        finalizers:
        - dns.operator.openshift.io/dns-controller
        generation: 2
        name: default
        resourceVersion: "125435"
        uid: ae53e756-42a3-4c9d-8284-524df006382d
      spec:
        cache:
          negativeTTL: 0s
          positiveTTL: 0s
        logLevel: Normal
        nodePlacement:
          tolerations:
          - effect: NoExecute
            key: my-dns-test
            tolerationSeconds: 3600
            value: abc
        operatorLogLevel: Normal
        upstreamResolvers:
          policy: Sequential
          transportConfig: {}
          upstreams:
          - port: 53
            type: SystemResolvConf
      status:
        clusterDomain: cluster.local
        clusterIP: 172.30.0.10
        conditions:
        - lastTransitionTime: "2023-05-08T00:46:20Z"
          message: Enough DNS pods are available, and the DNS service has a cluster IP address.
          reason: AsExpected
          status: "False"
          type: Degraded
        - lastTransitionTime: "2023-05-08T06:01:52Z"
          message: Have 0 up-to-date DNS pods, want 3.
          reason: Reconciling
          status: "True"
          type: Progressing
        - lastTransitionTime: "2023-05-08T00:39:25Z"
          message: The DNS daemonset has available pods, and the DNS service has a cluster
            IP address.
          reason: AsExpected
          status: "True"
          type: Available
        - lastTransitionTime: "2023-05-08T00:39:01Z"
          message: DNS Operator can be upgraded
          reason: AsExpected
          status: "True"
          type: Upgradeable
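
The stored spec above carries no `operator` field: the misspelled `operators` key from the edit was reported as unknown and does not appear in the saved object. The toleration still behaves as `Equal`, because that is the Kubernetes default when the operator is omitted. Below is a small Python sketch of that behavior, assuming the core/v1 Toleration field names. `unknown_fields` and `effective_toleration` are hypothetical helpers for illustration, not the API server's actual code path.

```python
from typing import Any, Dict, List

# Field names of a core/v1 Toleration.
TOLERATION_FIELDS = {"key", "operator", "value", "effect", "tolerationSeconds"}


def unknown_fields(toleration: Dict[str, Any]) -> List[str]:
    """List keys that are not Toleration fields (candidates for an 'unknown field' warning)."""
    return sorted(set(toleration) - TOLERATION_FIELDS)


def effective_toleration(toleration: Dict[str, Any]) -> Dict[str, Any]:
    """Drop unknown keys and apply the default operator (Equal) when none is set."""
    kept = {k: v for k, v in toleration.items() if k in TOLERATION_FIELDS}
    kept.setdefault("operator", "Equal")
    return kept


# The toleration as typed in step 4, including the misspelled "operators" key.
edited = {
    "effect": "NoExecute",
    "key": "my-dns-test",
    "operators": "Equal",
    "value": "abc",
    "tolerationSeconds": 3600,
}

print(unknown_fields(edited))
print(effective_toleration(edited))
```

In this case the typo does not change the outcome, since the intended operator was `Equal` anyway. A misspelled `operator: Exists` would silently fall back to `Equal` in the same way.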
      
      
      melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod                  
      NAME                  READY   STATUS    RESTARTS   AGE
      dns-default-6cv9k     2/2     Running   0          5h18m
      dns-default-8g2w8     2/2     Running   0          5h18m
      dns-default-df7zj     2/2     Running   0          5h25m
      dns-default-kmv4c     2/2     Running   0          5h25m
      dns-default-lxxkt     2/2     Running   0          5h25m
      dns-default-mjrnx     2/2     Running   0          5h18m
      dns-default-xqxr9     0/2     Pending   0          2m12s
      node-resolver-5bnjv   1/1     Running   0          5h19m
      node-resolver-7ns8b   1/1     Running   0          5h25m
      node-resolver-bz7k5   1/1     Running   0          5h19m
      node-resolver-c67mw   1/1     Running   0          5h25m
      node-resolver-d8h65   1/1     Running   0          5h19m
      node-resolver-rgb92   1/1     Running   0          5h25m

      Actual results:

      The DNS pod dns-default-xqxr9 is stuck in Pending state.

      Expected results:

      The DNS pods are reloaded with the new tolerations, and no pod is left in Pending state.

      Additional info:

      melvinjoseph@mjoseph-mac Downloads % oc describe po/dns-default-xqxr9  -n openshift-dns
      Name:                 dns-default-xqxr9
      Namespace:            openshift-dns
      Priority:             2000001000
      
      
      <----snip--->
      Node-Selectors:              kubernetes.io/os=linux
      Tolerations:                 my-dns-test=abc:NoExecute for 3600s
                                   node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/not-ready:NoExecute op=Exists
                                   node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/unreachable:NoExecute op=Exists
                                   node.kubernetes.io/unschedulable:NoSchedule op=Exists
      Events:
        Type     Reason            Age    From               Message
        ----     ------            ----   ----               -------
        Warning  FailedScheduling  3m45s  default-scheduler  0/6 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 Preemption is not helpful for scheduling, 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) didn't match Pod's node affinity/selector..
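
The FailedScheduling event above can be checked against the standard Kubernetes taint and toleration matching rule. The sketch below is a simplified Python model of that rule, not scheduler code. It assumes the master taint has the `NoSchedule` effect (the event prints only the key and an empty value), and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # an omitted operator defaults to Equal
    value: str = ""
    effect: str = ""  # an empty effect matches every taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if one toleration matches one taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    # Equal: the values must be identical.
    return tol.operator == "Equal" and tol.value == taint.value


def untolerated(taints: Sequence[Taint], tolerations: Sequence[Toleration]) -> List[Taint]:
    """Return the taints that none of the tolerations match."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


# Assumption: the master taint uses the NoSchedule effect.
MASTER_TAINT = Taint(key="node-role.kubernetes.io/master", effect="NoSchedule")

# Tolerations listed by `oc describe` for the Pending pod dns-default-xqxr9.
PENDING_POD_TOLERATIONS = [
    Toleration("my-dns-test", "Equal", "abc", "NoExecute"),
    Toleration("node.kubernetes.io/disk-pressure", "Exists", effect="NoSchedule"),
    Toleration("node.kubernetes.io/memory-pressure", "Exists", effect="NoSchedule"),
    Toleration("node.kubernetes.io/not-ready", "Exists", effect="NoExecute"),
    Toleration("node.kubernetes.io/pid-pressure", "Exists", effect="NoSchedule"),
    Toleration("node.kubernetes.io/unreachable", "Exists", effect="NoExecute"),
    Toleration("node.kubernetes.io/unschedulable", "Exists", effect="NoSchedule"),
]

# Default toleration of the dns-default daemonset before the edit (step 3).
DEFAULT_TOLERATION = Toleration("node-role.kubernetes.io/master", "Exists")

print(untolerated([MASTER_TAINT], PENDING_POD_TOLERATIONS))
print(untolerated([MASTER_TAINT], [DEFAULT_TOLERATION]))
```

Under these assumptions, the custom tolerations leave the master taint untolerated while the default toleration covers it. This matches the event text, which suggests the Pending pod was targeted at a master node that it can no longer be scheduled on.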

              alebedev@redhat.com Andrey Lebedev
              rhn-support-mjoseph Melvin Joseph
              Melvin Joseph
              Shudi Li