OpenShift Bugs / OCPBUGS-35562

After redeploying an infra node in an Azure IPI test cluster via its MachineSet, the new node is stuck with the uninitialized taint


    • Critical
      * Previously, the {azure-first} node controller container did not tolerate the `NoExecute` taint on nodes. This caused a condition where a node would be uninitialized. With this release, the node controller deployment receives an update to tolerate the `NoExecute` taint, so that nodes can be properly initialized. (link:https://issues.redhat.com/browse/OCPBUGS-34556[*OCPBUGS-34556*])
    • Bug Fix
    • In Progress
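
      The release note above describes adding a toleration for the `NoExecute` taint to the node controller workload. A minimal sketch for checking whether such a toleration is present, assuming the Azure node controller runs as the azure-cloud-node-manager DaemonSet in the openshift-cloud-controller-manager namespace (workload name and namespace are assumptions and may differ by release):

          # Print the tolerations on the node controller pod template; after the fix,
          # an entry tolerating effect NoExecute (e.g. operator: Exists) is expected.
          oc -n openshift-cloud-controller-manager get daemonset azure-cloud-node-manager \
            -o jsonpath='{.spec.template.spec.tolerations}'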

      This is a clone of issue OCPBUGS-34556. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-33547. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-33405. The following is the description of the original issue:

      Description of problem:

      After upgrading the cluster from 4.12.26 to 4.14.16, the customer hit disk pressure on infranode3. To resolve it, the customer deleted the node, and the MachineSet automatically created a new machine. The new machine reached the Ready state and ran some important pods, but the customer cannot schedule more pods onto it because of the uninitialized taint.

      --> The taint was not removed unless it was removed manually.
      --> According to the customer, the labels shown below were not added to the new node:
      labels:
        - failure-domain.beta.kubernetes.io/zone: westeurope-3
        - node.kubernetes.io/instance-type: Standard_D16s_v3
        - failure-domain.beta.kubernetes.io/region: westeurope
        - beta.kubernetes.io/instance-type: Standard_D16s_v3
        - topology.kubernetes.io/region: westeurope
      --> After further analysis, the customer also found that new nodes do not get a public IP, which means the new virtual machine was not added to the gateway backend pool.
      --> No changes were made to the MachineSet after upgrading the cluster.
      --> The customer has a sufficient IP range for the cluster.
      --> The cluster is not configured with accelerated networking, so the known upgrade bug involving accelerated networking cannot be the cause.
      --> The newly added node is in the "Ready" state, but the taint is still present (see the inspection sketch after this list).
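
      A minimal sketch for inspecting the replacement node, assuming a hypothetical node name (infranode3-replacement); the taint key shown is the standard cloud-provider uninitialized taint and should be confirmed against the affected node:

          # Show the node's taints, conditions, and labels
          oc describe node infranode3-replacement

          # Print only the taints (look for node.cloudprovider.kubernetes.io/uninitialized)
          oc get node infranode3-replacement -o jsonpath='{.spec.taints}'

          # Compare labels against an older, correctly initialized node
          oc get node infranode3-replacement --show-labels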

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          NA

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          The cluster was upgraded to 4.14.16, and the replacement node remains stuck with the uninitialized taint: the cloud-provider labels are missing and the VM is not added to the gateway backend pool.

      Expected results:

          The replacement node is initialized automatically: the uninitialized taint is removed, the cloud-provider labels are applied, and the VM is added to the backend pool.

      Additional info:

       When the customer manually added the VM to the default backend pool using the `az` command and removed the taint manually, everything worked fine. A sketch of that workaround follows.
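
       A minimal sketch of that manual workaround, assuming hypothetical Azure resource names (resource group, NIC, load balancer, backend pool) and node name; the exact resources and taint key must be taken from the affected cluster:

           # Add the new VM's NIC to the load balancer's default backend pool
           az network nic ip-config address-pool add \
             --resource-group my-cluster-rg \
             --nic-name infranode3-replacement-nic \
             --ip-config-name pipConfig \
             --lb-name my-cluster-lb \
             --address-pool my-cluster-backend-pool

           # Remove the uninitialized taint so workloads can be scheduled
           oc adm taint nodes infranode3-replacement node.cloudprovider.kubernetes.io/uninitialized-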

            mimccune@redhat.com Michael McCune
            openshift-crt-jira-prow OpenShift Prow Bot
            Zhaohua Sun