OCPBUGS-42481

AWS cluster nodes are getting deleted from the cluster due to a change in DNS name



      Description of problem:

      During a cluster upgrade from 4.13 to 4.14, the DNS name of a node changes from hostname.customdomain.net to hostname.ec2.internal.
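      As a quick check (a minimal sketch added for illustration, assuming cluster-admin access and that the node is still registered and reachable; the node name below is only an example), compare the node name registered in the cluster with the hostname the instance itself reports:

      oc get nodes -o wide
      oc debug node/ip-10-131-136-36.ec2.internal -- chroot /host hostname --fqdn
      oc debug node/ip-10-131-136-36.ec2.internal -- chroot /host cat /etc/kubernetes/node.env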

      Version-Release number of selected component (if applicable):

          4.14.36

      How reproducible:

          I don't have exact steps to reproduce, but upgrading an OCP cluster from version 4.13 to 4.14.36 should trigger it.

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

       The node DNS name changed and the node is not added back to the cluster after the node reboots.

      Expected results:

      The node DNS name should not change and the upgrade should succeed.

      Additional info:

          Followed the workaround mentioned in bug
      https://issues.redhat.com/browse/OCPBUGS-29432?focusedId=24165685&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-24165685

      1. Allow the node to reboot into the new MachineConfig iteration.

      2. Update the /etc/kubernetes/node.env file to reflect the CORRECT hostname (example below; a command sketch for steps 2 and 3 follows this list):

      cat /etc/kubernetes/node.env
      KUBELET_NODE_NAME=ip-10-131-136-36.ec2.internal

      3. Restart kubelet on the host node (do not reboot the node) and proceed to the next node. (Then upgrade the cluster to 4.14.11 for the fix, as it is related to this bug: https://issues.redhat.com//browse/OCPBUGS-27261.) Proceed to step 4 if you encounter issues with kubelet CSR approvals.

      4. If kubelet reports that it is forbidden to contact the API server with an error similar to the one below and the node stays NotReady, move to step 5:

      Feb 14 16:26:45 ip-10-131-136-36 kubenswrapper[8121]: I0214 16:26:45.848529    8121 csi_plugin.go:913] Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "ip-10-131-136-36.ec2.internal" is forbidden: User "system:node:ip-10-131-136-36.<cusom>.local" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope: can only access CSINode with the same name as the requesting node

      5. Make a folder at /var/lib/kubelet/pki/backup and copy all contents of /var/lib/kubelet/pki/*.pem into that folder (a command sketch for steps 5 and 6 follows this list).

      6. Restart kubelet again and check for CSRs: look for a bootstrapper CSR and approve it, then a subsequent pair of CSRs for the node with its proper name, both of which must be approved. After that the node will return to Ready status.
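      A minimal command sketch for steps 2 and 3, run as root on the affected node (via SSH or oc debug); the node name is only an example and must be replaced with the node's correct EC2-internal DNS name:

      # Step 2: set the node name kubelet registers with (example value, adjust as needed)
      sed -i 's/^KUBELET_NODE_NAME=.*/KUBELET_NODE_NAME=ip-10-131-136-36.ec2.internal/' /etc/kubernetes/node.env

      # Step 3: restart kubelet only; do not reboot the node
      systemctl restart kubelet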
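      Likewise a sketch for steps 5 and 6; the backup and kubelet restart run on the node, the CSR commands run from a workstation with cluster-admin access, and <csr-name> is a placeholder for the CSR names shown by 'oc get csr':

      # Step 5: back up the existing kubelet certificates (on the node)
      mkdir -p /var/lib/kubelet/pki/backup
      cp /var/lib/kubelet/pki/*.pem /var/lib/kubelet/pki/backup/

      # Step 6: restart kubelet, then list and approve the pending CSRs
      systemctl restart kubelet
      oc get csr
      oc adm certificate approve <csr-name>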
