OpenShift Bugs / OCPBUGS-3386

A replaced bare metal master node cannot reach the api server: Get "https://[fd02::1]:443/api/v1/namespaces/ dial tcp [fd02::1]:443: i/o timeout

      11/30: changed Telco rank/bucket to 2 since it's not clear how soon or how frequently Telco customers will need to do this
      11/29: added to the Telco-Grade OCP 4.12 gating list, appears to be a regression with the master node replacement procedure

      Description of problem:

      A replaced bare metal master node cannot reach the api server: Get "https://[fd02::1]:443/api/v1/namespaces/ dial tcp [fd02::1]:443: i/o timeout
      
      
      

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2022-11-05-000615

      How reproducible:

      Reproduced once

      Steps to Reproduce:

      1. Deploy a bare metal cluster with 3 master nodes
      
      2. Replace one of the master nodes by following the steps in https://access.redhat.com/documentation/en-us/openshift_container_platform/4.11/html-single/backup_and_restore/index#restore-replace-stopped-baremetal-etcd-member_replacing-unhealthy-etcd-member
      
      3. After the bare metal host is provisioned and the node is registered, check the logs of the etcd installer pod running on the new node:
      
      oc -n openshift-etcd logs installer-14-retry-5-openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com | tail -5  

      Actual results:

      Calls to https://[fd02::1]:443/ are timing out
      
      oc -n openshift-etcd logs installer-14-retry-5-openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com | tail -5
      I1108 11:02:41.018359       1 cmd.go:423] Waiting for installer revisions to settle for node openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com
      W1108 11:02:55.019114       1 cmd.go:467] Error getting installer pods on current node openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com: Get "https://[fd02::1]:443/api/v1/namespaces/openshift-etcd/pods?labelSelector=app%3Dinstaller": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
      W1108 11:03:19.023778       1 cmd.go:467] Error getting installer pods on current node openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com: Get "https://[fd02::1]:443/api/v1/namespaces/openshift-etcd/pods?labelSelector=app%3Dinstaller": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
      W1108 11:03:39.023304       1 cmd.go:467] Error getting installer pods on current node openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com: Get "https://[fd02::1]:443/api/v1/namespaces/openshift-etcd/pods?labelSelector=app%3Dinstaller": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
      F1108 11:03:44.689025       1 cmd.go:106] timed out waiting for the condition
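
      To make the failure pattern easier to compare across retries, the installer output can be summarized with a small filter. This is only a sketch: `summarize_installer_log` is a hypothetical helper name (not part of `oc`), and it assumes the klog line format shown above, where each line starts with a severity letter (I, W, E or F) followed by the date.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc): summarize klog-formatted installer
# output read from stdin. Assumes the line format seen in this report.
summarize_installer_log() {
  awk '
    # Warning lines where the HTTP client gave up waiting for the API server.
    /^[ \t]*W[0-9]+ / && /Client\.Timeout exceeded/ { timeouts++ }
    # Fatal lines, such as the final "timed out waiting for the condition".
    /^[ \t]*F[0-9]+ / { fatal++ }
    END { printf "api_timeouts=%d fatal=%d\n", timeouts, fatal }
  '
}
```

      Possible usage, with the pod name from this report: `oc -n openshift-etcd logs installer-14-retry-5-openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com | summarize_installer_log`. For the five lines shown above, it would report three API timeouts and one fatal line.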
      

      Expected results:

      Pods on the replaced node can successfully reach the API server.

      Additional info:

      Running must-gather times out with `dial tcp [fd02::1]:443: i/o timeout`, so I am attaching the pod logs captured from /var/log/pods on the nodes.

            npinaeva@redhat.com Nadia Pinaeva
            mcornea@redhat.com Marius Cornea
            Anurag Saxena Anurag Saxena