OpenShift Bugs / OCPBUGS-33017

[Azure] node kubelet server certificate throwing x509 errors after Azure VM Live Migration


    • Quality / Stability / Reliability

      Description of problem:

      After an Azure VM Live Migration, a node's kubelet server certificate becomes unusable, meaning the node can no longer authenticate with the Kube API Server. Eventually, MachineHealthCheck (MHC) replaces the node.

      The error message that the Kube API Server logs in response to requests from the node's kubelet looks like:

      server.go:291 "Unable to authenticate the request due to an error" err="verifying certificate SN=123456789012345678901234567890, SKID=, AKID=12:AB:34:CD:56:EF:78:GH:90:IJ failed: x509: certificate signed by unknown authority" Post "https://api-int.myopenshiftcluster.io:6443/apis/authentication.k8s.io/v1/tokenreviews/" : http2: client connection lost 
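When triaging, it may help to pull the certificate fields out of log lines like the one above so they can be compared across nodes and clusters. The following is a minimal sketch, assuming the message keeps the exact `SN=…, SKID=…, AKID=…` layout shown above; the names `CERT_ERR` and `parse_cert_error` are hypothetical and not part of any OpenShift or Kubernetes tooling.

```python
import re

# Hypothetical pattern for the "verifying certificate ..." message quoted above.
# It assumes the field order SN, SKID, AKID and that SKID may be empty.
CERT_ERR = re.compile(
    r'verifying certificate SN=(?P<sn>\d+), SKID=(?P<skid>[^,]*), '
    r'AKID=(?P<akid>\S+) failed: (?P<reason>x509: [^"]+)'
)


def parse_cert_error(line):
    """Return a dict with sn, skid, akid, and reason, or None if the line does not match."""
    match = CERT_ERR.search(line)
    return match.groupdict() if match else None


# Sample taken from the error message in this report (values are already redacted).
sample = (
    'server.go:291 "Unable to authenticate the request due to an error" '
    'err="verifying certificate SN=123456789012345678901234567890, SKID=, '
    'AKID=12:AB:34:CD:56:EF:78:GH:90:IJ failed: '
    'x509: certificate signed by unknown authority"'
)

info = parse_cert_error(sample)
print(info)
```

In X.509, a certificate's authority key identifier (AKID) normally matches the subject key identifier of the CA that signed it, so the extracted `akid` could be compared against the signer CAs the API server currently trusts. Whether that comparison explains this failure is not established in this report.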

       

      Version-Release number of selected component (if applicable):
      4.12.46 and 4.12.53

      How reproducible:

      Somewhat. The customer has reproduced this on two different clusters. SRE has not been able to reproduce it on test clusters.

      Steps to Reproduce:
      N/A. See the attached RCA. We will ask the customer to provide a must-gather from the clusters for triage.

      Actual results:

      After Live Migration, the node becomes NotReady and is cleaned up by MHC 15 minutes later.

      Expected results:
      After Live Migration, the node becomes Ready and remains operational.

      Additional info:

      https://learn.microsoft.com/en-us/azure/virtual-machines/maintenance-and-updates#live-migration 

       

      Potential Outcomes:

      We verify this is an OCP bug and have a fix for it.

      Alternatively, we verify this is NOT an OCP bug. In this case, we need to gather as much data as possible to open a bug against Azure Live Migration with Microsoft.

              Unassigned
              Lisa Ranjbar (Inactive) <lranjbar@redhat.com>
              Xingxing Xia