OpenShift Bugs / OCPBUGS-18947

Workers failing to request credentials and become nodes


    • Quality / Stability / Reliability
    • Rejected

      Description of problem:

      Based on the analysis in https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2023-05-16%2023%3A59%3A59&baseRelease=4.13&baseStartTime=2023-04-18%2000%3A00%3A00&capability=operator-conditions&component=Cloud%20Compute%20%2F%20Other%20Provider&confidence=95&environment=ovn%20no-upgrade%20amd64%20aws%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Calibaba%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=aws&sampleEndTime=2023-09-12%2023%3A59%3A59&sampleRelease=4.14&sampleStartTime=2023-09-05%2000%3A00%3A00&testId=Operator%20results%3Aff3f4ce2ada4b853ece12306b1ef3eaf&testName=operator%20conditions%20machine-api&upgrade=no-upgrade&variant=standard, a number of jobs that passed on 4.13 are now permafailing on 4.14 (see the job list towards the bottom of that page), which is why this is being considered a blocker/regression.
      
      A review of the failures shows that, the majority of the time, the worker machines are not successfully joining the cluster. Some clusters had 1 worker join, some had no workers join.
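      A minimal sketch of how the "workers not joining" observation could be checked from a cluster dump, assuming the input is the parsed output of `oc get machines.machine.openshift.io -n openshift-machine-api -o json`. The function name and the sample data are hypothetical and are not taken from the failing jobs; the sketch relies on a Machine only getting `status.nodeRef` once a Node has been linked to it.

```python
# Label that the Machine API sets on Machines to record their role.
WORKER_ROLE_LABEL = "machine.openshift.io/cluster-api-machine-role"


def workers_without_nodes(machine_list):
    """Return (name, phase) for every worker Machine that has no linked Node."""
    missing = []
    for machine in machine_list.get("items", []):
        metadata = machine.get("metadata", {})
        labels = metadata.get("labels") or {}
        if labels.get(WORKER_ROLE_LABEL) != "worker":
            continue  # skip control plane machines
        status = machine.get("status") or {}
        if not status.get("nodeRef"):
            # Instance may exist in the cloud, but no Node has joined for it.
            missing.append((metadata.get("name"), status.get("phase")))
    return missing


# Made-up sample data for illustration only.
sample_machines = {
    "items": [
        {
            "metadata": {"name": "worker-a", "labels": {WORKER_ROLE_LABEL: "worker"}},
            "status": {"phase": "Running", "nodeRef": {"name": "node-a"}},
        },
        {
            "metadata": {"name": "worker-b", "labels": {WORKER_ROLE_LABEL: "worker"}},
            "status": {"phase": "Provisioned"},
        },
        {
            "metadata": {"name": "master-0", "labels": {WORKER_ROLE_LABEL: "master"}},
            "status": {"phase": "Running", "nodeRef": {"name": "node-m0"}},
        },
    ]
}

print(workers_without_nodes(sample_machines))
```

      A worker that stays in the `Provisioned` phase with no `nodeRef` would match the pattern described here: the instance was created, but the node never registered.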
      
      The Machine API appears to be working correctly and is creating EC2 instances correctly. Looking at the MCS logs, we see that the nodes have requested and been served the ignition data. However, the next expected signal, the CSRs for client credential bootstrapping, is missing.
      
      Somewhere between fetching the ignition and requesting client credentials, the boot process has failed.
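      A rough sketch of the CSR check described above, assuming the input is the parsed output of `oc get csr -o json`. The requesting username and signer name below are assumed to be the usual values for kubelet client bootstrapping on OpenShift; the function name and sample data are hypothetical.

```python
# Assumed identity used by a new node when it first requests client credentials.
BOOTSTRAP_USER = (
    "system:serviceaccount:openshift-machine-config-operator:node-bootstrapper"
)
# Kubernetes signer for kubelet client certificates.
CLIENT_SIGNER = "kubernetes.io/kube-apiserver-client-kubelet"


def bootstrap_client_csrs(csr_list):
    """Return (name, state) for each client bootstrap CSR in the list."""
    results = []
    for csr in csr_list.get("items", []):
        spec = csr.get("spec") or {}
        if spec.get("signerName") != CLIENT_SIGNER:
            continue
        if spec.get("username") != BOOTSTRAP_USER:
            continue
        conditions = (csr.get("status") or {}).get("conditions") or []
        types = {condition.get("type") for condition in conditions}
        if "Denied" in types:
            state = "Denied"
        elif "Approved" in types:
            state = "Approved"
        else:
            state = "Pending"  # no approval condition recorded yet
        results.append((csr["metadata"]["name"], state))
    return results


# Made-up sample data for illustration only.
sample_csrs = {
    "items": [
        {
            "metadata": {"name": "csr-aaaaa"},
            "spec": {"signerName": CLIENT_SIGNER, "username": BOOTSTRAP_USER},
            "status": {"conditions": [{"type": "Approved"}]},
        },
        {
            "metadata": {"name": "csr-bbbbb"},
            "spec": {
                "signerName": "kubernetes.io/kubelet-serving",
                "username": "system:node:node-a",
            },
            "status": {},
        },
    ]
}

print(bootstrap_client_csrs(sample_csrs))
```

      An empty result on a cluster where the MCS has already served ignition to the workers would correspond to the missing CSRs reported here.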

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Currently showing as permanent on the disruptive installer jobs

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

      This was brought to my attention in https://redhat-internal.slack.com/archives/C01CQA76KMX/p1694547905992139

              sdasu@redhat.com (Sandhya Dasu)
              joelspeed (Joel Speed)
              Sunil Choudhary