OpenShift Autoscaling / AUTOSCALE-331

Investigate solutions to token rotation causing drift and rollout


    • Type: Spike
    • Resolution: Done
    • Priority: Normal
    • None
    • None
    • AutoNode
    • AUTOSCALE - Sprint 275

      Related to AUTOSCALE-293

      As background, Karpenter has a feature called Drift, a form of disruption: if certain fields on an existing NodeClaim differ from those of its parent NodePool or EC2NodeClass, Karpenter marks the differing NodeClaims as "Drifted" and rolls out new NodeClaims and nodes with the new spec, as a form of cluster reconciliation.
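
A simplified model of the drift check described above may help. This is a sketch under stated assumptions, not Karpenter's implementation: it assumes each NodeClaim records a hash of its parent node class spec at launch, and treats any mismatch with the hash of the current spec as drift. The names `spec_hash`, `NodeClaim`, and `find_drifted` are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


def spec_hash(spec: dict) -> str:
    """Deterministic hash of a node class spec; keys are sorted so order does not matter."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class NodeClaim:
    name: str
    # Hash of the parent node class spec, recorded when this claim was launched.
    launched_spec_hash: str


def find_drifted(claims: list[NodeClaim], current_spec: dict) -> list[str]:
    """Return the names of claims whose recorded hash no longer matches the current spec."""
    current = spec_hash(current_spec)
    return [c.name for c in claims if c.launched_spec_hash != current]


if __name__ == "__main__":
    spec = {"userData": "bootstrap-v1"}
    claims = [NodeClaim("a", spec_hash(spec)), NodeClaim("b", spec_hash(spec))]
    print(find_drifted(claims, spec))                          # []
    print(find_drifted(claims, {"userData": "bootstrap-v2"}))  # ['a', 'b']
```

Because the whole spec is hashed in this model, changing even one character inside a raw string field such as userData marks every existing NodeClaim as drifted.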

      The EC2NodeClass has a field called "userData" that lets users pass what AWS calls "user data"[1] directly to the EC2 instances Karpenter provisions, as input for bootstrapping logic. This field is a raw string, and Karpenter considers it for drift.

      For background on HyperShift: when HyperShift creates guest clusters using the hypershift-operator, it uses an "ignition-server" to serve CoreOS Ignition configuration files for bootstrapping. This is how, at least in ROSA, EC2 instances with CoreOS AMIs are bootstrapped with the correct userData for the CoreOS version pinned to the OCP release the user created the guest cluster with. However, requests to these ignition servers must be authenticated, and that is where bearer tokens come in.
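
To illustrate how a bearer token can end up inside userData, here is a hypothetical sketch of a minimal Ignition v3 "pointer" config: the instance is told to fetch its real config from the ignition server and to authenticate with an Authorization header. It assumes the Ignition config spec v3 `ignition.config.merge` and `httpHeaders` fields; the URL, token values, spec version, and the function name `pointer_ignition_user_data` are placeholders, and HyperShift's actual generated userData may contain other headers or fields.

```python
import json


def pointer_ignition_user_data(server_url: str, token: str) -> str:
    """Render a minimal Ignition v3 pointer config as a userData string.

    On boot, Ignition would fetch the real config from server_url and send the
    bearer token in an Authorization header. Hypothetical sketch only.
    """
    config = {
        "ignition": {
            "version": "3.2.0",  # assumed spec version, for illustration
            "config": {
                "merge": [
                    {
                        "source": server_url,
                        "httpHeaders": [
                            {"name": "Authorization", "value": f"Bearer {token}"},
                        ],
                    }
                ]
            },
        }
    }
    # sort_keys makes the rendered string deterministic for a given input.
    return json.dumps(config, sort_keys=True)


if __name__ == "__main__":
    print(pointer_ignition_user_data("https://ignition.example.invalid/ignition", "token-1"))
```

The key property is that the token is embedded verbatim in the string, so a new token produces a different userData string.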

      HyperShift rotates these tokens every 5.5 hours for security reasons. Because the token is part of the userData, each rotation changes the userData field in the EC2NodeClass, so users' Karpenter nodes in AutoNode unintentionally drift at this interval for no meaningful reason: the rotation is completely transparent from the guest cluster admin's perspective.
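
A small sketch of the effect described above, under simplifying assumptions: a hypothetical userData that embeds the token verbatim, and a plain hash comparison standing in for the drift check. It shows that swapping only the token is enough to register as drift, and counts rotations per window at the 5.5-hour interval stated in this issue. All names (`render_user_data`, `spec_hash`, `drifted_after_rotation`, `rotations_in`) and the URL are hypothetical.

```python
import hashlib
import json

ROTATION_INTERVAL_HOURS = 5.5  # rotation interval stated in this issue


def render_user_data(token: str) -> str:
    # Hypothetical, simplified userData: the rotating token is embedded verbatim.
    return json.dumps(
        {"source": "https://ignition.example.invalid/ignition",
         "authorization": f"Bearer {token}"},
        sort_keys=True,
    )


def spec_hash(spec: dict) -> str:
    # Deterministic hash standing in for the drift comparison on the node class spec.
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode("utf-8")).hexdigest()


def drifted_after_rotation(old_token: str, new_token: str) -> bool:
    """True if swapping only the token changes the node class hash (i.e. drift)."""
    launched = spec_hash({"userData": render_user_data(old_token)})
    current = spec_hash({"userData": render_user_data(new_token)})
    return launched != current


def rotations_in(hours: float, interval: float = ROTATION_INTERVAL_HOURS) -> int:
    """Count rotations at t = interval, 2*interval, ... <= hours (window starts just after a rotation)."""
    return int(hours // interval)


if __name__ == "__main__":
    print(drifted_after_rotation("token-1", "token-2"))  # True
    print(rotations_in(24))   # 4
    print(rotations_in(168))  # 30
```

Under these assumptions, each node could be marked drifted roughly four times a day (about 30 times a week) purely because of token rotation, even though nothing the guest cluster admin controls has changed.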

      This spike is to document solutions to this problem.

      [1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html 

              rh-ee-macao Max Cao
