OCPBUGS-5384

Old AWS boot images vs. 4.12: unknown provider 'ec2'


    • Moderate
    • None
    • Approved
    • False

      Description of problem:

      4.2 AWS boot images such as ami-01e7fdcb66157b224 include the old ignition.platform.id=ec2 kernel command line parameter. When launched against 4.12.0-rc.3, new machines fail in the following sequence:

      1. The old user-data and old AMI successfully get to the machine-config-server request stage.
      2. The new instance then requests the full Ignition config from /config/worker, and the machine-config server translates it to the old Ignition v2 spec format.
      3. The instance lays down that Ignition-formatted content and then tries to reboot into the new state.
      4. Coming back up in the new state, the modern Afterburn starts and tries to determine a node name for the kubelet, which fails with unknown provider 'ec2'.
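
      The failure in step 4 hinges on the value of ignition.platform.id on the kernel command line. As a minimal sketch (not Afterburn's actual parsing code), the following extracts that value and flags the legacy slug. The LEGACY_PLATFORM_IDS set and the sample command lines are assumptions for illustration; only 'ec2' is named in this report.

```python
# Sketch: detect the legacy platform slug on a kernel command line.
# Assumption: only "ec2" is treated as legacy, since it is the only slug
# named in this report.
LEGACY_PLATFORM_IDS = {"ec2"}


def platform_id(cmdline):
    """Return the value of ignition.platform.id, or None if it is absent."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "ignition.platform.id" and sep:
            return value
    return None


def is_legacy(cmdline):
    """True when the command line carries a legacy platform slug."""
    return platform_id(cmdline) in LEGACY_PLATFORM_IDS


# Hypothetical command lines; on a real node the input would be the
# contents of /proc/cmdline.
old_cmdline = "BOOT_IMAGE=/vmlinuz rw ignition.platform.id=ec2 console=ttyS0"
new_cmdline = "BOOT_IMAGE=/vmlinuz rw ignition.platform.id=aws console=ttyS0"

print(platform_id(old_cmdline), is_legacy(old_cmdline))
print(platform_id(new_cmdline), is_legacy(new_cmdline))
```

      On a running node, the same check could be applied to the contents of /proc/cmdline.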

      Version-Release number of selected component (if applicable):

      coreos-assembler originally used ignition.platform.id=ec2, but later pivoted to ignition.platform.id=aws. It's not clear when that change made its way into new AWS boot images; it was some time after 4.2 and before 4.6.

      Afterburn dropped support for legacy command-line options like the ec2 slug in 5.0.0, but it's not clear when that shipped in RHCOS. The release controller points at this RHCOS diff, but that has afterburn-0-5.3.0-1 builds on both sides.

      How reproducible:

      100%, given a sufficiently old AMI and a sufficiently new OpenShift release target.

      Steps to Reproduce:

      1. Install 4.12.0-rc.3 or similar new OpenShift on AWS in us-east-1.
      2. Create Ignition v2 user-data in a Secret in openshift-machine-api. I'm fuzzy on how to do that portion easily, since it's basically RFE-3001 backwards.
      3. Edit a compute MachineSet to set spec.template.spec.providerSpec.value.ami to id: ami-01e7fdcb66157b224 and also point it at your v2 user-data Secret.
      4. Possibly delete an existing Machine in that MachineSet, or raise replicas, or otherwise talk the MachineSet controller into provisioning a new Machine to pick up the reconfigured AMI.
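
      Steps 2 and 3 can be sketched as data structures. This is a rough illustration under stated assumptions, not a verified procedure: the v2 pointer config uses the Ignition spec 2.x ignition.config.append layout, and the patch assumes the AWS providerSpec fields ami.id and userDataSecret.name. The Secret name, machine-config-server URL, and CA placeholder below are hypothetical.

```python
import json


def v2_pointer_config(mcs_url, ca_data_url):
    """Build an Ignition spec 2.x pointer config that fetches the full
    config from the machine-config server (spec 2.x uses config.append)."""
    return {
        "ignition": {
            "version": "2.2.0",
            "config": {"append": [{"source": mcs_url}]},
            "security": {
                "tls": {"certificateAuthorities": [{"source": ca_data_url}]}
            },
        }
    }


def machineset_patch(ami_id, user_data_secret):
    """Build a JSON merge patch that sets the boot-image AMI and the
    user-data Secret on a compute MachineSet."""
    return {
        "spec": {"template": {"spec": {"providerSpec": {"value": {
            "ami": {"id": ami_id},
            "userDataSecret": {"name": user_data_secret},
        }}}}}
    }


# Hypothetical values: the cluster domain, CA payload, and Secret name are
# placeholders, not taken from this report.
user_data = v2_pointer_config(
    "https://api-int.example.com:22623/config/worker",
    "data:text/plain;charset=utf-8;base64,PLACEHOLDER",
)
patch = machineset_patch("ami-01e7fdcb66157b224", "worker-user-data-v2")

print(json.dumps(user_data, indent=2))
print(json.dumps(patch, indent=2))
```

      The printed patch could then be applied to the MachineSet as a merge patch, for example with oc patch --type merge, after the user-data JSON has been stored in the Secret.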

      Actual results:

      The new Machine will get to Provisioned but fail to progress to Running. systemd journal logs will include unknown provider 'ec2' for Afterburn units.
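
      To confirm the symptom on a stuck machine, the journal can be searched for the error string. A minimal sketch follows; the sample journal lines are invented for illustration, and only the quoted message unknown provider 'ec2' comes from this report.

```python
import re

# Matches the error text quoted in this report and captures the provider slug.
UNKNOWN_PROVIDER = re.compile(r"unknown provider '(?P<provider>[^']+)'")


def find_unknown_providers(lines):
    """Return (provider, line) pairs for every line containing the error."""
    hits = []
    for line in lines:
        match = UNKNOWN_PROVIDER.search(line)
        if match:
            hits.append((match.group("provider"), line.rstrip()))
    return hits


# Hypothetical journal excerpt; real unit names and message prefixes may differ.
sample_journal = [
    "Jan 01 00:00:01 host systemd[1]: Starting example unit...",
    "Jan 01 00:00:02 host afterburn[999]: Error: unknown provider 'ec2'",
    "Jan 01 00:00:03 host systemd[1]: example unit failed.",
]

for provider, line in find_unknown_providers(sample_journal):
    print(provider, "->", line)
```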

      Expected results:

      Old boot-image AMIs can successfully update to 4.12.

      Alternatively, we pin down the set of exposed boot images sufficiently that users with older clusters can audit for exposure and avoid the issue by updating to more modern boot images (although updating boot images is not trivial; see RFE-3001 and the Ignition spec 2 to 3 transition discussed in kcs#5514051).

            walters@redhat.com Colin Walters
            trking W. Trevor King
            Michael Nguyen Michael Nguyen