CoreOS OCP / COS-2649

[openshift/os] Rework build process to generate `rhel-coreos-base` distinct from `ocp-rhel-coreos`


    • Type: Story
    • Resolution: Done

      [1232674318] Upstream Reporter: Colin Walters
      Upstream issue status: Closed
      Upstream description:

      # Reworking RHEL CoreOS to be more like OKD and towards quay.io/openshift/node-base:rhel10

      This pre-enhancement originated in [this GitHub issue](https://github.com/openshift/os/issues/799).

      A foundational decision early in OpenShift 4 was to create RHEL CoreOS. Key
      aspects of this were:

      • `kubelet` would not be containerized (after a negative experience with "system containers")
      • More crucially, we wanted to ship a tested combination of operating system and cluster
      • Operating system updates would come in a container image

      We're several years in now and have learned a lot. This proposal calls for
      reworking how we build things, but avoids changing these key aspects.

      ## Rework RHCOS disk images to not have OCP content

      When we speak of RHEL CoreOS, there are two independent things at play:

      • disk images (AMI, qcow2, ISO, etc.)
      • OS update container

      In this base proposal, the disk images contain only RHEL content.

      • `kubelet` will not be in the AMI.
      • The version will change to something of the form `$rhel.$datestamp`, e.g. `9.2.20220510.1`
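The `$rhel.$datestamp` scheme can be illustrated with a small formatter and parser. This is a minimal sketch, not pipeline code: it assumes the trailing `.1` in `9.2.20220510.1` is a per-day build serial, which the proposal does not spell out, and `BaseVersion`, `format_version`, and `parse_version` are hypothetical helpers.

```python
from datetime import date
from typing import NamedTuple


class BaseVersion(NamedTuple):
    """Parsed form of a `$rhel.$datestamp` version (hypothetical helper)."""
    rhel: str        # RHEL major.minor, e.g. "9.2"
    datestamp: date  # build date, encoded as YYYYMMDD in the string
    serial: int      # ASSUMPTION: trailing build counter for that day


def format_version(rhel: str, day: date, serial: int = 1) -> str:
    # "9.2" + 2022-05-10 + serial 1 -> "9.2.20220510.1"
    return f"{rhel}.{day:%Y%m%d}.{serial}"


def parse_version(version: str) -> BaseVersion:
    # The RHEL version has two components (major.minor); the rest is
    # the datestamp followed by the assumed build serial.
    major, minor, stamp, serial = version.split(".")
    day = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
    return BaseVersion(f"{major}.{minor}", day, int(serial))


v = parse_version("9.2.20220510.1")
print(v.rhel, v.datestamp.isoformat(), v.serial)  # 9.2 2022-05-10 1
```

Keeping the RHEL version as the leading components means the version string alone tells you which RHEL GA a disk image was built from.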

      Additionally, there will be a new container image called `rhel-coreos-base` that
      exactly matches these disk images.

      These disk images will generally be updated only at the GA of each RHEL release, and will not contain security updates.

      In phase 0, openshift-installer will continue to have [rhcos.json](https://github.com/openshift/installer/blob/release-4.14/data/data/coreos/rhcos.json). Disk images will continue to be provided at e.g. [mirror.openshift.com](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.14/).

      However, the disk images will be much more likely to be shared bit-for-bit across OCP releases.
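One way to make bit-for-bit sharing concrete: if two OCP releases pin the same disk image, the checksums recorded in their boot-image metadata are identical. The sketch below assumes `rhcos.json` follows the CoreOS stream-metadata nesting (`architectures` → arch → `artifacts` → platform → `formats` → format → `disk` → `sha256`). The release strings, URLs, and checksums are made-up placeholders, and `make_meta`, `disk_sha256`, and `same_disk_image` are hypothetical helpers.

```python
from typing import Any


def make_meta(release: str, sha256: str) -> dict[str, Any]:
    # Hypothetical, trimmed-down boot-image metadata for one OCP release.
    # ASSUMPTION: the nesting mirrors CoreOS stream metadata; values are placeholders.
    return {
        "architectures": {
            "x86_64": {
                "artifacts": {
                    "qemu": {
                        "release": release,
                        "formats": {
                            "qcow2.gz": {
                                "disk": {
                                    "location": f"https://example.com/rhcos-{release}-qemu.x86_64.qcow2.gz",
                                    "sha256": sha256,
                                }
                            }
                        },
                    }
                }
            }
        }
    }


def disk_sha256(meta: dict[str, Any], arch: str = "x86_64",
                platform: str = "qemu", fmt: str = "qcow2.gz") -> str:
    # Walk architectures -> arch -> artifacts -> platform -> formats -> fmt -> disk
    artifact = meta["architectures"][arch]["artifacts"][platform]
    return artifact["formats"][fmt]["disk"]["sha256"]


def same_disk_image(a: dict[str, Any], b: dict[str, Any], **where: str) -> bool:
    # Equal checksums: both releases ship the same disk image, bit for bit
    return disk_sha256(a, **where) == disk_sha256(b, **where)


ocp_a = make_meta("9.2.20220510.1", "aaaa")  # placeholder checksums
ocp_b = make_meta("9.2.20220510.1", "aaaa")  # later OCP release, same disk image
ocp_c = make_meta("9.4.20240430.1", "bbbb")  # after a hypothetical RHEL GA respin
print(same_disk_image(ocp_a, ocp_b), same_disk_image(ocp_a, ocp_c))  # True False
```

Comparing checksums rather than version strings avoids relying on naming conventions: two releases share a disk image only if the bytes are the same.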

      ## machine-os-content/rhel-coreos-9

      The key change here is that OCP content, including `kubelet`, moves into a container
      image that derives from this base image. One can imagine it as the following `Containerfile`:

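      ```dockerfile
      # Illustrative: layer OCP content on top of the RHEL-only base image
      FROM rhel-coreos-base
      # openshift-hyperkube provides kubelet; `ostree container commit`
      # finalizes the layer, as recommended for derived ostree-based images
      RUN rpm-ostree install openshift-hyperkube && \
          ostree container commit
      ```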

      This is in fact [currently done for OKD](https://github.com/openshift/okd-machine-os/blob/master/Dockerfile).

      ```mermaid
      flowchart TD
      rpms[RHEL rpms] --> base["quay.io/openshift/rhel-coreos-base:9"]
      base -->|"Add kubelet, crio, openvswitch"| ocpnode["quay.io/openshift/rhel-coreos:9"]
      ```

      In phase 0, this new image will likely be built by the current CoreOS pipeline.

      ## Installer changes to always rebase/pivot from the disk image

      Because OCP has not usually respun disk images for releases, at a technical level nodes always do an [in-place OS update](https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md) before kubelet starts.

      In the new model, this in-place update is also when kubelet gets installed.

      Today, the only exception to this in OCP is the bootstrap node. The bootstrap node would switch to also doing an in-place update to the desired node image, which is how OKD works today.

      ```mermaid
      flowchart LR
      installer[openshift-install] --> boot[RHEL base CoreOS disk image]
      boot -->|"pull quay.io/openshift/node:rhel10 + reboot"| ocpnode[OCP node]
      ```
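The boot-time flow above (boot the RHEL-only disk image, pull the OCP node image, reboot into it, and only then start kubelet) can be sketched as a small planning function. This is an illustration only: it assumes the update is applied with `rpm-ostree rebase` using the `ostree-unverified-registry:` container transport, while the actual MCO and bootstrap code paths may differ. `BootPlan`, `plan_first_boot`, and the image references are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BootPlan:
    commands: list[list[str]]  # commands to run before kubelet may start
    reboot: bool               # True if the node must reboot into the new image


def plan_first_boot(booted_image: str, desired_image: str) -> BootPlan:
    """Hypothetical sketch: decide how a node reaches the desired OCP node image.

    booted_image:  image the running deployment came from (for a fresh
                   disk image, the RHEL-only rhel-coreos-base)
    desired_image: OCP node image from the release payload (carries kubelet)
    """
    if booted_image == desired_image:
        # Already on the node image, so kubelet is present and can start.
        return BootPlan(commands=[], reboot=False)
    # ASSUMPTION: rebase via rpm-ostree's container transport; real OCP
    # tooling may differ. kubelet arrives as part of the new image.
    rebase = ["rpm-ostree", "rebase", f"ostree-unverified-registry:{desired_image}"]
    return BootPlan(commands=[rebase], reboot=True)


plan = plan_first_boot(
    "quay.io/openshift/rhel-coreos-base:9",
    "quay.io/openshift/rhel-coreos:9",
)
print(plan.reboot)  # True
print(" ".join(plan.commands[0]))
# rpm-ostree rebase ostree-unverified-registry:quay.io/openshift/rhel-coreos:9
```

Because the disk image never contains kubelet, a node cannot join the cluster until this rebase and reboot happen, which is what makes the pivot mandatory for every node, bootstrap included.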

      # Phase 1 follow-ups

      Consider the above a "phase 0": a minimal set of changes that achieves a significant improvement without breaking things.

      ## Create https://gitlab.com/redhat/coreos/base.git

      A while ago, we created github.com/openshift/os to be the source of truth for RHCOS. But once phase 0 is done, there is conceptually nothing OCP-specific about it. To align with RHEL, we could move it into the https://gitlab.com/redhat project.

      ## Images built with (or just mirroring) C9S composes

      We can start producing images that exactly match a C9S compose, including its version numbers.

      ## github.com/openshift/node

      It would make a lot of sense to also move the base `kubelet` systemd unit file into what is currently called `rhel-coreos`. The unit [currently lives in the MCO](https://github.com/openshift/machine-config-operator/blob/master/templates/worker/01-worker-kubelet/_base/units/kubelet.service.yaml).

      If we make the gitlab.com/redhat/coreos/base.git change above first, this Git repository could instead become openshift/node, and the systemd unit could live here (though perhaps it really belongs in the RPM).

      The next major step would then be to build this node image the same way as any other OCP platform image: via Prow for CI and OSBS for production builds. This would significantly simplify the current RHCOS pipeline and make it much clearer that it should align with RHEL lifecycles and technologies.

      This may be a significant enough change on its own to call for renaming the OS image in the payload (yes, again) to just `node`, de-emphasizing "coreos".

    • Assignee: Unassigned
    • Reporter: Upstream Sync