
Type: Epic
Resolution: Won't Do
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: Cluster Version Operator - Features
Labels:
- 4.16-candidate

Epic Name:
Review and merge code for "upgrade without registry server in disconnected environments"
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
Epic Status:
To Do
Feature Link:
OCPSTRAT-763 - [TechPreview]Disconnected Cluster Update and Boot without local image registry
Parent Link:
OCPSTRAT-763 - [TechPreview]Disconnected Cluster Update and Boot without local image registry
Hierarchy Progress:
100

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Intelligence Requested:
Market:

Epic Goal

The purpose of this epic is to enable upgrading OpenShift clusters in environments where access to an image registry server isn't feasible.

Why is this important?

This is important for customers that have OpenShift clusters deployed in environments where access to a registry server is impossible or very inconvenient.

Scenarios

The typical scenario is a far edge site or a factory site, where the OpenShift cluster is disconnected from the internet and bringing up additional infrastructure for an image registry server isn't feasible.

Additional infrastructure is usually unfeasible because of cost and because of limits on the knowledge and expertise of the technicians who operate the site.

For example, for far edge sites customers frequently ask for single node configurations and a reduced memory and CPU footprint in order to lower the total cost of each site. In this scenario, additional infrastructure is not an option.

The overall proposal is to add to OpenShift a mechanism to create an "upgrade bundle" that can be transported to the cluster offline (on a USB stick, for example) and applied without an image registry server. In that context we would have the following scenarios:

1. An engineer working for a partner, with some basic OpenShift knowledge, will need a tool to prepare an "upgrade bundle" that contains everything necessary to upgrade a cluster. Ideally this tool should be a sub-command of `oc adm upgrade`, for example something like this:

$ oc adm upgrade create bundle \
--arch=x86_64 \
--version=4.12.4 \
--pull-secret=... \
--output=/my/files

This would download all the images required for the upgrade, and put them into a `/my/files/upgrade-4.12.4-x86_64.tar` file.
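To make the bundle format more concrete, here is a minimal Python sketch of the packing step only. It assumes the images have already been pulled and exported as per-image archives (passed in as bytes), and it uses a hypothetical layout: one `images/NNNN.tar` member per image plus a `manifest.json` recording each image reference. Neither the layout nor `create_upgrade_bundle` is part of `oc`; the real command would also have to resolve the release payload for `--version` and `--arch` using the pull secret.

```python
import io
import json
import tarfile
from pathlib import Path


def create_upgrade_bundle(images: dict[str, bytes], version: str, arch: str,
                          output_dir: str) -> Path:
    """Pack exported image archives into upgrade-<version>-<arch>.tar.

    Hypothetical helper: `images` maps an image reference to the bytes of an
    already exported image archive. Pulling the images is out of scope here.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / f"upgrade-{version}-{arch}.tar"
    manifest = {"version": version, "arch": arch, "images": []}
    with tarfile.open(out, "w") as tar:
        # Sort by reference so the member order is deterministic.
        for i, (ref, data) in enumerate(sorted(images.items())):
            name = f"images/{i:04d}.tar"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            manifest["images"].append({"ref": ref, "file": name})
        # The manifest lets the cluster side verify the bundle contents.
        raw = json.dumps(manifest, indent=2).encode()
        info = tarfile.TarInfo("manifest.json")
        info.size = len(raw)
        tar.addfile(info, io.BytesIO(raw))
    return out
```

For example, `create_upgrade_bundle(images, "4.12.4", "x86_64", "/my/files")` would produce `/my/files/upgrade-4.12.4-x86_64.tar`. Recording the image references in a manifest means the cluster can check what the bundle contains before touching CRI-O storage.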

Once the bundle is prepared, the engineer will write it to a USB stick (or any other suitable media) and deliver it to a technician who will physically go to the cluster site and perform the upgrade.

2. The technician responsible for upgrading a cluster will receive the upgrade bundle, go to the cluster site and make it available on one of the nodes of the cluster. For example, if the bundle is on a USB stick, the technician will need to know how to plug that USB stick into one of the nodes. Requiring more knowledge than that from the technician is usually not feasible.

3. The cluster version operator will need to be able to read this bundle and orchestrate the upgrade process. Ideally the operator should monitor the media attached to the nodes (via systemd device events, for example) to automatically detect when the upgrade bundle becomes available.
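Once the device is mounted, detection could reduce to scanning for a file with the expected name. The sketch below assumes the media has already been mounted (for example by an automount triggered on the device event) and that the caller passes in the mount points. The `upgrade-<version>-<arch>.tar` pattern follows the example file name from scenario 1, and `find_bundle` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming convention, matching the example bundle file name.
BUNDLE_NAME = re.compile(
    r"^upgrade-(?P<version>\d+\.\d+\.\d+)-(?P<arch>[A-Za-z0-9_]+)\.tar$")


def find_bundle(mount_points: list[str], arch: str) -> Optional[Path]:
    """Return the highest-version bundle for `arch` found on the mounted media."""
    candidates = []
    for mount in mount_points:
        root = Path(mount)
        if not root.is_dir():
            continue  # device not mounted (yet) or path missing
        for entry in root.iterdir():
            m = BUNDLE_NAME.match(entry.name)
            if m and m["arch"] == arch and entry.is_file():
                # Compare versions numerically, so 4.12.10 > 4.12.4.
                version = tuple(int(x) for x in m["version"].split("."))
                candidates.append((version, entry))
    if not candidates:
        return None
    return max(candidates)[1]
```

Comparing versions as integer tuples rather than strings avoids picking `4.12.4` over `4.12.10` when several bundles sit on the same stick.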

Once the bundle is available on one of the nodes, the cluster version operator should replicate it to the rest of the nodes and then load all the images into the CRI-O container storage, making sure that they will not be removed by the CRI-O wipe service or by kubelet image garbage collection.

There are different ways to do this, but in general it will be necessary to modify the `storage.conf` file, to add a new directory to the `additionalimagestores` setting, and the `crio.conf` file, to add the images to the `pinned_images` setting.
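The two settings could be expressed as small TOML fragments: `additionalimagestores` lives in the `[storage.options]` table of `storage.conf`, and `pinned_images` in the `[crio.image]` table of `crio.conf`. The helper names below are hypothetical, and the sketch only renders the text; how it is merged into the node configuration (for example as a CRI-O drop-in under `/etc/crio/crio.conf.d/`) is left open.

```python
import json


def toml_string_array(values: list[str]) -> str:
    # JSON string escaping (with ensure_ascii=False) produces valid TOML basic
    # strings for ordinary paths and image references; this sketch does not
    # try to cover every TOML edge case.
    return "[" + ", ".join(json.dumps(v, ensure_ascii=False) for v in values) + "]"


def storage_conf_fragment(stores: list[str]) -> str:
    """Fragment adding read-only image stores to containers storage."""
    return "[storage.options]\nadditionalimagestores = " + toml_string_array(stores) + "\n"


def crio_conf_fragment(images: list[str]) -> str:
    """Fragment pinning images so CRI-O and kubelet GC keep them."""
    return "[crio.image]\npinned_images = " + toml_string_array(images) + "\n"
```

For example, `storage_conf_fragment(["/my/additional/images"])` renders `additionalimagestores = ["/my/additional/images"]` under `[storage.options]`.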

Once the images are in the CRI-O container storage the cluster version operator should start the regular upgrade process.

The required configuration parameters should ideally be added to the `desiredUpdate` field of the existing `ClusterVersion` object. For example:

desiredUpdate:
  bundle:
    # This will contain the location of the upgrade bundle. It should be optional;
    # by default the cluster version operator should detect it automatically
    # by scanning the media devices.
    file: /dev/sdb

    # This will be used to ensure that the right upgrade is applied. It should be
    # optional and used when there is a concern that the media could be inserted
    # into the wrong cluster.
    digest: sha256:123...

  # This indicates whether the upgrade should be held instead of being applied
  # immediately when the bundle is detected. The default should be false, but for
  # some users it is important to schedule the upgrade for later.
  hold: false  # or true
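A minimal sketch of how the cluster version operator might interpret these proposed fields: an absent `file` means auto-detect, an absent `digest` means no check, and `hold` defaults to false. The field names come from the proposal above, not from the current `ClusterVersion` API, and the helpers are hypothetical. It assumes the bundle location has already been resolved to a regular file; hashing a raw device such as `/dev/sdb` would cover the whole device rather than just the bundle.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class BundleUpdate:
    file: Optional[str]    # None: detect the bundle automatically from media
    digest: Optional[str]  # None: skip the digest check
    hold: bool             # True: do not start the upgrade yet


def parse_desired_update(desired_update: dict) -> BundleUpdate:
    """Apply the proposed defaults to a parsed desiredUpdate mapping."""
    bundle = desired_update.get("bundle") or {}
    return BundleUpdate(
        file=bundle.get("file"),
        digest=bundle.get("digest"),
        hold=bool(desired_update.get("hold", False)),
    )


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large bundles are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def should_start(update: BundleUpdate, bundle_path: str) -> bool:
    """Reject a mismatched bundle; otherwise start unless the update is held."""
    if update.digest is not None and sha256_of(bundle_path) != update.digest:
        raise ValueError(f"bundle digest mismatch for {bundle_path}")
    return not update.hold
```

Checking the digest before honouring `hold` means a bundle inserted into the wrong cluster is rejected as soon as it is detected, not later when the held upgrade is released.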

4. The machine config operator will need to understand that the `storage.conf` and `crio.conf` files will be modified and that those modifications don't require a reboot of the nodes of the cluster: a restart of the CRI-O service is enough.

Ideally these settings should be new fields in the existing `ContainerRuntimeConfig` object. For example:

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: upgrade-without-registry
spec:
  containerRuntimeConfig:
    additionalImageStores:
    - /my/additional/images
    - /more/additional/images
    pinnedImages:
    - quay.io/ocp-release-dev/...
    - quay.io/ocp-release-dev/...
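The restart-versus-reboot rule described in scenario 4 could be sketched as a decision over the set of files a configuration change touches. The allow-list below is a hypothetical illustration of the proposal, not the machine config operator's actual logic: changes limited to `storage.conf`, `crio.conf`, or CRI-O drop-ins map to a CRI-O restart, and anything else falls back to a reboot.

```python
# Hypothetical allow-list: files whose changes, per the proposal, only need
# a CRI-O restart rather than a node reboot.
CRIO_RESTART_ONLY = {
    "/etc/containers/storage.conf",
    "/etc/crio/crio.conf",
}
CRIO_DROPIN_DIR = "/etc/crio/crio.conf.d/"


def post_config_action(changed_files: set[str]) -> str:
    """Return "none", "restart-crio" or "reboot" for a set of changed paths."""
    if not changed_files:
        return "none"
    if all(p in CRIO_RESTART_ONLY or p.startswith(CRIO_DROPIN_DIR)
           for p in changed_files):
        return "restart-crio"
    # Any file outside the allow-list keeps the conservative default.
    return "reboot"
```

Keeping reboot as the fallback means only the explicitly listed CRI-O files gain the cheaper path; everything else behaves as before.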

5. All the OpenShift components should avoid the `Always` pull policy, as it forces the cluster to contact an image registry server even when the image has already been pulled and is available in the container storage.
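Finding offending workloads can start from the Kubernetes defaulting rule: when `imagePullPolicy` is omitted, it defaults to `Always` if the image tag is `latest` or if neither a tag nor a digest is given, and to `IfNotPresent` otherwise. The sketch below applies that rule to a pod spec given as a plain dict; the function names are hypothetical, and it only inspects `containers` and `initContainers`.

```python
def effective_pull_policy(container: dict) -> str:
    """Mirror Kubernetes defaulting of imagePullPolicy for one container."""
    policy = container.get("imagePullPolicy")
    if policy:
        return policy
    # Split off an optional "@sha256:..." digest first.
    name, _, digest = container["image"].partition("@")
    # A tag is a ':' in the last path component (a registry port is earlier).
    last = name.rsplit("/", 1)[-1]
    if ":" in last:
        tag = last.split(":", 1)[1]
    elif digest:
        tag = ""  # digest-only reference: no implicit "latest"
    else:
        tag = "latest"
    return "Always" if tag == "latest" else "IfNotPresent"


def always_pull_containers(pod_spec: dict) -> list[str]:
    """Names of containers that would contact a registry on every start."""
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    return [c["name"] for c in containers if effective_pull_policy(c) == "Always"]
```

Running this over component manifests would flag both explicit `Always` policies and images that silently default to it through a missing or `latest` tag.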

Dependencies

The `oc` tool needs to implement the creation of the upgrade bundle.
The cluster version operator needs to implement the upgrade process.
The machine config operator needs to support additional storage directories and image pinning.
All the OpenShift components need to avoid the `Always` image pull policy.

Contributing Teams (and contacts)

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

Development -
Documentation -
QE -
PX -
Others -

Acceptance Criteria (optional)

The goal will be achieved when it is possible to successfully upgrade an OpenShift cluster (both single node and multi-node) in a fully disconnected environment without an image registry server.

Drawbacks or Risk (optional)

The main drawback of this is that it requires rather large changes to the OpenShift cluster version operator.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.