OpenShift Bugs / OCPBUGS-2207

Kerneltype functionality breaks the cluster when the cluster uses a custom osImage


Details

    • Important
    • Rejected
    • False

    Description

      Description of problem:

      When a MachineConfig (MC) that sets kernelType is applied to a pool and a custom osImage is then deployed to the same pool, the nodes become degraded.

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.12.0-0.nightly-2022-10-05-053337   True        False         3h53m   Cluster version is 4.12.0-0.nightly-2022-10-05-053337
      

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create a MC to use a realtime kernel
      
      cat << EOF | oc create -f -
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: "worker"
        name: change-worker-kernelarg-realtime
      spec:
        kernelType: realtime
      EOF
      
      
      2. Wait for the pools to be updated
      
      $ oc get mcp
      
      This is the status on the worker nodes after the MC is applied:
      $ rpm-ostree status
      State: idle
      Deployments:
      * ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661
                         Digest: sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661
                      Timestamp: 2022-10-11T08:04:15Z
            RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-372.26.1.el8_6
                LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules
                                 kernel-rt-modules-extra
      
      
      And the new kernel is deployed properly
      $ uname -a
      Linux ip-10-0-159-135 4.18.0-372.26.1.rt7.183.el8_6.x86_64 #1 SMP PREEMPT_RT Sat Aug 27 22:04:33 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux 
      
      3. Create a custom osImage, for example one built from this Dockerfile.
      
      Use the right base image:
      # Get base image
      $ oc adm release info --image-for "rhel-coreos-8"
      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661 
      
      # Build this Dockerfile and push the resulting image to your repo
      FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0daf5c4a35424410e88dde102022fc3581302bc8a98e09e2e4748502c59b3661 
      RUN printf '[baseos]\nname=CentOS-$releasever - Base\nbaseurl=http://mirror.centos.org/centos/$releasever-stream/BaseOS/$basearch/os/\ngpgcheck=0\nenabled=1\n\n[appstream]\nname=CentOS-$releasever - AppStream\nbaseurl=http://mirror.centos.org/centos/$releasever-stream/AppStream/$basearch/os/\ngpgcheck=0\nenabled=1\n\n' > /etc/yum.repos.d/centos.repo && \
          rpm-ostree install zsh && \
          rpm-ostree cleanup -m && \
          ostree container commit 
      
      4. Create a MC to deploy the new custom osImage
      
      cat << EOF | oc create -f -
      kind: MachineConfig
      apiVersion: machineconfiguration.openshift.io/v1
      metadata:
        labels:
          machineconfiguration.openshift.io/role: "worker"
        name: "tc-54915-layering-kerneltype-worker"
      spec:
        osImageURL: "quay.io/examplerepo/layering@sha256:879c8f770a580b03bcf32c710f13cdc868156c50abda279b8e2d977d3b40f3f8"
      EOF
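
The reproduction steps above depend on two things that are easy to get wrong: waiting for the worker pool to finish rolling out the kernelType MC before creating the osImageURL MC, and referencing the custom image by digest as the report does. The following is a minimal sketch of both checks, not part of the original report. The function names, `POOL`, and `TIMEOUT` are hypothetical, and `wait_for_pool` assumes a logged-in `oc` session.

```shell
# Sketch only: POOL and TIMEOUT are assumed defaults, not values from the report.
POOL="${POOL:-worker}"
TIMEOUT="${TIMEOUT:-30m}"

# Block until the MachineConfigPool reports the Updated condition,
# or fail once TIMEOUT has elapsed. Requires access to a cluster.
wait_for_pool() {
  oc wait "mcp/${POOL}" --for=condition=Updated --timeout="${TIMEOUT}"
}

# Return 0 when the image reference ends in a sha256 digest,
# as the osImageURL in step 4 does.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}
```

One possible use is to call `wait_for_pool` between step 1 and step 4, and to run `is_digest_pinned` on the image reference before putting it into the MC.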
      

      Actual results:

      The worker pool becomes degraded, reporting this error:
      
        - lastTransitionTime: "2022-10-11T12:18:39Z"
          message: 'Node ip-10-0-159-135.us-east-2.compute.internal is reporting: "failed
            to update OS to quay.io/examplerepo/layering@sha256:879c8f770a580b03bcf32c710f13cdc868156c50abda279b8e2d977d3b40f3f8
            : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/examplerepo/layering@sha256:879c8f770a580b03bcf32c710f13cdc868156c50abda279b8e2d977d3b40f3f8:
            \x1b[0m\x1b[31merror: \x1b[0mNo enabled repositories\n: exit status 1"'
          reason: 1 nodes are reporting degraded status on sync
          status: "True"
          type: NodeDegraded
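
The condition message above carries the rpm-ostree error with its color codes escaped as literal `\x1b[...m` text, which makes it hard to read. As a rough sketch (the helper name is hypothetical and not part of the MCO or `oc`), the message can be pulled from the pool and cleaned up like this:

```shell
# Hypothetical helper: strip color codes that appear as the literal text
# "\x1b[<digits>m" in the condition message. Reads stdin, writes stdout.
strip_escaped_ansi() {
  sed -E 's/\\x1b\[[0-9;]*m//g'
}

# Against a live cluster (assumes the pool is named "worker"; not run here):
#   oc get mcp worker \
#     -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].message}' \
#     | strip_escaped_ansi
```

With the escapes removed, the underlying failure is easier to spot: the `rpm-ostree rebase` fails with "No enabled repositories".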
      

      Expected results:

      The worker pool should not be degraded. The realtime kernel should be applied properly.
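
The expected state can be expressed as two small checks. This is only a sketch under assumptions: the helper names are hypothetical, the pool is assumed to be named `worker`, and `<node>` is a placeholder for a worker node name.

```shell
# Return 0 when the given `uname -a` text reports a PREEMPT_RT kernel.
is_rt_kernel() {
  case "$1" in
    *PREEMPT_RT*) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 when the given degraded-machine count is zero.
pool_is_healthy() {
  [ "$1" = "0" ]
}

# Against a live cluster (not run here):
#   pool_is_healthy "$(oc get mcp worker -o jsonpath='{.status.degradedMachineCount}')"
#   is_rt_kernel "$(oc debug node/<node> -- chroot /host uname -a)"
```

Both checks should succeed once the custom osImage has been rolled out if the behavior matches the expected results.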

      Additional info:

            People

              Team MCO (team-mco)
              Sergio Regidor de la Rosa (sregidor@redhat.com)