OCPBUGS-7286

4.13-9.2 builds don't work when both kernel-rt and CGroupV1 are used.

    • 3/7: Green. The latest round of testing shows the issue is resolved on the newest 4.13-9.2 build.

      Description of problem:

      I am testing the new 4.13-9.2 builds using the osLayering method. In Telco environments kernel-rt is required, and CGroupV2 support is not yet available for the use cases we have to test, so we install kernel-rt in an environment where CGroupV1 has been re-enabled. Under these conditions, however, the system never finishes booting.

      Version-Release number of selected component (if applicable):

      413.92.202302031414-0

      How reproducible:

      https://redhat-internal.slack.com/archives/C04MBTXN0KG/p1675871583062239?thread_ts=1675783438.469439&cid=C04MBTXN0KG

      Steps to Reproduce:

      1. Install the registry.ci.openshift.org/rhcos-devel/ocp-4.13-9.0:4.13.0-ec.1 build.
      
      2. Get the 413.92.202302031414-0 Base Container Image and deploy it using an osLayering MC:
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: master
        name: sno-osoverride
      spec:
        osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd4151f0b3e5d51c7b582446d6ea5de06d8afab76ebfcf50c9260e63bafd7591
      
      3. Once installed, apply this MC to re-enable cgroup v1 support:
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: master
        name: 80-kernelarg-enable-cgroup-v1
      spec:
        config:
          ignition:
            version: 3.2.0
        kernelArguments:
          - systemd.unified_cgroup_hierarchy=0
          - systemd.legacy_systemd_cgroup_controller
      
      4. Log in to the server and install kernel-rt:
      
      # sed -i "/\[rt\]/,/\[/  s/enabled=0/enabled=1/" /etc/yum.repos.d/centos-addons.repo
      # sed -i "/\[nfv\]/,/\[/  s/enabled=0/enabled=1/" /etc/yum.repos.d/centos-addons.repo
      # rpm-ostree override remove kernel{,-core,-modules,-modules-extra} \
          --install kernel-rt-core \
          --install kernel-rt-modules \
          --install kernel-rt-modules-extra \
          --install kernel-rt-kvm
      
      5. Reboot.
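
      Before the reboot in step 5, it may help to confirm that the kernel arguments from step 3 actually took effect. The following is a minimal sketch, not part of the original report: the helper name `cgroup_mode_from_fstype` is hypothetical, and it assumes GNU `stat` on the node. It relies on `/sys/fs/cgroup` being mounted as `cgroup2fs` under cgroup v2 and as `tmpfs` under the v1 (legacy or hybrid) layout.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a cgroup mode.
# cgroup2fs means the unified (v2) hierarchy; tmpfs means the v1 legacy or hybrid layout.
cgroup_mode_from_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# On the node: print the mode detected from the cgroup mount point.
if [ -d /sys/fs/cgroup ]; then
  cgroup_mode_from_fstype "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
fi

# The argument set by the MachineConfig should also appear on the kernel command line.
grep -o 'systemd.unified_cgroup_hierarchy=[01]' /proc/cmdline 2>/dev/null || true
```

      If the first command prints `v2` after the MachineConfig has rolled out, the kernel arguments were not applied and the later steps would not be exercising the kernel-rt plus CGroupV1 combination.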

      Actual results:

      The node stays stuck in an infinite loop:

      The journal from the previous boot shows systemd killing the podman container right after it starts:

      [root@cnfdf23 ~]# journalctl -b -1 |grep -B1 -A4 '2023-02-09 12:41:05.517376369'
      Feb 09 12:41:05 cnfdf23.telco5gran.eng.rdu2.redhat.com systemd[1]: Started libcrun container.
      Feb 09 12:41:05 cnfdf23.telco5gran.eng.rdu2.redhat.com podman[6417]: 2023-02-09 12:41:05.517376369 +0000 UTC m=+0.117352772 container init 4de14378f0ffe9711b024c608a80a5954a177ee6a61d9de9fb8aff233a6d188e (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5271b2b0a4f6063170c57c45a16345f2fc0e2f676a260ddc8f916a58ce13e23c, name=gallant_mahavira, health_status=, io.openshift.maintainer.subcomponent=runtime-cfg, io.k8s.display-name=baremetal-runtimecfg, description=Retrieves Node and Cluster information for baremetal network config, io.k8s.description=Retrieves Node and Cluster information for baremetal network config, io.openshift.build.source-location=https://github.com/openshift/baremetal-runtimecfg, io.openshift.expose-services=, release=202212011938.p0.gc508d15.assembly.stream, distribution-scope=public, vendor=Red Hat, Inc., License=GPLv2+, io.openshift.maintainer.product=OpenShift Container Platform, architecture=x86_64, maintainer=Antoni Segura Puimedon <antoni@redhat.com>, summary=Provides the latest release of the Red Hat Extended Life Base Image., vcs-type=git, io.openshift.build.commit.url=https://github.com/openshift/baremetal-runtimecfg/commit/c508d157332b2945f1ecf786be4f193b9e428e51, io.openshift.maintainer.component=Networking, url=https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-baremetal-runtimecfg/images/v4.13.0-202212011938.p0.gc508d15.assembly.stream, build-date=2022-12-01T19:48:29, com.redhat.build-host=cpt-1001.osbs.prod.upshift.rdu2.redhat.com, name=openshift/ose-baremetal-runtimecfg, io.buildah.version=1.27.1, io.openshift.tags=openshift,base, vcs-ref=0ee29b5e2dc48ba09db14731f1fcc2528be978ed, com.redhat.license_terms=https://www.redhat.com/agreements, version=v4.13.0, io.openshift.build.commit.id=c508d157332b2945f1ecf786be4f193b9e428e51, com.redhat.component=ose-baremetal-runtimecfg-container)
      Feb 09 12:41:05 cnfdf23.telco5gran.eng.rdu2.redhat.com podman[6417]: 2023-02-09 12:41:05.433618248 +0000 UTC m=+0.033594652 image pull  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5271b2b0a4f6063170c57c45a16345f2fc0e2f676a260ddc8f916a58ce13e23c
      Feb 09 12:41:05 cnfdf23.telco5gran.eng.rdu2.redhat.com systemd[1]: libpod-4de14378f0ffe9711b024c608a80a5954a177ee6a61d9de9fb8aff233a6d188e.scope: Killing process 6451 (3) with signal SIGKILL.
      Feb 09 12:41:05 cnfdf23.telco5gran.eng.rdu2.redhat.com systemd[1]: libpod-4de14378f0ffe9711b024c608a80a5954a177ee6a61d9de9fb8aff233a6d188e.scope: Deactivated successfully.
      Feb 09 12:41:05 cnfdf23.telco5gran.eng.rdu2.redhat.com systemd[1]: Stopped libcrun container.
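
      To find every occurrence of this pattern rather than grepping for a single timestamp, the journal can be filtered for systemd kill messages on libpod scopes. This is a rough sketch under assumptions: the function name `filter_libpod_kills` is hypothetical, and the pattern is derived only from the message format seen in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: read journal text on stdin and keep only the lines where
# systemd reports killing a process that belongs to a libpod-<container id>.scope unit.
filter_libpod_kills() {
  grep -E 'libpod-[0-9a-f]+\.scope: Killing process [0-9]+ .* with signal SIG[A-Z]+' || true
}

# Usage on the affected node, against the previous boot:
#   journalctl -b -1 | filter_libpod_kills
```

      Each matching line names the container ID, which can then be matched against the preceding `podman ... container init` entry to see which image was being started.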

      Expected results:

      The node should start.

      Additional info:

       

        1. boot-kernel-rt.log
          526 kB
          Carlos Cardeñosa Pérez
        2. boot-non-kernel-rt.log
          9.05 MB
          Carlos Cardeñosa Pérez
        3. image.png
          112 kB
          Carlos Cardeñosa Pérez
        4. kubelet-CSIMigrationAzureFile-issue.log
          23 kB
          Carlos Cardeñosa Pérez
        5. podman-ocp-strace.log
          1.68 MB
          Carlos Cardeñosa Pérez
        6. podman-with-traces-ocp-strace.kernel-rt-and-cgv1.1676296832.log
          101 kB
          Carlos Cardeñosa Pérez
        7. rpm-ostree
          15.06 MB
          Joseph Marrero Corchado
        8. systemd-killing-podman.png
          508 kB
          Carlos Cardeñosa Pérez

              tsweeney@redhat.com Tom Sweeney
              ccardeno@redhat.com Carlos Cardeñosa Pérez
              Michael Nguyen Michael Nguyen