OpenShift Bugs / OCPBUGS-44904

RHCOS fails to reboot after disk install from agent based installer on s390x

    • None
    • Multi-Arch Sprint 263
    • 1
    • False
    • None
    • Previously, the first reboot failed when running the Agent-based Installer with FCP or NVMe storage devices for multiple images on s390x hardware. With this release, this issue is resolved and the reboot completes.
      ====
      With the support of multi-images on s390x, the first reboot fails when running the Agent-based Installer with FCP or NVMe storage devices. With this release, this issue is fixed. (OCPBUGS-44904)
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-42553. The following is the description of the original issue:

      Context thread.

      Description of problem:

           While monitoring the 4.18 agent-based installer CI job for s390x (https://github.com/openshift/release/pull/50293), I discovered unexpected behavior once the installation triggers the reboot-into-disk step for the second and third control plane nodes. (The first control plane node is rebooted last because it is also the bootstrap node.) Instead of rebooting successfully as expected, each node fails to find the OSTree and drops into dracut, stalling the installation.

      Version-Release number of selected component (if applicable):

          OpenShift 4.18 on s390x only; discovered using agent installer

      How reproducible:

          Install OpenShift 4.18 using the agent-based installer on s390x.

      Steps to Reproduce:

          1. Boot the nodes with the XML definitions (see attached).
          2. Wait for the installation to reach the reboot phase.
          

      Actual results:

          Control plane nodes fail to reboot.

      Expected results:

          Control plane nodes reboot and installation progresses.

      Additional info:

          See attached logs.

            Errata Tool added a comment:

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (OpenShift Container Platform 4.17.9 bug fix update) and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHBA-2024:11010

            Amadeus Podvratnik added a comment:

            Tested with ABI 4.17.8:
            [root@bastion-ocpz ~]# oc get no
            NAME       STATUS   ROLES                         AGE    VERSION
            master-0   Ready    control-plane,master,worker   86m    v1.30.6
            master-1   Ready    control-plane,master,worker   104m   v1.30.6
            master-2   Ready    control-plane,master,worker   104m   v1.30.6
            worker-0   Ready    worker                        93m    v1.30.6

            Michael Nguyen added a comment (edited):

            rh-ee-apodvrat, can you verify this? This is not my component, and I see you verified the 4.18 one.

            OpenShift Jira Bot added a comment:

            Thanks for reporting your issue!

            In order for the CoreOS team to be able to triage your issue, please copy the applicable parts of the following template into a comment and fill them out as completely as possible.

            • OCP Version at Install Time:
            • RHCOS Version at Install Time:
            • OCP Version after Upgrade (if applicable):
            • RHCOS Version after Upgrade (if applicable):
            • Platform (AWS, Azure, bare metal, GCP, vSphere, etc.):
            • Architecture (x86_64, ppc64le, s390x, etc.):

            If you're having problems booting/installing RHCOS, please provide:

            • The full contents of the serial console showing disk initialization, network configuration, and Ignition stages.
              • See this article for information about configuring your serial console.
              • Screenshots or a video recording of the console is usually not sufficient.
            • The full Ignition config (JSON format)

            If you're having problems post-installation or post-upgrade, please provide:

            • An sos report for affected nodes. See this documentation page for instructions on how to gather one.
            • A complete must-gather (oc adm must-gather)

            If you're having SELinux related issues, please provide:

            • The full /var/log/audit/audit.log file
            • Were any SELinux modules or booleans changed from the default configuration?
            • The output of ostree admin config-diff | grep selinux/targeted on impacted nodes

              Kha Do (kdo@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Amadeus Podvratnik
              Votes: 0
              Watchers: 6
