OpenShift Bugs / OCPBUGS-17063

[BUG] Assisted installer fails to create bond.


    • Important
    • No
    • AI-32, AI-33, AI-34, AI-35
    • 4
    • Rejected
    • False
    • Customer Escalated

      Description of problem:

      The Assisted Installer always fails to create the bond defined in the static network configuration YAML. The failure appears to be caused by configuration files not being copied.
      
      On further investigation, we captured the journalctl output, which shows that the bond0 configuration file was never copied:
      
      
      ~~~
      Jul 28 14:51:50 localhost bash[2847]: Info: Found host directory: host0 , copying configuration
      Jul 28 14:51:52 localhost bash[2847]: Info: Copied /tmp/tmp.Q9kLrHDM7E/host0/bond0.nmconnection to /tmp/tmp.hdosiwCZeo/eno12399np0.nmconnection
      Jul 28 14:51:52 localhost bash[2847]: Info: Copied /tmp/tmp.Q9kLrHDM7E/host0/eno12399np0.nmconnection to /tmp/tmp.hdosiwCZeo/eno12399np0.nmconnection
      Jul 28 14:51:52 localhost bash[2847]: Info: Copied /tmp/tmp.Q9kLrHDM7E/host0/ens2f0np0.nmconnection to /tmp/tmp.hdosiwCZeo/ens2f0np0.nmconnection
      Jul 28 14:51:52 localhost bash[2847]: Info: Using logical interface name 'bond0' for interface with Mac address '00:62:0B:1D:1E:C0', updated /tmp/tmp.hdosiwCZeo/eno12399np0.nmconnection
      Jul 28 14:51:52 localhost bash[2847]: Info: Using logical interface name 'bond0' for interface with Mac address '00:62:0B:1D:1E:C0', updated /tmp/tmp.hdosiwCZeo/ens2f0np0.nmconnection
      Jul 28 14:51:52 localhost bash[2847]: Info: Removing default connection files in '/etc/NetworkManager/system-connections'
      Jul 28 14:51:52 localhost bash[2847]: Info: Copying files from working directory to '/etc/NetworkManager/system-connections'
      Jul 28 14:51:52 localhost bash[2943]: '/tmp/tmp.w2Ejj7kdsa/eno12399np0.nmconnection' -> '/etc/NetworkManager/system-connections/eno12399np0.nmconnection'
      Jul 28 14:51:52 localhost bash[2943]: '/tmp/tmp.w2Ejj7kdsa/ens2f0np0.nmconnection' -> '/etc/NetworkManager/system-connections/ens2f0np0.nmconnection'
      Jul 28 14:51:52 localhost bash[2847]: PreNetworkConfig End
      Jul 28 14:51:52 localhost systemd[1]: pre-network-manager-config.service: Succeeded.
      ~~~
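The copy lines in the excerpt above can be checked mechanically. The following is a minimal Python sketch, not part of the installer. The helper name `find_overwritten_copies` is hypothetical. It parses the `Info: Copied SRC to DST` lines and reports every destination that received more than one file, assuming each copy overwrites its destination.

Run against the three copy lines above, it reports `/tmp/tmp.hdosiwCZeo/eno12399np0.nmconnection` as written twice: first from `bond0.nmconnection`, then from `eno12399np0.nmconnection`. This is consistent with the bond0 file never reaching `/etc/NetworkManager/system-connections`.

~~~python
import re
from collections import defaultdict

# Matches lines such as:
#   ... bash[2847]: Info: Copied /tmp/A/host0/bond0.nmconnection to /tmp/B/eno12399np0.nmconnection
COPY_RE = re.compile(r"Info: Copied (\S+) to (\S+)")


def find_overwritten_copies(journal_text):
    """Return {destination: [sources in log order]} for destinations copied to more than once."""
    sources_by_dest = defaultdict(list)
    for line in journal_text.splitlines():
        match = COPY_RE.search(line)
        if match:
            src, dest = match.groups()
            sources_by_dest[dest].append(src)
    # A destination that received several files keeps only the last one
    # (assuming each copy overwrites the previous file).
    return {dest: srcs for dest, srcs in sources_by_dest.items() if len(srcs) > 1}


if __name__ == "__main__":
    excerpt = """\
Jul 28 14:51:52 localhost bash[2847]: Info: Copied /tmp/tmp.Q9kLrHDM7E/host0/bond0.nmconnection to /tmp/tmp.hdosiwCZeo/eno12399np0.nmconnection
Jul 28 14:51:52 localhost bash[2847]: Info: Copied /tmp/tmp.Q9kLrHDM7E/host0/eno12399np0.nmconnection to /tmp/tmp.hdosiwCZeo/eno12399np0.nmconnection
Jul 28 14:51:52 localhost bash[2847]: Info: Copied /tmp/tmp.Q9kLrHDM7E/host0/ens2f0np0.nmconnection to /tmp/tmp.hdosiwCZeo/ens2f0np0.nmconnection
"""
    for dest, srcs in find_overwritten_copies(excerpt).items():
        print(f"{dest} was written {len(srcs)} times; only the last copy survives:")
        for src in srcs:
            print(f"  {src}")
~~~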
      
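      As a result, the installation cannot proceed. Without a working network, agent.service cannot resolve registry.redhat.io (the DNS lookup against [::1]:53 is refused), so it fails to pull its image and keeps restarting: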
      
      ~~~
      Jul 28 14:53:04 localhost.localdomain systemd[3269]: Started Podman Start All Containers With Restart Policy Set To Always.
      Jul 28 14:53:04 localhost.localdomain systemd[3269]: Started podman-pause-a60a8624.scope.
      Jul 28 14:53:04 localhost.localdomain podman[3569]: Trying to pull registry.redhat.io/rhai-tech-preview/assisted-installer-agent-rhel8:v1.0.0-269...
      Jul 28 14:53:04 localhost.localdomain systemd[3269]: Started Podman auto-update service.
      Jul 28 14:53:04 localhost.localdomain systemd[3269]: Reached target Default.
      Jul 28 14:53:04 localhost.localdomain systemd[3269]: Startup finished in 2.845s.
      Jul 28 14:53:04 localhost.localdomain podman[3569]: Error: initializing source docker://registry.redhat.io/rhai-tech-preview/assisted-installer-agent-rhel8:v1.0.0-269: pinging container registry registry.redhat.io: Get "https://registry.redhat.io/v2/": dial tcp: lookup registry.redhat.io on [::1]:53: read udp [::1]:52859->[::1]:53: read: connection refused
      Jul 28 14:53:04 localhost.localdomain systemd[1]: agent.service: Control process exited, code=exited status=125
      Jul 28 14:53:04 localhost.localdomain systemd[1]: agent.service: Failed with result 'exit-code'.
      Jul 28 14:53:04 localhost.localdomain systemd[1]: Failed to start agent.service.
      Jul 28 14:53:04 localhost.localdomain systemd[1]: Reached target Multi-User System.
      Jul 28 14:53:04 localhost.localdomain systemd[1]: Reached target Graphical Interface.
      Jul 28 14:53:04 localhost.localdomain systemd[1]: Starting Update UTMP about System Runlevel Changes...
      Jul 28 14:53:04 localhost.localdomain systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
      Jul 28 14:53:04 localhost.localdomain systemd[1]: Started Update UTMP about System Runlevel Changes.
      Jul 28 14:53:04 localhost.localdomain systemd[1]: Startup finished in 3.376s (kernel) + 1min 27.375s (initrd) + 2min 7.879s (userspace) = 3min 38.631s.
      Jul 28 14:53:07 localhost.localdomain systemd[1]: agent.service: Service RestartSec=3s expired, scheduling restart.
      Jul 28 14:53:07 localhost.localdomain systemd[1]: agent.service: Scheduled restart job, restart counter is at 1.
      Jul 28 14:53:07 localhost.localdomain systemd[1]: Stopped agent.service.
      Jul 28 14:53:07 localhost.localdomain systemd[1]: Starting agent.service...
      Jul 28 14:53:08 localhost.localdomain systemd[1]: var-lib-containers-storage-overlay.mount: Succeeded.
      Jul 28 14:53:08 localhost.localdomain systemd[1]: var-lib-containers-storage-overlay.mount: Succeeded.
      Jul 28 14:53:08 localhost.localdomain podman[3730]: Trying to pull registry.redhat.io/rhai-tech-preview/assisted-installer-agent-rhel8:v1.0.0-269...
      Jul 28 14:53:08 localhost.localdomain podman[3730]: Error: initializing source docker://registry.redhat.io/rhai-tech-preview/assisted-installer-agent-rhel8:v1.0.0-269: pinging container registry registry.redhat.io: Get "https://registry.redhat.io/v2/": dial tcp: lookup registry.redhat.io on [::1]:53: read udp [::1]:50167->[::1]:53: read: connection refused
      Jul 28 14:53:08 localhost.localdomain systemd[1]: agent.service: Control process exited, code=exited status=125
      Jul 28 14:53:08 localhost.localdomain systemd[1]: agent.service: Failed with result 'exit-code'.
      Jul 28 14:53:08 localhost.localdomain systemd[1]: Failed to start agent.service.
      ..
      ~~~
      
      Finally, we manually copied the bond0 configuration file to the expected location and restarted the NetworkManager service. The bond came up and the installation started.
      
      
      The attached journalctl output shows this at the end, where the failure count exceeds 200 iterations. The network coming up at that point should therefore be attributed to our manual intervention.
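To pull the failure count out of a long journal, a small Python sketch like the following may help. It assumes the "200+ iteration" count refers to the systemd restart counter of agent.service, which systemd logs as `Scheduled restart job, restart counter is at N.` (see the excerpt above). The helper name `max_restart_counter` is hypothetical. The input text could come from `journalctl -u agent.service`.

~~~python
import re

# systemd logs one of these per automatic restart, e.g.
#   systemd[1]: agent.service: Scheduled restart job, restart counter is at 1.
RESTART_RE = re.compile(
    r"(\S+\.service): Scheduled restart job, restart counter is at (\d+)\."
)


def max_restart_counter(journal_text, unit="agent.service"):
    """Return the highest restart counter logged for `unit`, or 0 if none is found."""
    highest = 0
    for line in journal_text.splitlines():
        match = RESTART_RE.search(line)
        if match and match.group(1) == unit:
            highest = max(highest, int(match.group(2)))
    return highest


if __name__ == "__main__":
    excerpt = (
        "Jul 28 14:53:07 localhost.localdomain systemd[1]: agent.service: "
        "Scheduled restart job, restart counter is at 1.\n"
    )
    print(max_restart_counter(excerpt))
~~~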
      
      
      Supporting screenshots and the journalctl output are available at https://drive.google.com/drive/folders/1Qdn-UVnLN_eyeEV0n-N9AlEVr6i9hee6?usp=sharing

      Version-Release number of selected component (if applicable):

      4.12

      How reproducible:

      Always

      Steps to Reproduce:

      1. On console.redhat.com, open the Assisted Installer.
      2. Add the following static network configuration for host1:
      --------------
      interfaces:
      - name: bond0
        type: bond
        state: up
        ipv4:
          address:
          - ip: 10.16.81.181
            prefix-length: 24
          dhcp: false
          enabled: true
        link-aggregation:
          mode: balance-alb
          options:
            miimon: '1'
          port:
          - eno12399np0
          - ens2f0np0
      routes:
        config:
        - destination: 0.0.0.0/0
          next-hop-address: xxxx
          next-hop-interface: bond0
      dns-resolver:
        config:
          search:
          - xxxx
          - xxxx
          server:
          - xxxx
          - xxxx
      ---------------
      
      3. Enter the MAC addresses of the interfaces in the corresponding fields.
      4. Observe that host1 never gets a working IP address and cannot be reached over SSH.
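For comparison with the files found under `/etc/assisted/network/host0`, the YAML in step 2 corresponds roughly to three NetworkManager keyfiles: one for bond0 and one for each port. The Python sketch below renders an approximation of them. The helper names `render_keyfile` and `bond_keyfiles` are hypothetical.

This sketch is not what nmstate actually emits. It omits the route and DNS settings (elided as xxxx above), IPv6, UUIDs, and any other keys the real generator may add. It only shows the basic shape: a `type=bond` connection plus ethernet connections pointing at it via `master`/`slave-type`.

~~~python
import configparser
import io


def render_keyfile(sections):
    """Render {section: {key: value}} as keyfile text using key=value lines."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as given
    for name, options in sections.items():
        parser[name] = options
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


def bond_keyfiles(bond, mode, miimon, ports, ipv4_cidr):
    """Return {file name: approximate keyfile text} for a bond and its ports."""
    files = {
        f"{bond}.nmconnection": render_keyfile({
            "connection": {"id": bond, "type": "bond", "interface-name": bond},
            "bond": {"mode": mode, "miimon": str(miimon)},
            # dhcp: false with a static address maps to a manual IPv4 method.
            "ipv4": {"method": "manual", "address1": ipv4_cidr},
        })
    }
    for port in ports:
        files[f"{port}.nmconnection"] = render_keyfile({
            "connection": {
                "id": port,
                "type": "ethernet",
                "interface-name": port,
                "master": bond,
                "slave-type": "bond",
            },
        })
    return files


if __name__ == "__main__":
    # Values from the YAML in step 2 (routes and DNS omitted).
    for name, text in bond_keyfiles(
        "bond0", "balance-alb", 1, ["eno12399np0", "ens2f0np0"], "10.16.81.181/24"
    ).items():
        print(f"# {name}")
        print(text)
~~~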
      

      Actual results:

      The installation fails.

      Expected results:

      host1 applies the bond0 configuration and comes up on the network.

      Additional info:

      It appears that host1 never copies the bond configuration. If the bond0 file is copied to `/etc/NetworkManager/system-connections/` from `/sysroot/etc/assisted/network/host0/bond0.nmconnection` or `/etc/assisted/network/host0/bond0.nmconnection`, the bond works.
      
      In short: on first boot of the discovery ISO, the customer can do the following, and the system then gets on the network and appears in the discovered hosts:
      - copy the bond0 configuration as-is from /etc/assisted/network/host0 (where it resides) into /etc/NetworkManager/system-connections,
      - restart NetworkManager,
      - bring the bond0 device down and up with nmcli.
      The issue is whatever prevents this copy from /etc/assisted/network/host0 to /etc/NetworkManager/system-connections for the bond0.nmconnection file ONLY. As the customer noted earlier, the nmconnection files for the bond ports (slaves) eno12399np0 and ens2f0np0 ARE copied. This can also be seen in the attached journalctl output.
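The manual workaround described above can be sketched in Python as follows. This is an illustration, not a supported procedure. The helper names are hypothetical. The connection name `bond0` is assumed to match the `id=` inside the keyfile. The script defaults to a dry run that only prints what it would do.

~~~python
import os
import shutil
import subprocess

# Paths as given in this report.
SRC = "/etc/assisted/network/host0/bond0.nmconnection"
DEST_DIR = "/etc/NetworkManager/system-connections"


def workaround_commands(connection="bond0"):
    """Commands run after the copy, in the order the report describes."""
    return [
        ["systemctl", "restart", "NetworkManager"],
        ["nmcli", "connection", "down", connection],
        ["nmcli", "connection", "up", connection],
    ]


def apply_workaround(src=SRC, dest_dir=DEST_DIR, dry_run=True):
    """Copy the bond keyfile into place and re-activate it; return the destination path."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    if dry_run:
        print(f"would copy {src} -> {dest}")
        for cmd in workaround_commands():
            print("would run:", " ".join(cmd))
        return dest
    shutil.copy2(src, dest)
    # NetworkManager's keyfile plugin expects root-owned keyfiles with mode 0600.
    os.chmod(dest, 0o600)
    for cmd in workaround_commands():
        # "down" fails if the connection is not active yet, so ignore its exit status.
        subprocess.run(cmd, check=cmd[2] != "down")
    return dest


if __name__ == "__main__":
    apply_workaround(dry_run=True)
~~~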
      
      Note: To gather the journalctl output, we had to break in and log in through the console, because the network was not working on first boot.

              Ori Amizur (oamizur)
              Parikshit Khedekar (rhn-support-pkhedeka)
              Lital Alon
              Votes: 0
              Watchers: 6
