OpenShift Bugs / OCPBUGS-14357

[4.13] configure-ovs blocks ssh access to the node when unhealthy

    • Bug
    • Resolution: Done
    • Normal
    • None
    • 4.13.0
    • RHCOS
    • None
    • No
    • False

      We have discovered that in RHEL 9 the ordering of systemd dependencies for configure-ovs, nodeip-configuration, or a similar service has changed. These services now seem to run before systemd-user-sessions.service, and as a consequence, if one of them fails when the machine starts, we cannot log in to the RHCOS node at all.
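To make the ordering claim checkable, here is a minimal sketch that walks systemd's `After=` lists to see whether one unit is ordered after another, directly or transitively. It assumes `systemctl show --property=After --value <unit>` is available on the node; the helper names `direct_after` and `is_ordered_after` are made up for this illustration. It reports ordering only and does not explain why a failure of the earlier unit would block the later one.

```python
import subprocess
from collections import deque


def direct_after(unit):
    """Return the units listed in `unit`'s After= property (direct ordering only)."""
    out = subprocess.run(
        ["systemctl", "show", "--property=After", "--value", unit],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.split()


def is_ordered_after(unit, target, after_of=direct_after):
    """Return True if `unit` is ordered after `target`, directly or transitively.

    `after_of` maps a unit name to the list of units it is ordered after.
    It defaults to querying systemctl, but any callable can be passed in,
    which makes the traversal usable on a saved dependency dump as well.
    """
    seen = {unit}
    queue = deque([unit])
    while queue:
        current = queue.popleft()
        for dep in after_of(current):
            if dep == target:
                return True
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return False
```

On an affected node one might call `is_ordered_after("systemd-user-sessions.service", "nodeip-configuration.service")`; a `True` result would be consistent with the behaviour described above, although it would not by itself confirm the root cause.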

      The outline of what happens is, more or less:

      • early in the boot process, systemd-tmp* creates a lock file saying "only the root user can log in"
      • late in the boot process, systemd-user-sessions.service runs and is responsible for allowing everyone else to log in (namely, the "core" user)
      • if anything goes wrong and systemd-user-sessions.service doesn't start, only "root" can get SSH access to the machine
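The lock-file step in the outline above can be inspected with a small sketch like the one below. It assumes the lock file is one of the paths that pam_nologin(8) consults, `/run/nologin` or `/etc/nologin`; the report itself does not name the path, so treat these as assumptions. The function name `login_block_reason` is made up for this illustration, and it would have to be run from a context that still has access to the node, such as a console or root shell.

```python
# Assumed locations of the "only root can log in" lock file; the bug report
# does not name the path, these are the ones pam_nologin(8) documents.
NOLOGIN_PATHS = ("/run/nologin", "/etc/nologin")


def login_block_reason(paths=NOLOGIN_PATHS):
    """Return (path, message) for the first existing nologin file, or None.

    A non-None result means non-root logins are expected to be refused,
    and the message is the text shown to the rejected user.
    """
    for path in paths:
        try:
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                return path, fh.read().strip()
        except FileNotFoundError:
            continue
    return None
```

If this returns a path long after boot should have completed, that would suggest the unit responsible for removing the file never ran.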

      Until now we had never observed this issue, even when configure-ovs was unhealthy. In 4.13, however, something has changed: until configure-ovs finishes successfully, `ssh core@<node>` does not work. Given that we don't allow root access, in those scenarios we are locked out and cannot perform any investigation.

      In the particular scenario I was debugging, nodeip-configuration.service was failing because it could not correctly detect the Node IP from the VIPs. It was trying to select an empty IP as the Node IP and therefore returned a non-zero exit code. The network was up: I could ping the machine and reach its SSH port (the machine had multiple NICs, making it effectively impossible to lose the network). However, because I could never SSH in as core and the root user is locked, I was not able to collect any logs.
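To illustrate the kind of failure described here, the following sketch picks a node IP by looking for a local address whose subnet contains one of the VIPs, and returns `None` when nothing matches. This is not the actual nodeip-configuration logic, which the report does not show; the function name `pick_node_ip` and the input format (addresses in CIDR notation) are assumptions made for the example.

```python
import ipaddress


def pick_node_ip(local_interfaces, vips):
    """Return the first local address whose subnet contains a VIP, else None.

    local_interfaces: iterable of strings in CIDR form, e.g. "192.0.2.10/24".
    vips: iterable of VIP address strings, e.g. "192.0.2.100".
    """
    vip_addrs = [ipaddress.ip_address(v) for v in vips]
    for cidr in local_interfaces:
        iface = ipaddress.ip_interface(cidr)
        # A VIP of a different IP version is never "in" the network,
        # so mixed IPv4/IPv6 inputs simply do not match.
        if any(vip in iface.network for vip in vip_addrs):
            return str(iface.ip)
    return None
```

Returning `None` explicitly, rather than an empty string, lets the caller tell "no candidate found" apart from a valid address and log that case before exiting with an error.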

            travier@redhat.com Timothée Ravier
            mkowalsk@redhat.com Mat Kowalski
            Michael Nguyen