Bug
Resolution: Duplicate
Description of the issue
The OCP Bare Metal Networking Team has discovered that under specific circumstances it is not possible to SSH to an RHCOS node as the `core` user because `/run/nologin` is not removed. This happens, for example, when a service such as `nodeip-configuration.service` or `ovs-configure.service` hangs; in that scenario the network is up and running, but the systemd dependency chain is not satisfied.
Namely, `systemd-user-sessions.service` is never reached, so `/run/nologin` is not removed.
This became 100% reproducible in OCP 4.13, and our internal investigation (OCPBUGS-11124) points at `systemd-pcrphase.service` via `remote-fs.target` (or, less likely, at an internal issue in systemd). Our expertise ends at this point, however, so I am providing what we discovered and how to reproduce the issue easily.
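To make the symptom easy to check from a console session, here is a minimal sketch that reports whether the nologin flag file is still present. The function name `check_nologin` and its optional path argument are hypothetical helpers introduced for illustration; only the `/run/nologin` path comes from the report above.

```shell
# Sketch: report whether the nologin flag file still exists.
# check_nologin and its optional argument are illustrative, not part of RHCOS.
check_nologin() {
    local flag="${1:-/run/nologin}"
    if [ -e "$flag" ]; then
        # While this file exists, pam_nologin refuses non-root logins.
        echo "present: $flag (non-root logins are refused)"
        return 1
    fi
    echo "absent: $flag"
    return 0
}
```

On an affected node one would expect `check_nologin` to report the file as present while `systemctl is-active systemd-user-sessions.service` does not yet print `active`.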
Simplified reproducer
0) Start with vanilla RHCOS coming from OCP 4.13
1) Create the following systemd unit, named ocpbugs-11124-debugger.service

    [Unit]
    Description=Removes nologin and pauses the startup for debugging of OCPBUGS-11124
    Before=systemd-pcrphase.service

    [Service]
    Type=oneshot
    ExecStart=/bin/bash -c " \
      rm /run/nologin; \
      echo Now sleeping for 1 hour; \
      sleep 3600"

    [Install]
    WantedBy=multi-user.target

2) Enable the created unit
systemctl enable ocpbugs-11124-debugger.service
3) Reboot
4) SSH to the node. Note that this may take up to 2 minutes, so be aware of your local client timeout
5) Confirm the failure, e.g. by looking for the pam_systemd error in the sshd log

    $ systemctl status sshd.service
    [...]
    Apr 05 10:01:20 worker-0 sshd[1801]: Accepted publickey for core from 192.168.111.1 port 47642 ssh2: RSA SHA256:rXsegwlyTMAN4UfInanm336lxrh+23J4iPyjiuXt4/g
    Apr 05 10:03:20 worker-0 sshd[1801]: pam_systemd(sshd:session): Failed to create session: Connection timed out
    Apr 05 10:03:20 worker-0 sshd[1801]: pam_unix(sshd:session): session opened for user core(uid=1000) by (uid=0)

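The check in step 5 can be scripted. The following is a small sketch that filters sshd log lines down to the pam_systemd session failure quoted above; the function name `session_failures` is a hypothetical helper, and the match string is copied from the log excerpt in this report.

```shell
# Sketch: keep only the pam_systemd session-creation failures from sshd logs.
# Reads log lines on stdin; session_failures is an illustrative name.
session_failures() {
    grep -F 'pam_systemd(sshd:session): Failed to create session'
}

# Possible usage on the node:
#   journalctl -u sshd.service --no-pager | session_failures
```

Any output indicates that the session was opened without a logind session, which matches the 2 minute delay between the "Accepted publickey" line and the failure.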
More real-life reproducer
Please note that the reproducer above simplifies a lot because it explicitly blocks `systemd-pcrphase.service`. In a real OCP deployment in the field, what we want is to plug a sleep into `nodeip-configuration.service` so that the unit behaves as if it runs for a long time instead of exiting immediately.
In real life, /etc/systemd/system/nodeip-configuration.service should look roughly like this

    [...]
    ExecStart=/bin/bash -c " \
      rm /run/nologin; \
      sleep 3600; \
      until \
      /usr/bin/podman run --rm \
    [...]

and then proceed as usual. Because this modifies an existing unit instead of creating a new one, it changes neither the dependency chain nor the ordering chain of systemd, so the investigation matches what would happen in the field.
Systemd analysis
Discussing with the systemd folks, we discovered the following chain of dependencies: nodeip-configuration -> ovs-configure -> network-online.target -> remote-fs.target -> systemd-pcrphase.service -> systemd-user-sessions.service
We still don't fully understand what changed between RHEL 8 and RHEL 9, or why.
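One way to observe this chain on a stuck node is to look at the systemd job queue. The sketch below filters `systemctl list-jobs --no-legend` output by job state; the function name `jobs_in_state` is a hypothetical helper, and it assumes the usual column order of that output (job ID, unit, type, state).

```shell
# Sketch: print the units whose queued job is in the given state.
# Expects `systemctl list-jobs --no-legend` output on stdin
# (columns: JOB UNIT TYPE STATE). jobs_in_state is an illustrative name.
jobs_in_state() {
    awk -v state="$1" '$4 == state { print $2 }'
}

# Possible usage on the node:
#   systemctl list-jobs --no-legend | jobs_in_state running
#   systemctl list-jobs --no-legend | jobs_in_state waiting
```

If the analysis above holds, the hanging unit should show up as running while `systemd-user-sessions.service` is still waiting. `systemd-analyze critical-chain systemd-user-sessions.service` can give a complementary view once the boot has finished.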
Severity assessment
OCP nodes have only the `core` user available; root is disabled by design. The consequence of this issue is that if something goes wrong with the network configuration (due to user error or a bug in one of the OCP Networking components), we can no longer SSH to the faulty node. The only way to recover access is single-user mode via the physical console, which is a serious limitation and often not possible.
Ongoing discussions
- #systemd-rhel – https://redhat-internal.slack.com/archives/C04NX2E8CDD/p1680690257320539
- #forum-rhel-coreos – https://redhat-internal.slack.com/archives/C999USB0D/p1680181657578689
- duplicates: OCPBUGS-11124 configure-ovs blocks ssh access to the node when unhealthy (Closed)
- relates to: OCPBUGS-11124 configure-ovs blocks ssh access to the node when unhealthy (Closed)