Description of problem
Recently, the sos package was added to the tools image used when invoking oc debug node/<some-node> (details in z).
However, the change only added the sos package and did not account for the other conditions that sos report needs in order to work inside a container.
For reference, the toolbox container has to be launched as follows for sos report to work properly (the command output gives the template of the correct podman run command):
$ podman inspect registry.redhat.io/rhel9/support-tools | jq -r '.[0].Config.Labels.run'
podman run -it --name NAME --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=NAME -e IMAGE=IMAGE -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host IMAGE
The most crucial setting is the HOST=/host environment variable, which makes sos report look for the real root of the machine in /host, but the other options are also required.
So if we are to support sos report in the tools image, the debug node container defaults should be changed so that the container runs with the same settings as the reference podman run command above.
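As a rough illustration of the most crucial condition, the sketch below checks from inside a container whether HOST is set and points at a directory. The function name check_sos_host is hypothetical and is not part of sos or oc. It covers only the HOST variable, not the other options in the podman run template (privileged mode, host namespaces, the extra mounts).

```shell
#!/bin/sh
# Hypothetical helper (not part of sos or oc): reports whether the HOST
# variable from the reference podman run command is set and usable.
check_sos_host() {
    # HOST unset or empty: sos report has no pointer to the host root.
    if [ -z "${HOST:-}" ]; then
        echo "HOST is not set; sos report would inspect the container itself"
        return 1
    fi
    # HOST set but not a directory: the host root is not mounted there.
    if [ ! -d "$HOST" ]; then
        echo "HOST=$HOST is not a directory; host root is not mounted there"
        return 1
    fi
    echo "HOST=$HOST looks usable as the host root"
    return 0
}
```

Sourcing this snippet in the debug container and running check_sos_host before sos report would give a quick indication of whether the report is likely to describe the node or only the container.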
Version-Release number of selected component (if applicable)
4.16 only
How reproducible
Always
Steps to Reproduce
Start a debug node container (oc debug node/<node>) and try to gather an sos report directly from the debug container (without chroot /host + toolbox).
Actual results
- The debug container doesn't have the right environment for sos report.
- sos report runs, but the report it generates is wrong: it contains only limited, meaningless information about the debug container itself rather than the node.
Expected results
- oc debug node/<node> spawns a debug pod with the right environment for sos report to run as correctly as it does in toolbox.
- sos report works as expected in the debug pod.
Additional info
(none)
- links to: RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update