-
Epic
-
Resolution: Unresolved
-
Minor
-
test API speedups
-
Testable
-
To Do
-
67% To Do, 0% In Progress, 33% Done
-
Our tests take quite long again, and we need a large number of bots to keep up with them. One aspect to look at is our test API. Some ideas:
- Keep browser running across tests, instead of starting a new process for each test
- check slow bits during nondestructive cleanup: Cockpit#18648, bots#4662
- check slow bits in our CDP driver
- Consider snapshotting a booted "standard" VM (without extra provisioning options), instead of booting them hundreds of times in a test run
- don't make every test go via login page, send basic auth directly, or configure PAM to not use a password
- Cockpit's tests are currently dominated by parallel tests (serial tests take ~10 mins on each parallel runner, while the total test runtime is on the order of 40 mins). Convert some destructive tests to be nondestructive: Cockpit#18656 and others; the remaining ones could potentially be converted as well, but that would make the tests brittle (mostly storage and networking checkpoints)
Start with a research spike for all of these, and create tasks in this epic for the viable ones.
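For the "send basic auth directly" idea, a minimal sketch of what the test API could send instead of driving the login page. The header format is plain HTTP Basic authentication; the `/cockpit/login` path, the base URL, and the credentials are assumptions for illustration and would need to be checked against what cockpit-ws actually accepts during the spike. The helper names are hypothetical and not part of the existing test API.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def login_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a login request that bypasses the login page.

    ASSUMPTION: the login endpoint lives at /cockpit/login and accepts
    Basic auth; verify this before relying on it.
    """
    req = urllib.request.Request(base_url.rstrip("/") + "/cockpit/login")
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


if __name__ == "__main__":
    # Placeholder URL and credentials, for illustration only.
    req = login_request("https://127.0.0.1:9090", "admin", "foobar")
    print(req.full_url)
    print(req.get_header("Authorization"))
```

The request is only constructed here, not sent, so the sketch can be tried without a running VM.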
Serial tests
To get a baseline for optimizing serial tests, test cleanups, and browser startup times, I looked at recent test runs of cockpit on fedora-37 on e2e machines which did not have any (affected or failed) retries. I added together the 8 parallel per-global-machine runtimes of the serial tests, and ignored the parallel tests. Total serial test runtime in seconds for each test run that I looked at:
- Chromium: 5569, 5001, 5317, 4883, 5505, 5912 (ø 5364s, σ 381s)
- Firefox: 5433, 6486, 5430, 5833, 6467 (ø 5929s, σ 525s)
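To make the baseline easy to recompute after an optimization, here is a small sketch that derives the average and spread from the per-run totals listed above. It assumes σ is the sample standard deviation (n − 1), which is what reproduces the σ values given in the list; the per-runner figure simply divides by the 8 parallel global machines.

```python
from statistics import mean, stdev

# Total serial test runtime in seconds per test run (from the list above).
RUNS = {
    "Chromium": [5569, 5001, 5317, 4883, 5505, 5912],
    "Firefox": [5433, 6486, 5430, 5833, 6467],
}
RUNNERS = 8  # parallel per-global-machine runners


def summarize(runs):
    """Return (average, sample standard deviation) in seconds."""
    return mean(runs), stdev(runs)


if __name__ == "__main__":
    for browser, runs in RUNS.items():
        avg, sigma = summarize(runs)
        per_runner_min = avg / RUNNERS / 60
        print(f"{browser}: avg {avg:.0f}s, sigma {sigma:.0f}s, "
              f"~{per_runner_min:.1f} min per runner")
```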
For the cleanup, I ran
test/verify/check-networkmanager-basic TestNetworkingBasic.testNoService $RUNC -tv 2>&1| ts -i "%.S"
which looks like this:
00.004462 + journalctl --sync 2>/dev/null || true; sleep 3; journalctl --sync 2>/dev/null || true
03.101321 + journalctl 2>&1 --cursor 's=4ef3d583f50b415990e3fe057af10176;i=31d3;b=f25c5cf0c778444a9c2b89b050905b31;m=1a9ddf7e9;t=5f98433bef06f;x=7f7458222bed1ab8
00.000043 ' -o cat -p 6 SYSLOG_IDENTIFIER=cockpit-ws + SYSLOG_IDENTIFIER=cockpit-bridge + SYSLOG_IDENTIFIER=cockpit/ssh + _COMM=cockpit-ws + GLIB_DOMAIN=cockpit-ws + GLIB_DOMAIN=cockpit-bridge + GLIB_DOMAIN=cockpit-ssh + GLIB_DOMAIN=cockpit-pcp + SYSLOG_IDENTIFIER=systemd-coredump || true
00.056567 + journalctl --cursor 's=4ef3d583f50b415990e3fe057af10176;i=31d3;b=f25c5cf0c778444a9c2b89b050905b31;m=1a9ddf7e9;t=5f98433bef06f;x=7f7458222bed1ab8
00.000055 ' -o cat SYSLOG_IDENTIFIER=kernel 2>&1 | grep 'type=14.*audit' || true
00.052241 -> switch to frame None
00.000044 -> ph_is_present("#navbar-oops")
00.000883 <- {'type': 'boolean', 'value': False}
00.005453 + mv /var/lib/cockpittest/_usr_lib_systemd_system_NetworkManager.service /usr/lib/systemd/system/NetworkManager.service
00.038887 + systemctl enable --now NetworkManager
01.223226 + rm /run/udev/rules.d/99-nm-veth-cockpit42-test.rules; ip link del dev cockpit42
00.143897 + ls /sys/class/net/ | grep -v bonding_masters
00.140833 + for d in ; do nmcli dev del $d; done
00.111905 + umount -lf /etc/sysconfig/network-scripts
00.109296 + umount -lf /etc/NetworkManager
00.073603 + systemctl try-restart NetworkManager
00.179283 + for u in $(loginctl --no-legend list-users | awk '{ if ($2 != "root") print $1 }'); do
00.000051 loginctl terminate-user $u 2>/dev/null || true
00.000008 loginctl kill-user $u 2>/dev/null || true
00.000005 pkill -9 -u $u || true
00.000005 while pgrep -u $u; do sleep 1; done
00.000005 while mountpoint -q /run/user/$u && ! umount /run/user/$u; do sleep 1; done
00.000005 rm -rf /run/user/$u
00.000009 done
00.265342 > warning: transport closed: disconnected
01.033496 + loginctl --no-legend list-sessions | awk '/web console/ { print $1 }'
00.037947 + systemctl restart systemd-logind
00.097220 + loginctl --no-legend list-sessions | awk '/web console/ { print $1 }'
00.076569 + systemctl stop user@*.service
00.125104 + set -e; [ -e /sys/module/scsi_debug ] || exit 0; for dev in $(ls /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*:*/block); do for s in /sys/block/*/slaves/${dev}*; do [ -e $s ] || break; d=/dev/$(dirname $(dirname ${s#/sys/block/})); umount $d || true; dmsetup remove --force $d || true; done; umount /dev/$dev 2>/dev/null || true; done; until rmmod scsi_debug; do sleep 1; done
00.026324 + systemctl stop --quiet cockpit
00.039349 + mv /var/lib/cockpittest/_etc_crypttab /etc/crypttab
00.026280 + mv /var/lib/cockpittest/_etc_fstab /etc/fstab
00.024760 + rm -f /etc/cockpit/cockpit.conf /etc/cockpit/machines.d/* /etc/cockpit/*.override.json
00.024853 + ls /home
00.024639 + mv /var/lib/cockpittest/_var_log_wtmp /var/log/wtmp
00.024873 + mv /var/lib/cockpittest/_etc_subgid /etc/subgid
00.024334 + mv /var/lib/cockpittest/_etc_subuid /etc/subuid
00.024623 + mv /var/lib/cockpittest/_etc_gshadow /etc/gshadow
00.024672 + mv /var/lib/cockpittest/_etc_shadow /etc/shadow
00.024572 + mv /var/lib/cockpittest/_etc_group /etc/group
00.024501 + mv /var/lib/cockpittest/_etc_passwd /etc/passwd
00.024860 + if [ -d /var/lib/cockpittest ]; then findmnt --list --noheadings --output TARGET | grep ^/var/lib/cockpittest | xargs -r umount; rm -r /var/lib/cockpittest; fi
00.031555 + find /var/lib/systemd/coredump -type f -delete
00.025491 + logger -p user.info 'COCKPITTEST: end TestNetworkingBasic.testNoService'
00.032163 Killing browser (pid 45944)
00.017066 killing ssh master process 45859
00.001019 # Result testNoService (__main__.TestNetworkingBasic.testNoService) succeeded
00.000045 # 1 TEST PASSED [23s on cockpit-toolbox]
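Note that ts -i prints the time elapsed since the previous line, so each command's duration shows up in the timestamp of the line after it. So the journal sync is the biggest chunk (3.1 s, of which 3 s is the explicit sleep), followed by enabling NM (unavoidable for this test) and the session cleanup, at a bit over 1 s each.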
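To rank the slow bits without eyeballing the log, a rough sketch that parses `ts -i "%.S"` output and attributes each incremental timestamp to the command printed on the previous line. The function names are hypothetical, and the sample is a contiguous excerpt of the log above. Multi-line commands are treated line by line, which is good enough for a first pass.

```python
import re

# A line as printed by `ts -i "%.S"`: seconds since the previous line, then the text.
STAMP = re.compile(r"^(\d+\.\d+) (.*)$")


def step_durations(log: str):
    """Return (seconds, text) pairs, one per log line except the last.

    With incremental timestamps, the stamp on line i+1 is the time that
    passed after line i was printed, i.e. roughly how long step i took.
    """
    entries = []
    for line in log.splitlines():
        match = STAMP.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return [(entries[i + 1][0], entries[i][1]) for i in range(len(entries) - 1)]


def slowest(log: str, count: int = 3):
    """Return the `count` slowest steps, longest first."""
    return sorted(step_durations(log), key=lambda pair: pair[0], reverse=True)[:count]


SAMPLE = """\
00.005453 + mv /var/lib/cockpittest/_usr_lib_systemd_system_NetworkManager.service /usr/lib/systemd/system/NetworkManager.service
00.038887 + systemctl enable --now NetworkManager
01.223226 + rm /run/udev/rules.d/99-nm-veth-cockpit42-test.rules; ip link del dev cockpit42
00.143897 + ls /sys/class/net/ | grep -v bonding_masters
00.140833 + for d in ; do nmcli dev del $d; done
"""

if __name__ == "__main__":
    for seconds, text in slowest(SAMPLE):
        print(f"{seconds:9.6f}s  {text}")
```

Feeding it the full output of the command above (for example via a file) should surface the journal sync, the NM enable, and the session cleanup at the top.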