Cockpit / COCKPIT-979

Speed up our test API


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Minor
    • Component: Automation and Tests
    • Epic name: test API speedups
    • Testable
    • Status: To Do
    • Progress: 67% To Do, 0% In Progress, 33% Done

      Our tests are taking quite long again, and we need a lot of bots to keep up with them. One aspect to look at is our test API. Some ideas:

      • Keep the browser running across tests, instead of starting a new process for each test
      • Check slow bits during nondestructive cleanup: Cockpit#18648, bots#4662
      • Check slow bits in our CDP driver
      • Consider snapshotting a booted "standard" VM (without extra provisioning options), instead of booting it hundreds of times in a test run
      • Don't make every test go through the login page: send basic auth directly, or configure PAM to not ask for a password
      • Cockpit's test runtime is currently dominated by the parallel tests (the serial tests take ~10 min on each parallel runner, while the total test runtime is on the order of 40 min). Convert some destructive tests to nondestructive ones: Cockpit#18656 and others. The remaining ones could potentially be converted too, but that would make the tests brittle (mostly storage and networking checkpoints).
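For the "send basic auth directly" idea, here is a minimal sketch of building the credentials header up front instead of typing them into the login form. It assumes that cockpit-ws accepts an RFC 7617 Authorization: Basic header on /cockpit/login. That assumption, the address and port, and the admin/foobar credentials are all illustrative. The helpers only build the request. How the result would be handed to the browser under test is left open.

```python
import base64
import urllib.request

# Hypothetical target: cockpit-ws on its default port 9090 on the test VM (assumption).
LOGIN_URL = "https://127.0.0.1:9090/cockpit/login"


def basic_auth_header(user: str, password: str) -> str:
    """Return the value for an RFC 7617 'Authorization: Basic' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def login_request(user: str, password: str, url: str = LOGIN_URL) -> urllib.request.Request:
    """Build (but do not send) a request that carries the credentials directly,
    bypassing the login page. Hypothetical helper, not part of Cockpit's testlib."""
    return urllib.request.Request(url, headers={"Authorization": basic_auth_header(user, password)})


if __name__ == "__main__":
    req = login_request("admin", "foobar")
    print(req.full_url, req.get_header("Authorization"))
```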

      Start with a research spike for all of these, and create tasks in this epic for the viable ones.

      Serial tests

      To get a baseline for optimizing serial tests, test cleanups, and browser startup times, I looked at recent cockpit test runs on fedora-37 on the e2e machines that had no retries (neither affected nor failed). For each run I added up the serial test runtimes of the 8 global machines running in parallel, and ignored the parallel tests. Total serial test runtime in seconds for each test run that I looked at:

      • Chromium: 5569, 5001, 5317, 4883, 5505, 5912 (ø 5364s, σ 381s)
      • Firefox: 5433, 6486, 5430, 5833, 6467 (ø 5929s, σ 525s)
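As a quick consistency check, the listed runtimes can be run through Python's statistics module. statistics.stdev is the sample standard deviation (n-1 denominator), and that is what reproduces the quoted σ values of 381s and 525s. The means come out at 5364.5s for Chromium and 5929.8s for Firefox.

```python
import statistics

# Summed serial-test runtimes in seconds, one value per test run (copied from the list above).
RUNTIMES = {
    "Chromium": [5569, 5001, 5317, 4883, 5505, 5912],
    "Firefox": [5433, 6486, 5430, 5833, 6467],
}


def summarize(samples: list[int]) -> tuple[float, float]:
    """Return (mean, sample standard deviation); stdev uses the n-1 denominator."""
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    for browser, samples in RUNTIMES.items():
        mean, sd = summarize(samples)
        print(f"{browser}: mean {mean:.1f}s, stdev {sd:.1f}s")
```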

      To see where the time goes during cleanup, I ran

      test/verify/check-networkmanager-basic TestNetworkingBasic.testNoService $RUNC -tv 2>&1 | ts -i "%.S"
      

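      which produces the output below. ts -i prefixes each line with the time elapsed since the previous line, so a timestamp mostly reflects how long the step on the line before it took: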

      00.004462 + journalctl --sync 2>/dev/null || true; sleep 3; journalctl --sync 2>/dev/null || true
      03.101321 + journalctl 2>&1 --cursor 's=4ef3d583f50b415990e3fe057af10176;i=31d3;b=f25c5cf0c778444a9c2b89b050905b31;m=1a9ddf7e9;t=5f98433bef06f;x=7f7458222bed1ab8
      00.000043 ' -o cat -p 6 SYSLOG_IDENTIFIER=cockpit-ws + SYSLOG_IDENTIFIER=cockpit-bridge + SYSLOG_IDENTIFIER=cockpit/ssh + _COMM=cockpit-ws + GLIB_DOMAIN=cockpit-ws + GLIB_DOMAIN=cockpit-bridge + GLIB_DOMAIN=cockpit-ssh + GLIB_DOMAIN=cockpit-pcp + SYSLOG_IDENTIFIER=systemd-coredump || true
      00.056567 + journalctl --cursor 's=4ef3d583f50b415990e3fe057af10176;i=31d3;b=f25c5cf0c778444a9c2b89b050905b31;m=1a9ddf7e9;t=5f98433bef06f;x=7f7458222bed1ab8
      00.000055 ' -o cat SYSLOG_IDENTIFIER=kernel 2>&1 | grep 'type=14.*audit' || true
      00.052241 -> switch to frame None
      00.000044 -> ph_is_present("#navbar-oops")
      00.000883 <- {'type': 'boolean', 'value': False}
      00.005453 + mv /var/lib/cockpittest/_usr_lib_systemd_system_NetworkManager.service /usr/lib/systemd/system/NetworkManager.service
      00.038887 + systemctl enable --now NetworkManager
      01.223226 + rm /run/udev/rules.d/99-nm-veth-cockpit42-test.rules; ip link del dev cockpit42
      00.143897 + ls /sys/class/net/ | grep -v bonding_masters
      00.140833 + for d in ; do nmcli dev del $d; done
      00.111905 + umount -lf /etc/sysconfig/network-scripts
      00.109296 + umount -lf /etc/NetworkManager
      00.073603 + systemctl try-restart NetworkManager
      00.179283 + for u in $(loginctl --no-legend list-users  | awk '{ if ($2 != "root") print $1 }'); do
      00.000051                                         loginctl terminate-user $u 2>/dev/null || true
      00.000008                                         loginctl kill-user $u 2>/dev/null || true
      00.000005                                         pkill -9 -u $u || true
      00.000005                                         while pgrep -u $u; do sleep 1; done
      00.000005                                         while mountpoint -q /run/user/$u && ! umount /run/user/$u; do sleep 1; done
      00.000005                                         rm -rf /run/user/$u
      00.000009                                     done
      00.265342 > warning: transport closed: disconnected
      01.033496 + loginctl --no-legend list-sessions | awk '/web console/ { print $1 }'
      00.037947 + systemctl restart systemd-logind
      00.097220 + loginctl --no-legend list-sessions | awk '/web console/ { print $1 }'
      00.076569 + systemctl stop user@*.service
      00.125104 + set -e; [ -e /sys/module/scsi_debug ] || exit 0; for dev in $(ls /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*:*/block); do     for s in /sys/block/*/slaves/${dev}*; do [ -e $s ] || break;         d=/dev/$(dirname $(dirname ${s#/sys/block/}));         umount $d || true; dmsetup remove --force $d || true;     done;     umount /dev/$dev 2>/dev/null || true; done; until rmmod scsi_debug; do sleep 1; done
      00.026324 + systemctl stop --quiet cockpit
      00.039349 + mv /var/lib/cockpittest/_etc_crypttab /etc/crypttab
      00.026280 + mv /var/lib/cockpittest/_etc_fstab /etc/fstab
      00.024760 + rm -f /etc/cockpit/cockpit.conf /etc/cockpit/machines.d/* /etc/cockpit/*.override.json
      00.024853 + ls /home
      00.024639 + mv /var/lib/cockpittest/_var_log_wtmp /var/log/wtmp
      00.024873 + mv /var/lib/cockpittest/_etc_subgid /etc/subgid
      00.024334 + mv /var/lib/cockpittest/_etc_subuid /etc/subuid
      00.024623 + mv /var/lib/cockpittest/_etc_gshadow /etc/gshadow
      00.024672 + mv /var/lib/cockpittest/_etc_shadow /etc/shadow
      00.024572 + mv /var/lib/cockpittest/_etc_group /etc/group
      00.024501 + mv /var/lib/cockpittest/_etc_passwd /etc/passwd
      00.024860 + if [ -d /var/lib/cockpittest ]; then findmnt --list --noheadings --output TARGET | grep ^/var/lib/cockpittest | xargs -r umount; rm -r /var/lib/cockpittest; fi
      00.031555 + find /var/lib/systemd/coredump -type f -delete
      00.025491 + logger -p user.info 'COCKPITTEST: end TestNetworkingBasic.testNoService'
      00.032163 Killing browser (pid 45944)
      00.017066 killing ssh master process 45859
      00.001019 # Result testNoService (__main__.TestNetworkingBasic.testNoService) succeeded
      00.000045 # 1 TEST PASSED [23s on cockpit-toolbox]
      

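      So the journal sync is the biggest chunk (about 3.1 s, 3 s of which is the fixed sleep). The next biggest items are enabling NetworkManager (about 1.2 s, unavoidable for this test) and the session cleanup (roughly 1.3 s).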
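The ranking above can be reproduced mechanically. ts -i prints the time elapsed since the previous line, so each gap belongs to the step on the line before it. For example, the 03.10 s shown before the second journalctl call is really the "sync; sleep 3; sync" step. The sketch below parses such a log and charges each gap to the preceding line. slowest_steps is a hypothetical helper and is not part of Cockpit's test tooling.

```python
import re

# A line of `ts -i "%.S"` output: seconds since the previous line, a space, then the text.
TS_LINE = re.compile(r"^\s*(\d+\.\d+) (.*)$")

# Contiguous excerpt from the cleanup log above.
SAMPLE = """\
00.000883 <- {'type': 'boolean', 'value': False}
00.005453 + mv /var/lib/cockpittest/_usr_lib_systemd_system_NetworkManager.service /usr/lib/systemd/system/NetworkManager.service
00.038887 + systemctl enable --now NetworkManager
01.223226 + rm /run/udev/rules.d/99-nm-veth-cockpit42-test.rules; ip link del dev cockpit42
00.143897 + ls /sys/class/net/ | grep -v bonding_masters
"""


def slowest_steps(log: str, top: int = 3) -> list[tuple[float, str]]:
    """Rank steps by the gap until the next log line appeared.

    `ts -i` prints the time since the previous line, so the gap shown on
    line k is charged to the step announced on line k-1. Lines without a
    leading timestamp are skipped.
    """
    parsed = [(float(m.group(1)), m.group(2))
              for m in map(TS_LINE.match, log.splitlines()) if m]
    charged = [(gap, prev_text)
               for (_, prev_text), (gap, _) in zip(parsed, parsed[1:])]
    return sorted(charged, key=lambda item: item[0], reverse=True)[:top]


if __name__ == "__main__":
    for gap, step in slowest_steps(SAMPLE):
        print(f"{gap:8.3f}s  {step}")
```

The last line of a log gets no duration, because no line follows it.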

      Assignee: Unassigned
      Reporter: Martin Pitt
      Participants: Allison Karlitskaya, Martin Pitt
      Votes: 0
      Watchers: 1