This is a clone of issue OCPBUGS-34842. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-22366. The following is the description of the original issue:
—
Description of problem:
In QE's CI testing, we found that the openvswitch service was stopped after the openvswitch package was updated during an upgrade, which caused the RHEL worker to lose its Internet connection and the whole upgrade to fail.
Version-Release number of selected component (if applicable):
release-4.13
How reproducible:
Always in Prow CI jobs
Steps to Reproduce:
1. Set up a cluster on Azure, 3 CoreOS masters + 3 RHEL workers
2. Upgrade the cluster
3. Masters are upgraded successfully
4. Start upgrading the RHEL workers using the upgrade playbook
5. The playbook is upgrading one of the RHEL workers (a condensed sketch of this install task follows the log):

TASK [openshift_node : Install openshift packages] *****************************
....
<10.0.1.8> (0, b'\n{"ansible_job_id": "817973574852.42491", "erased": "/root/.ansible_async/817973574852.42491", "invocation": {"module_args": {"jid": "817973574852.42491", "mode": "cleanup", "_async_dir": "/root/.ansible_async"}}}\n', b'')
ASYNC OK on 10.0.1.8: jid=817973574852.42491
changed: [10.0.1.8] => {
    "ansible_job_id": "817973574852.42491",
    "attempts": 1,
    "changed": true,
    "finished": 1,
    "invocation": {
        "module_args": {
            "allow_downgrade": false,
            "allowerasing": true,
            "autoremove": false,
            "bugfix": false,
            "cacheonly": false,
            "conf_file": null,
            "disable_excludes": null,
            "disable_gpg_check": true,
            "disable_plugin": [],
            "disablerepo": [],
            "download_dir": null,
            "download_only": false,
            "enable_plugin": [],
            "enablerepo": [],
            "exclude": [],
            "install_repoquery": true,
            "install_weak_deps": true,
            "installroot": "/",
            "list": null,
            "lock_timeout": 30,
            "name": ["conmon", "cri-o-1.26.4", "cri-tools", "openshift-clients-4.13*", "openshift-hyperkube-4.13*", "podman", "runc", "kernel", "systemd", "selinux-policy-targeted", "setools-console", "dracut-network", "passwd", "openssh-server", "openssh-clients", "skopeo", "containernetworking-plugins", "nfs-utils", "NetworkManager", "NetworkManager-ovs", "dnsmasq", "lvm2", "iscsi-initiator-utils", "sg3_utils", "device-mapper-multipath", "xfsprogs", "e2fsprogs", "mdadm", "cryptsetup", "chrony", "logrotate", "sssd", "shadow-utils", "sudo", "coreutils", "less", "tar", "xz", "gzip", "bzip2", "rsync", "tmux", "nmap-ncat", "net-tools", "bind-utils", "strace", "bash-completion", "vim-minimal", "nano", "authconfig", "iptables-services", "cifs-utils", "jq", "libseccomp", "openvswitch3.1", "policycoreutils-python-utils", "microcode_ctl", "irqbalance", "biosdevname", "glusterfs-fuse"],
            "nobest": false,
            "releasever": null,
            "security": false,
            "skip_broken": false,
            "sslverify": true,
            "state": "latest",
            "update_cache": false,
            "update_only": false,
            "validate_certs": true
        }
    },
    "msg": "",
    "rc": 0,
    "results": [
        "Installed: cri-o-1.26.4-4.1.rhaos4.13.git92b763a.el8.x86_64",
        "Installed: runc-4:1.1.9-1.1.rhaos4.13.el8.x86_64",
        "Installed: cri-tools-1.26.0-2.1.el8.x86_64",
        "Installed: openshift-clients-4.13.0-202310162106.p0.g717d4a5.assembly.stream.el8.x86_64",
        "Installed: conmon-3:2.1.7-2.1.rhaos4.13.el8.x86_64",
        "Installed: openshift-hyperkube-4.13.0-202310210425.p0.g636f2be.assembly.stream.el8.x86_64",
        "Installed: openvswitch3.1-3.1.0-61.el8fdp.x86_64",
        "Installed: skopeo-2:1.11.2-2.1.rhaos4.13.el8.x86_64",
        "Removed: cri-o-1.25.4-4.1.rhaos4.12.gitb9319a2.el8.x86_64",
        "Removed: cri-tools-1.25.0-2.1.el8.x86_64",
        "Removed: runc-3:1.1.6-4.1.rhaos4.12.el8.x86_64",
        "Removed: openshift-clients-4.12.0-202310180726.p0.ga55beda.assembly.stream.el8.x86_64",
        "Removed: openshift-hyperkube-4.12.0-202310210144.p0.g31e0558.assembly.stream.el8.x86_64",
        "Removed: skopeo-2:1.11.2-0.2.module+el8.8.0+19993+47c8ef84.x86_64",
        "Removed: openvswitch2.17-2.17.0-123.el8fdp.x86_64",
        "Removed: conmon-3:2.1.6-1.module+el8.8.0+19993+47c8ef84.x86_64"
    ],
    "results_file": "/root/.ansible_async/817973574852.42491",
    "started": 1,
    "stderr": "",
    "stderr_lines": [],
    "stdout": "",
    "stdout_lines": []
}
...
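The task above boils down to a dnf update; the following is a condensed sketch based on the module_args in the log (not the actual openshift-ansible source, and with the package list abbreviated):

# Condensed sketch of the "Install openshift packages" task.
# "state: latest" plus "allowerasing: true" is what lets dnf erase
# openvswitch2.17 while installing openvswitch3.1, stopping the running
# openvswitch/ovs-vswitchd services as a side effect.
- name: Install openshift packages
  ansible.builtin.dnf:
    name:
      - openvswitch3.1
      - openshift-clients-4.13*
      - openshift-hyperkube-4.13*
      # ...remaining packages as listed in the log above
    state: latest
    allowerasing: true
    disable_gpg_check: true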
TASK [openshift_node : Pull MCD image] *****************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/apply_machine_config.yml:60
Using module file /opt/python-env/ansible-core/lib64/python3.8/site-packages/ansible/modules/command.py
Pipelining is enabled.
<10.0.1.8> ESTABLISH SSH CONNECTION FOR USER: cloud-user
<10.0.1.8> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="cloud-user"' -o ConnectTimeout=30 -o IdentityFile=/var/run/secrets/ci.openshift.io/cluster-profile/ssh-privatekey -o StrictHostKeyChecking=no -o 'ProxyCommand=ssh -o IdentityFile=/var/run/secrets/ci.openshift.io/cluster-profile/ssh-privatekey -o ConnectTimeout=30 -o ConnectionAttempts=100 -o StrictHostKeyChecking=no -W %h:%p -q core@4.151.214.43' -o 'ControlPath="/alabama/.ansible/cp/%h-%r"' 10.0.1.8 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-dboorgbfowovrzyggbmdgegpwmrvoknq ; http_proxy='"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' https_proxy='"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' no_proxy='"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' /usr/libexec/platform-python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Escalation succeeded
<10.0.1.8> (1, b'\n{"changed": true, "stdout": "", "stderr": "Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441...\\nError: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441: reading manifest sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized", "rc": 125, "cmd": ["podman", "pull", "--tls-verify=False", "--authfile", "/var/lib/kubelet/config.json", "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441"], "start": "2023-10-24 13:41:51.676499", "end": "2023-10-24 13:43:08.774173", "delta": "0:01:17.097674", "failed": true, "msg": "non-zero return code", "invocation": {"module_args": {"_raw_params": "podman pull --tls-verify=False --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441", "_uses_shell": false, "warn": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}}\n', b'')
<10.0.1.8> Failed to connect to the host via ssh:
FAILED - RETRYING: [10.0.1.8]: Pull MCD image (12 retries left).Result was: {
    "attempts": 1,
    "changed": true,
    "cmd": [
        "podman",
        "pull",
        "--tls-verify=False",
        "--authfile",
        "/var/lib/kubelet/config.json",
        "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441"
    ],
    "delta": "0:01:17.097674",
    "end": "2023-10-24 13:43:08.774173",
    "invocation": {
        "module_args": {
            "_raw_params": "podman pull --tls-verify=False --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true,
            "warn": false
        }
    },
    "msg": "non-zero return code",
    "rc": 125,
    "retries": 13,
    "start": "2023-10-24 13:41:51.676499",
    "stderr": "Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441...\nError: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441: reading manifest sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized",
    "stderr_lines": [
        "Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441...",
        "Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441: reading manifest sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"
    ],
    "stdout": "",
    "stdout_lines": []
}
...
6. At this point the RHEL worker being updated has lost Internet connectivity, so it failed to pull the MCD image.
7. SSH into the RHEL worker and run a curl command against google.com; it hangs too, confirming that Internet connectivity is lost.
8. Check the openvswitch services:

[cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ journalctl -u ovs-vswitchd.service
...
Oct 24 13:40:58 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00316|connmgr|INFO|br-int<->unix#15: 58 flow_mods 10 s ago (4 adds, 54 deletes)
Oct 24 13:41:01 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00317|connmgr|INFO|br-ex<->unix#1080: 2 flow_mods in the last 0 s (2 adds)
Oct 24 13:41:02 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00318|bridge|INFO|bridge br-int: deleted interface daa1890fc840ffa on port 4
Oct 24 13:41:05 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00319|connmgr|INFO|br-int<->unix#15: 29 flow_mods 3 s ago (2 adds, 27 deletes)
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Stopping Open vSwitch Forwarding Unit...
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00320|bridge|INFO|bridge br-ex: deleted interface eth0 on port 1
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00321|bridge|INFO|bridge br-ex: deleted interface br-ex on port 65534
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00322|bridge|INFO|bridge br-ex: deleted interface patch-br-ex_ci-op-5jms8f2c-fe944-984d5-rhel-2-to-br-int on port 2
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00323|ofproto_dpif_rid|ERR|recirc_id 3714 left allocated when ofproto (br-ex) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00324|ofproto_dpif_rid|ERR|recirc_id 23 left allocated when ofproto (br-ex) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00325|ofproto_dpif_rid|ERR|recirc_id 3719 left allocated when ofproto (br-ex) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00326|bridge|INFO|bridge br-int: deleted interface ovn-3e037a-0 on port 2
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00327|bridge|INFO|bridge br-int: deleted interface ovn-af77fa-0 on port 3
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00328|bridge|INFO|bridge br-int: deleted interface ovn-k8s-mp0 on port 9
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00329|bridge|INFO|bridge br-int: deleted interface ovn-aee0f0-0 on port 8
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00330|bridge|INFO|bridge br-int: deleted interface ovn-38d58c-0 on port 7
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00331|bridge|INFO|bridge br-int: deleted interface ovn-d3dd7f-0 on port 1
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00332|bridge|INFO|bridge br-int: deleted interface patch-br-int-to-br-ex_ci-op-5jms8f2c-fe944-984d5-rhel-2 on port >
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00333|bridge|INFO|bridge br-int: deleted interface br-int on port 65534
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00334|bridge|INFO|bridge br-int: deleted interface bdc09b6ea877236 on port 6
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00335|bridge|INFO|bridge br-int: deleted interface 73057cfb3038ad7 on port 12
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00336|ofproto_dpif_rid|ERR|recirc_id 55 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00337|ofproto_dpif_rid|ERR|recirc_id 3717 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00338|ofproto_dpif_rid|ERR|recirc_id 3745 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00339|ofproto_dpif_rid|ERR|recirc_id 3741 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00340|ofproto_dpif_rid|ERR|recirc_id 3693 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00341|ofproto_dpif_rid|ERR|recirc_id 3694 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00342|ofproto_dpif_rid|ERR|recirc_id 3698 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00343|ofproto_dpif_rid|ERR|recirc_id 25 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00344|ofproto_dpif_rid|ERR|recirc_id 3697 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00345|ofproto_dpif_rid|ERR|recirc_id 3718 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00346|ofproto_dpif_rid|ERR|recirc_id 3716 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00347|ofproto_dpif_rid|ERR|recirc_id 3715 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00348|ofproto_dpif_rid|ERR|recirc_id 3740 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00349|ofproto_dpif_rid|ERR|recirc_id 3696 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00350|ofproto_dpif_rid|ERR|recirc_id 3744 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00351|ofproto_dpif_rid|ERR|recirc_id 31 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-ctl[44658]: Exiting ovs-vswitchd (1159) [ OK ]
Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: ovs-vswitchd.service: Succeeded.
Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Stopped Open vSwitch Forwarding Unit.
Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: ovs-vswitchd.service: Consumed 21.766s CPU time

[cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ date
Tue Oct 24 13:45:27 UTC 2023
[cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ tail -f /var/log/dnf.log
2023-10-24T13:41:21+0000 DEBUG Upgraded: cri-tools-1.26.0-2.1.el8.x86_64
2023-10-24T13:41:21+0000 DEBUG Upgraded: openshift-clients-4.13.0-202310162106.p0.g717d4a5.assembly.stream.el8.x86_64
2023-10-24T13:41:21+0000 DEBUG Upgraded: openshift-hyperkube-4.13.0-202310210425.p0.g636f2be.assembly.stream.el8.x86_64
2023-10-24T13:41:21+0000 DEBUG Upgraded: runc-4:1.1.9-1.1.rhaos4.13.el8.x86_64
2023-10-24T13:41:21+0000 DEBUG Upgraded: skopeo-2:1.11.2-2.1.rhaos4.13.el8.x86_64
2023-10-24T13:41:21+0000 DEBUG Installed: openvswitch3.1-3.1.0-61.el8fdp.x86_64
2023-10-24T13:41:21+0000 DEBUG Skipped: openvswitch2.17-2.17.0-62.el8fdp.x86_64
2023-10-24T13:41:21+0000 DEBUG Skipped: openvswitch2.17-2.17.0-62.el8fdp.x86_64
2023-10-24T13:41:21+0000 DEBUG Skipped: openvswitch2.17-2.17.0-110.el8fdp.x86_64
2023-10-24T13:41:21+0000 DEBUG Removed: openvswitch2.17-2.17.0-123.el8fdp.x86_64
[cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ systemctl status openvswitch.service
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2023-10-24 13:41:18 UTC; 15min ago
 Main PID: 1196 (code=exited, status=0/SUCCESS)
      CPU: 1ms

Oct 24 13:06:10 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Starting Open vSwitch...
Oct 24 13:06:10 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Started Open vSwitch.
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Stopping Open vSwitch...
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: openvswitch.service: Succeeded.
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Stopped Open vSwitch.
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: openvswitch.service: Consumed 1ms CPU time

[cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
  Drop-In: /etc/systemd/system/ovs-vswitchd.service.d
           └─10-ovs-vswitchd-restart.conf
   Active: inactive (dead) since Tue 2023-10-24 13:41:19 UTC; 16min ago
 Main PID: 1159 (code=exited, status=0/SUCCESS)
      CPU: 21.766s

Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00346|ofproto_dpif_rid|ERR|recirc_id 3716 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00347|ofproto_dpif_rid|ERR|recirc_id 3715 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00348|ofproto_dpif_rid|ERR|recirc_id 3740 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00349|ofproto_dpif_rid|ERR|recirc_id 3696 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00350|ofproto_dpif_rid|ERR|recirc_id 3744 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00351|ofproto_dpif_rid|ERR|recirc_id 31 left allocated when ofproto (br-int) is destructed
Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-ctl[44658]: Exiting ovs-vswitchd (1159) [ OK ]
Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: ovs-vswitchd.service: Succeeded.
Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Stopped Open vSwitch Forwarding Unit.
Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: ovs-vswitchd.service: Consumed 21.766s CPU time

9. From the logs above, the openvswitch service was stopped for an initially unknown reason; the dnf log and the matching timestamp (Oct 24 13:41) indicate it was caused by the openvswitch package update.
10. Manually start the openvswitch service on the RHEL worker:

[cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ sudo systemctl start openvswitch.service
[cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ systemctl status openvswitch.service
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
   Active: active (exited) since Tue 2023-10-24 14:01:44 UTC; 5s ago
  Process: 60646 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
 Main PID: 60646 (code=exited, status=0/SUCCESS)
      CPU: 1ms

Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Starting Open vSwitch...
Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Started Open vSwitch.
[cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ systemctl status ovs-vswitchd
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
  Drop-In: /etc/systemd/system/ovs-vswitchd.service.d
           └─10-ovs-vswitchd-restart.conf
   Active: active (running) since Tue 2023-10-24 14:01:44 UTC; 12s ago
 Main PID: 60630 (ovs-vswitchd)
    Tasks: 8 (limit: 100010)
   Memory: 142.7M
      CPU: 192ms
   CGroup: /system.slice/ovs-vswitchd.service
           └─60630 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/>

Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00007|stream_ssl|ERR|SSL_use_certificate_file: error:02001002:system library:fopen:No such file or directory
Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00008|stream_ssl|ERR|SSL_use_PrivateKey_file: error:20074002:BIO routines:file_ctrl:system lib
Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00009|stream_ssl|ERR|failed to load client certificates from /ovn-ca/ca-bundle.crt: error:140AD002:SSL routines:SS>
Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-ctl[60588]: Starting ovs-vswitchd [ OK ]
Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-ctl[60588]: Enabling remote OVSDB managers [ OK ]
Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vsctl[60645]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait add Open_vSwitch . external-ids hostname=ci-op-5jms8f2c-fe944-984d5-rh>
Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Started Open vSwitch Forwarding Unit.
Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00001|ofproto_dpif_xlate(handler2)|WARN|Invalid Geneve tunnel metadata on bridge br-int while processing tcp,in_po>
Oct 24 14:01:54 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00055|memory|INFO|174144 kB peak resident set size after 10.1 seconds
Oct 24 14:01:54 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00056|memory|INFO|handlers:4 idl-cells-Open_vSwitch:815 ports:13 revalidators:2 rules:9 udpif keys:8

11. The previously hanging step (Pull MCD image) in the upgrade playbook then succeeded, which means the RHEL worker got its Internet connectivity back (a minimal sketch of this recovery follows).
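For reference, the manual recovery in steps 10-11 expressed as an ad-hoc Ansible play (the "rhel_workers" host group is illustrative, not taken from the playbook):

# Manual-recovery sketch for steps 10-11: start Open vSwitch again after
# the package update stopped it, restoring the node's connectivity.
- hosts: rhel_workers
  become: true
  tasks:
    - name: Start openvswitch after the package update stopped it
      ansible.builtin.systemd:
        name: openvswitch.service
        state: started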
Actual results:
The openvswitch service is left stopped after the openvswitch package update, the RHEL worker loses Internet connectivity, and the upgrade hangs at the "Pull MCD image" task until the service is started manually.
Expected results:
It would be better to restart the openvswitch service before pulling the MCD image (https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/tasks/upgrade.yml#L32), to guarantee Internet connectivity whenever there is an `openvswitch` package update in the install.yml (https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/tasks/upgrade.yml#L29C6-L29C6). A minimal sketch of such a change follows.
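A minimal sketch of the suggested task (an illustration of the proposal, not the merged fix; "package_update_result" is a hypothetical register from the preceding install task):

# Suggested addition to roles/openshift_node/tasks/upgrade.yml, placed after
# the package install/update and before "Pull MCD image": restart Open
# vSwitch so the node regains connectivity if the OVS package changed.
- name: Restart openvswitch to restore connectivity after OVS package update
  ansible.builtin.systemd:
    name: openvswitch.service
    state: restarted
  when: package_update_result is changed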
Additional info:
- blocks: OCPBUGS-41723 "Suggest adding a restart of openvswitch service after updating the openvswitch package during RHEL node upgrade" (Closed)
- clones: OCPBUGS-34842 "Suggest adding a restart of openvswitch service after updating the openvswitch package during RHEL node upgrade" (Closed)
- is blocked by: OCPBUGS-34842 "Suggest adding a restart of openvswitch service after updating the openvswitch package during RHEL node upgrade" (Closed)
- is cloned by: OCPBUGS-41723 "Suggest adding a restart of openvswitch service after updating the openvswitch package during RHEL node upgrade" (Closed)
- links to: RHBA-2024:7184 "OpenShift Container Platform 4.14.z bug fix update"