OCPBUGS-34842

Suggest adding a restart of openvswitch service after updating the openvswitch package during RHEL node upgrade


      * Previously, the `openvswitch` service used older cluster configurations after a cluster upgrade and this caused the `openvswitch` service to stop. With this release, the `openvswitch` service is now restarted after a cluster upgrade so that the service uses the newer cluster configurations. (link:https://issues.redhat.com/browse/OCPBUGS-34842[*OCPBUGS-34842*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-22366. The following is the description of the original issue:

      Description of problem:

      In QE's CI testing, we found that the openvswitch service was stopped after the openvswitch package was updated during an upgrade, which caused the RHEL worker to lose Internet connectivity and the whole upgrade to fail.
      

      Version-Release number of selected component (if applicable):

      release-4.13
      

      How reproducible:

      Always in Prow CI jobs
      

      Steps to Reproduce:

      
      1. Set up a cluster on Azure with 3 CoreOS masters and 3 RHEL workers.
      
      2. Upgrade the cluster.
      
      3. The masters are upgraded successfully.
      
      4. Start upgrading the RHEL workers using the upgrade playbook.
      
      5. The playbook upgrades one of the RHEL workers:
      TASK [openshift_node : Install openshift packages] *****************************
      ....
      <10.0.1.8> (0, b'\n{"ansible_job_id": "817973574852.42491", "erased": "/root/.ansible_async/817973574852.42491", "invocation": {"module_args": {"jid": "817973574852.42491", "mode": "cleanup", "_async_dir": "/root/.ansible_async"}}}\n', b'')
      ASYNC OK on 10.0.1.8: jid=817973574852.42491
      changed: [10.0.1.8] => {
          "ansible_job_id": "817973574852.42491",
          "attempts": 1,
          "changed": true,
          "finished": 1,
          "invocation": {
              "module_args": {
                  "allow_downgrade": false,
                  "allowerasing": true,
                  "autoremove": false,
                  "bugfix": false,
                  "cacheonly": false,
                  "conf_file": null,
                  "disable_excludes": null,
                  "disable_gpg_check": true,
                  "disable_plugin": [],
                  "disablerepo": [],
                  "download_dir": null,
                  "download_only": false,
                  "enable_plugin": [],
                  "enablerepo": [],
                  "exclude": [],
                  "install_repoquery": true,
                  "install_weak_deps": true,
                  "installroot": "/",
                  "list": null,
                  "lock_timeout": 30,
                  "name": [
                      "conmon",
                      "cri-o-1.26.4",
                      "cri-tools",
                      "openshift-clients-4.13*",
                      "openshift-hyperkube-4.13*",
                      "podman",
                      "runc",
                      "kernel",
                      "systemd",
                      "selinux-policy-targeted",
                      "setools-console",
                      "dracut-network",
                      "passwd",
                      "openssh-server",
                      "openssh-clients",
                      "skopeo",
                      "containernetworking-plugins",
                      "nfs-utils",
                      "NetworkManager",
                      "NetworkManager-ovs",
                      "dnsmasq",
                      "lvm2",
                      "iscsi-initiator-utils",
                      "sg3_utils",
                      "device-mapper-multipath",
                      "xfsprogs",
                      "e2fsprogs",
                      "mdadm",
                      "cryptsetup",
                      "chrony",
                      "logrotate",
                      "sssd",
                      "shadow-utils",
                      "sudo",
                      "coreutils",
                      "less",
                      "tar",
                      "xz",
                      "gzip",
                      "bzip2",
                      "rsync",
                      "tmux",
                      "nmap-ncat",
                      "net-tools",
                      "bind-utils",
                      "strace",
                      "bash-completion",
                      "vim-minimal",
                      "nano",
                      "authconfig",
                      "iptables-services",
                      "cifs-utils",
                      "jq",
                      "libseccomp",
                      "openvswitch3.1",
                      "policycoreutils-python-utils",
                      "microcode_ctl",
                      "irqbalance",
                      "biosdevname",
                      "glusterfs-fuse"
                  ],
                  "nobest": false,
                  "releasever": null,
                  "security": false,
                  "skip_broken": false,
                  "sslverify": true,
                  "state": "latest",
                  "update_cache": false,
                  "update_only": false,
                  "validate_certs": true
              }
          },
          "msg": "",
          "rc": 0,
          "results": [
              "Installed: cri-o-1.26.4-4.1.rhaos4.13.git92b763a.el8.x86_64",
              "Installed: runc-4:1.1.9-1.1.rhaos4.13.el8.x86_64",
              "Installed: cri-tools-1.26.0-2.1.el8.x86_64",
              "Installed: openshift-clients-4.13.0-202310162106.p0.g717d4a5.assembly.stream.el8.x86_64",
              "Installed: conmon-3:2.1.7-2.1.rhaos4.13.el8.x86_64",
              "Installed: openshift-hyperkube-4.13.0-202310210425.p0.g636f2be.assembly.stream.el8.x86_64",
              "Installed: openvswitch3.1-3.1.0-61.el8fdp.x86_64",
              "Installed: skopeo-2:1.11.2-2.1.rhaos4.13.el8.x86_64",
              "Removed: cri-o-1.25.4-4.1.rhaos4.12.gitb9319a2.el8.x86_64",
              "Removed: cri-tools-1.25.0-2.1.el8.x86_64",
              "Removed: runc-3:1.1.6-4.1.rhaos4.12.el8.x86_64",
              "Removed: openshift-clients-4.12.0-202310180726.p0.ga55beda.assembly.stream.el8.x86_64",
              "Removed: openshift-hyperkube-4.12.0-202310210144.p0.g31e0558.assembly.stream.el8.x86_64",
              "Removed: skopeo-2:1.11.2-0.2.module+el8.8.0+19993+47c8ef84.x86_64",
              "Removed: openvswitch2.17-2.17.0-123.el8fdp.x86_64",
              "Removed: conmon-3:2.1.6-1.module+el8.8.0+19993+47c8ef84.x86_64"
          ],
          "results_file": "/root/.ansible_async/817973574852.42491",
          "started": 1,
          "stderr": "",
          "stderr_lines": [],
          "stdout": "",
          "stdout_lines": []
      }
      ...
      TASK [openshift_node : Pull MCD image] *****************************************
      task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/apply_machine_config.yml:60
      Using module file /opt/python-env/ansible-core/lib64/python3.8/site-packages/ansible/modules/command.py
      Pipelining is enabled.
      <10.0.1.8> ESTABLISH SSH CONNECTION FOR USER: cloud-user
      <10.0.1.8> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="cloud-user"' -o ConnectTimeout=30 -o IdentityFile=/var/run/secrets/ci.openshift.io/cluster-profile/ssh-privatekey -o StrictHostKeyChecking=no -o 'ProxyCommand=ssh -o IdentityFile=/var/run/secrets/ci.openshift.io/cluster-profile/ssh-privatekey -o ConnectTimeout=30 -o ConnectionAttempts=100 -o StrictHostKeyChecking=no -W %h:%p -q core@4.151.214.43' -o 'ControlPath="/alabama/.ansible/cp/%h-%r"' 10.0.1.8 '/bin/sh -c '"'"'sudo -H -S -n  -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-dboorgbfowovrzyggbmdgegpwmrvoknq ; http_proxy='"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' https_proxy='"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' no_proxy='"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' /usr/libexec/platform-python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
      Escalation succeeded
      <10.0.1.8> (1, b'\n{"changed": true, "stdout": "", "stderr": "Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441...\\nError: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441: reading manifest sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized", "rc": 125, "cmd": ["podman", "pull", "--tls-verify=False", "--authfile", "/var/lib/kubelet/config.json", "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441"], "start": "2023-10-24 13:41:51.676499", "end": "2023-10-24 13:43:08.774173", "delta": "0:01:17.097674", "failed": true, "msg": "non-zero return code", "invocation": {"module_args": {"_raw_params": "podman pull --tls-verify=False --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441", "_uses_shell": false, "warn": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}}\n', b'')
      <10.0.1.8> Failed to connect to the host via ssh: 
      FAILED - RETRYING: [10.0.1.8]: Pull MCD image (12 retries left).Result was: {
          "attempts": 1,
          "changed": true,
          "cmd": [
              "podman",
              "pull",
              "--tls-verify=False",
              "--authfile",
              "/var/lib/kubelet/config.json",
              "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441"
          ],
          "delta": "0:01:17.097674",
          "end": "2023-10-24 13:43:08.774173",
          "invocation": {
              "module_args": {
                  "_raw_params": "podman pull --tls-verify=False --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441",
                  "_uses_shell": false,
                  "argv": null,
                  "chdir": null,
                  "creates": null,
                  "executable": null,
                  "removes": null,
                  "stdin": null,
                  "stdin_add_newline": true,
                  "strip_empty_ends": true,
                  "warn": false
              }
          },
          "msg": "non-zero return code",
          "rc": 125,
          "retries": 13,
          "start": "2023-10-24 13:41:51.676499",
          "stderr": "Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441...\nError: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441: reading manifest sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized",
          "stderr_lines": [
              "Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441...",
              "Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441: reading manifest sha256:0c452c53747d7ac6051cd29cf1d09372d57d8091d73be2784fd7e8597a9cb441 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"
          ],
          "stdout": "",
          "stdout_lines": []
      }
      ...
      
      6. At this point, the RHEL worker being upgraded lost Internet connectivity, so it failed to pull the MCD image.
      
      7. SSH into the RHEL worker and run curl against google.com. The command hangs as well, which confirms that Internet connectivity is lost.
      
      8. Check the openvswitch services:
      [cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ journalctl -u ovs-vswitchd.service
      ...
      Oct 24 13:40:58 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00316|connmgr|INFO|br-int<->unix#15: 58 flow_mods 10 s ago (4 adds, 54 deletes)
      Oct 24 13:41:01 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00317|connmgr|INFO|br-ex<->unix#1080: 2 flow_mods in the last 0 s (2 adds)
      Oct 24 13:41:02 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00318|bridge|INFO|bridge br-int: deleted interface daa1890fc840ffa on port 4
      Oct 24 13:41:05 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00319|connmgr|INFO|br-int<->unix#15: 29 flow_mods 3 s ago (2 adds, 27 deletes)
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Stopping Open vSwitch Forwarding Unit...
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00320|bridge|INFO|bridge br-ex: deleted interface eth0 on port 1
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00321|bridge|INFO|bridge br-ex: deleted interface br-ex on port 65534
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00322|bridge|INFO|bridge br-ex: deleted interface patch-br-ex_ci-op-5jms8f2c-fe944-984d5-rhel-2-to-br-int on port 2
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00323|ofproto_dpif_rid|ERR|recirc_id 3714 left allocated when ofproto (br-ex) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00324|ofproto_dpif_rid|ERR|recirc_id 23 left allocated when ofproto (br-ex) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00325|ofproto_dpif_rid|ERR|recirc_id 3719 left allocated when ofproto (br-ex) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00326|bridge|INFO|bridge br-int: deleted interface ovn-3e037a-0 on port 2
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00327|bridge|INFO|bridge br-int: deleted interface ovn-af77fa-0 on port 3
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00328|bridge|INFO|bridge br-int: deleted interface ovn-k8s-mp0 on port 9
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00329|bridge|INFO|bridge br-int: deleted interface ovn-aee0f0-0 on port 8
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00330|bridge|INFO|bridge br-int: deleted interface ovn-38d58c-0 on port 7
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00331|bridge|INFO|bridge br-int: deleted interface ovn-d3dd7f-0 on port 1
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00332|bridge|INFO|bridge br-int: deleted interface patch-br-int-to-br-ex_ci-op-5jms8f2c-fe944-984d5-rhel-2 on port >
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00333|bridge|INFO|bridge br-int: deleted interface br-int on port 65534
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00334|bridge|INFO|bridge br-int: deleted interface bdc09b6ea877236 on port 6
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00335|bridge|INFO|bridge br-int: deleted interface 73057cfb3038ad7 on port 12
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00336|ofproto_dpif_rid|ERR|recirc_id 55 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00337|ofproto_dpif_rid|ERR|recirc_id 3717 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00338|ofproto_dpif_rid|ERR|recirc_id 3745 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00339|ofproto_dpif_rid|ERR|recirc_id 3741 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00340|ofproto_dpif_rid|ERR|recirc_id 3693 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00341|ofproto_dpif_rid|ERR|recirc_id 3694 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00342|ofproto_dpif_rid|ERR|recirc_id 3698 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00343|ofproto_dpif_rid|ERR|recirc_id 25 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00344|ofproto_dpif_rid|ERR|recirc_id 3697 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00345|ofproto_dpif_rid|ERR|recirc_id 3718 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00346|ofproto_dpif_rid|ERR|recirc_id 3716 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00347|ofproto_dpif_rid|ERR|recirc_id 3715 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00348|ofproto_dpif_rid|ERR|recirc_id 3740 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00349|ofproto_dpif_rid|ERR|recirc_id 3696 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00350|ofproto_dpif_rid|ERR|recirc_id 3744 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00351|ofproto_dpif_rid|ERR|recirc_id 31 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-ctl[44658]: Exiting ovs-vswitchd (1159) [  OK  ]
      Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: ovs-vswitchd.service: Succeeded.
      Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Stopped Open vSwitch Forwarding Unit.
      Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: ovs-vswitchd.service: Consumed 21.766s CPU time
      [cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ date
      Tue Oct 24 13:45:27 UTC 2023
      [cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ tail -f /var/log/dnf.log 
      2023-10-24T13:41:21+0000 DEBUG Upgraded: cri-tools-1.26.0-2.1.el8.x86_64
      2023-10-24T13:41:21+0000 DEBUG Upgraded: openshift-clients-4.13.0-202310162106.p0.g717d4a5.assembly.stream.el8.x86_64
      2023-10-24T13:41:21+0000 DEBUG Upgraded: openshift-hyperkube-4.13.0-202310210425.p0.g636f2be.assembly.stream.el8.x86_64
      2023-10-24T13:41:21+0000 DEBUG Upgraded: runc-4:1.1.9-1.1.rhaos4.13.el8.x86_64
      2023-10-24T13:41:21+0000 DEBUG Upgraded: skopeo-2:1.11.2-2.1.rhaos4.13.el8.x86_64
      2023-10-24T13:41:21+0000 DEBUG Installed: openvswitch3.1-3.1.0-61.el8fdp.x86_64
      2023-10-24T13:41:21+0000 DEBUG Skipped: openvswitch2.17-2.17.0-62.el8fdp.x86_64
      2023-10-24T13:41:21+0000 DEBUG Skipped: openvswitch2.17-2.17.0-62.el8fdp.x86_64
      2023-10-24T13:41:21+0000 DEBUG Skipped: openvswitch2.17-2.17.0-110.el8fdp.x86_64
      2023-10-24T13:41:21+0000 DEBUG Removed: openvswitch2.17-2.17.0-123.el8fdp.x86_64
      [cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ systemctl status openvswitch.service
      ● openvswitch.service - Open vSwitch
         Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
         Active: inactive (dead) since Tue 2023-10-24 13:41:18 UTC; 15min ago
       Main PID: 1196 (code=exited, status=0/SUCCESS)
            CPU: 1ms
      
      Oct 24 13:06:10 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Starting Open vSwitch...
      Oct 24 13:06:10 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Started Open vSwitch.
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Stopping Open vSwitch...
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: openvswitch.service: Succeeded.
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Stopped Open vSwitch.
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: openvswitch.service: Consumed 1ms CPU time
      [cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ systemctl status ovs-vswitchd
      ● ovs-vswitchd.service - Open vSwitch Forwarding Unit
         Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
        Drop-In: /etc/systemd/system/ovs-vswitchd.service.d
                 └─10-ovs-vswitchd-restart.conf
         Active: inactive (dead) since Tue 2023-10-24 13:41:19 UTC; 16min ago
       Main PID: 1159 (code=exited, status=0/SUCCESS)
            CPU: 21.766s
      
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00346|ofproto_dpif_rid|ERR|recirc_id 3716 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00347|ofproto_dpif_rid|ERR|recirc_id 3715 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00348|ofproto_dpif_rid|ERR|recirc_id 3740 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00349|ofproto_dpif_rid|ERR|recirc_id 3696 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00350|ofproto_dpif_rid|ERR|recirc_id 3744 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[1159]: ovs|00351|ofproto_dpif_rid|ERR|recirc_id 31 left allocated when ofproto (br-int) is destructed
      Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-ctl[44658]: Exiting ovs-vswitchd (1159) [  OK  ]
      Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: ovs-vswitchd.service: Succeeded.
      Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Stopped Open vSwitch Forwarding Unit.
      Oct 24 13:41:19 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: ovs-vswitchd.service: Consumed 21.766s CPU time
      
      9. According to the logs above, the openvswitch service was stopped for an unknown reason. The dnf log and the matching timestamps (Oct 24 13:41) suggest that the stop was caused by the openvswitch package update.
      
      10. Manually start the openvswitch service on the RHEL worker:
      [cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ sudo systemctl start openvswitch.service
      [cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ systemctl status openvswitch.service
      ● openvswitch.service - Open vSwitch
         Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
         Active: active (exited) since Tue 2023-10-24 14:01:44 UTC; 5s ago
        Process: 60646 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
       Main PID: 60646 (code=exited, status=0/SUCCESS)
            CPU: 1ms
      
      Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Starting Open vSwitch...
      Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Started Open vSwitch.
      [cloud-user@ci-op-5jms8f2c-fe944-984d5-rhel-2 ~]$ systemctl status ovs-vswitchd
      ● ovs-vswitchd.service - Open vSwitch Forwarding Unit
         Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
        Drop-In: /etc/systemd/system/ovs-vswitchd.service.d
                 └─10-ovs-vswitchd-restart.conf
         Active: active (running) since Tue 2023-10-24 14:01:44 UTC; 12s ago
       Main PID: 60630 (ovs-vswitchd)
          Tasks: 8 (limit: 100010)
         Memory: 142.7M
            CPU: 192ms
         CGroup: /system.slice/ovs-vswitchd.service
                 └─60630 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/>
      
      Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00007|stream_ssl|ERR|SSL_use_certificate_file: error:02001002:system library:fopen:No such file or directory
      Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00008|stream_ssl|ERR|SSL_use_PrivateKey_file: error:20074002:BIO routines:file_ctrl:system lib
      Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00009|stream_ssl|ERR|failed to load client certificates from /ovn-ca/ca-bundle.crt: error:140AD002:SSL routines:SS>
      Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-ctl[60588]: Starting ovs-vswitchd [  OK  ]
      Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-ctl[60588]: Enabling remote OVSDB managers [  OK  ]
      Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vsctl[60645]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait add Open_vSwitch . external-ids hostname=ci-op-5jms8f2c-fe944-984d5-rh>
      Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 systemd[1]: Started Open vSwitch Forwarding Unit.
      Oct 24 14:01:44 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00001|ofproto_dpif_xlate(handler2)|WARN|Invalid Geneve tunnel metadata on bridge br-int while processing tcp,in_po>
      Oct 24 14:01:54 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00055|memory|INFO|174144 kB peak resident set size after 10.1 seconds
      Oct 24 14:01:54 ci-op-5jms8f2c-fe944-984d5-rhel-2 ovs-vswitchd[60630]: ovs|00056|memory|INFO|handlers:4 idl-cells-Open_vSwitch:815 ports:13 revalidators:2 rules:9 udpif keys:8
      
      11. The hung step (Pull MCD image) in the upgrade playbook then succeeded, which means the RHEL worker got its Internet connectivity back.
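
The timestamp correlation in step 9 can be checked mechanically. The following is a minimal Python sketch, not part of the original report: the helper names `journal_time` and `dnf_time` are hypothetical, the two sample lines are copied from the logs above, and the journal timestamps are assumed to be UTC (the `date` output on the host reports UTC).

```python
from datetime import datetime, timezone


def journal_time(line: str, year: int) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a journalctl short-format line."""
    stamp = " ".join(line.split()[:3])
    parsed = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    # Assumption: journal timestamps are UTC, matching the `date` output on the host.
    return parsed.replace(tzinfo=timezone.utc)


def dnf_time(line: str) -> datetime:
    """Parse the leading timestamp of a /var/log/dnf.log line."""
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S%z")


# Sample lines copied from the journalctl and dnf.log output in this report.
journal_line = (
    "Oct 24 13:41:18 ci-op-5jms8f2c-fe944-984d5-rhel-2 "
    "systemd[1]: Stopping Open vSwitch..."
)
dnf_line = (
    "2023-10-24T13:41:21+0000 DEBUG "
    "Removed: openvswitch2.17-2.17.0-123.el8fdp.x86_64"
)

stopped = journal_time(journal_line, year=2023)
removed = dnf_time(dnf_line)
gap_seconds = (removed - stopped).total_seconds()
print(f"openvswitch stopped {gap_seconds:.0f}s before dnf logged the package removal")
```

For these two lines the gap is 3 seconds, which is consistent with the service being stopped as part of the package transaction rather than by an unrelated event.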
      

      Actual results:

      
      

      Expected results:

      It would be better to restart the openvswitch service before pulling the MCD image (https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/tasks/upgrade.yml#L32) to guarantee Internet connectivity whenever the `openvswitch` package is updated in install.yml (https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/tasks/upgrade.yml#L29C6-L29C6).
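
The suggested logic can be sketched in Python as follows. This is only an illustration under stated assumptions, not the change made in openshift-ansible, where the fix would more naturally be an Ansible task. The function names `openvswitch_changes` and `restart_if_updated` are hypothetical, the input is assumed to be the `results` list returned by the package install task, and the caller is assumed to be root on the RHEL worker.

```python
import subprocess
from typing import Callable, Iterable, List


def openvswitch_changes(results: Iterable[str]) -> List[str]:
    """Return the dnf result entries that installed or removed an openvswitch package."""
    changes = []
    for entry in results:
        action, _, package = entry.partition(": ")
        if action in ("Installed", "Removed") and package.startswith("openvswitch"):
            changes.append(entry)
    return changes


def restart_if_updated(results: Iterable[str], run: Callable = subprocess.run) -> bool:
    """Restart openvswitch.service if the package set changed; return True if restarted."""
    if not openvswitch_changes(results):
        return False
    # Assumption: this runs as root on the RHEL worker, before the MCD image pull.
    run(["systemctl", "restart", "openvswitch.service"], check=True)
    return True


# Entries copied from the "results" list in the playbook output above.
sample_results = [
    "Installed: openvswitch3.1-3.1.0-61.el8fdp.x86_64",
    "Installed: skopeo-2:1.11.2-2.1.rhaos4.13.el8.x86_64",
    "Removed: openvswitch2.17-2.17.0-123.el8fdp.x86_64",
]


def dry_run(cmd, check):
    """Stand-in runner that prints the command instead of executing it."""
    print("would run:", " ".join(cmd))


restarted = restart_if_updated(sample_results, run=dry_run)
```

The runner is injected so that the decision can be exercised without touching systemd. With the default `subprocess.run`, the same call would actually restart the service.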
      
      

      Additional info:

      
      

              rh-ee-bbarbach Brent Barbachem
              openshift-crt-jira-prow OpenShift Prow Bot
              Gaoyun Pei Gaoyun Pei