-
Bug
-
Resolution: Done-Errata
-
Normal
-
None
-
4.11
-
Important
-
No
-
Sprint 242
-
1
-
False
-
-
Upgrading the openvswitch package was causing the network outage. To fix the issue, openvswitch must be removed and the newer version installed in the same transaction.
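The resolution described above could be sketched as an Ansible task like the following. This is a hypothetical illustration, not the actual openshift-ansible change: the task name and the use of the `allowerasing` option of the `ansible.builtin.dnf` module are assumptions.

```yaml
# Hypothetical sketch of the fix: instead of a separate "Remove known
# conflicts" step followed by an install, let dnf erase the conflicting
# old openvswitch package and install the newer openvswitch2.17 rpm in a
# single transaction, so a repository failure mid-upgrade cannot leave
# the node with no openvswitch installed (and hence no network).
- name: Install downloaded packages, replacing conflicts in one transaction
  ansible.builtin.dnf:
    name: "{{ rpm_list }}"   # e.g. the downloaded openvswitch2.17 rpm
    state: present
    allowerasing: true       # erase conflicting packages in the same transaction
    disable_gpg_check: true  # assumption: local rpms downloaded beforehand
```

With a single transaction, dnf resolves the removal and the installation together before touching the system, so the old package is only removed once the replacement is guaranteed to be installable.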
-
Release Note Not Required
-
In Progress
Description of problem:
Sometimes a RHEL worker loses its connection during an upgrade; it happens while the openvswitch rpm package is being removed, as shown below, but not always (see the logs). In this run, two RHEL workers were upgraded: one (10.0.176.221) succeeded, but the other (10.0.177.38) failed. https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp4-rhel-scaleup-runner/26003/consoleFull

TASK [openshift_node : Find all downloaded rpms] *******************************
ok: [10.0.177.38] => {"changed": false, "examined": 2, "files": [{"atime": 1694159818.368708, "ctime": 1694159818.4047084, "dev": 64515, "gid": 0, "gr_name": "root", "inode": 168357075, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": true, "issock": false, "isuid": false, "mode": "0644", "mtime": 1694159818.3677077, "nlink": 1, "path": "/tmp/openshift-ansible-packages/openvswitch2.17-2.17.0-106.el8fdp.x86_64.rpm", "pw_name": "root", "rgrp": true, "roth": true, "rusr": true, "size": 6866096, "uid": 0, "wgrp": false, "woth": false, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}, {"atime": 1694159822.372777, "ctime": 1694159822.377777, "dev": 64515, "gid": 0, "gr_name": "root", "inode": 168357076, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": true, "issock": false, "isuid": false, "mode": "0644", "mtime": 1694159822.372777, "nlink": 1, "path": "/tmp/openshift-ansible-packages/policycoreutils-python-utils-2.9-24.el8.noarch.rpm", "pw_name": "root", "rgrp": true, "roth": true, "rusr": true, "size": 259768, "uid": 0, "wgrp": false, "woth": false, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}], "matched": 2, "msg": "All paths examined", "skipped_paths": {}}

TASK [openshift_node : Setting list of rpms] ***********************************
ok: [10.0.177.38] => {"ansible_facts": {"rpm_list": ["/tmp/openshift-ansible-packages/openvswitch2.17-2.17.0-106.el8fdp.x86_64.rpm", "/tmp/openshift-ansible-packages/policycoreutils-python-utils-2.9-24.el8.noarch.rpm"]}, "changed": false}

TASK [openshift_node : Remove known conflicts] *********************************
changed: [10.0.177.38] => (item=openvswitch) => {"ansible_loop_var": "item", "changed": true, "item": "openvswitch", "msg": "", "rc": 0, "results": ["Removed: openvswitch-selinux-extra-policy-1.0-31.el8fdp.noarch", "Removed: openvswitch2.17-2.17.0-106.el8fdp.x86_64"]}

TASK [openshift_node : Install downloaded packages] ****************************
fatal: [10.0.177.38]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'rhel-8-for-x86_64-appstream-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried", "rc": 1, "results": []}
Version-Release number of selected component (if applicable):
4.11
How reproducible:
Not always
Steps to Reproduce:
1. Upgrade a cluster with RHEL8 workers from 4.11.48-x86_64 to 4.11.0-0.nightly-2023-09-05-134659
Actual results:
Expected results:
Additional info:
Since the worker cannot be accessed, no logs can be collected for now. We are trying to find other ways to collect them if this happens again; e.g., it is not clear whether the worker can be rolled back by rebooting it.
- blocks
-
OCPBUGS-19871 [RHEL]Host lost connection during upgrade for RHEL worker
- Closed
- is cloned by
-
OCPBUGS-19871 [RHEL]Host lost connection during upgrade for RHEL worker
- Closed
- links to
-
RHEA-2023:7198 rpm