  1. OpenShift Bugs
  2. OCPBUGS-18703

[RHEL]Host lost connection during upgrade for RHEL worker


    • Important
    • No
    • Sprint 242
    • 1
    • False
    • Upgrading the openvswitch package was causing the network outage. To fix the issue, openvswitch must be removed and the newer version installed at the same time.
    • Release Note Not Required
    • In Progress
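
The release note text above says the removal and the install of openvswitch have to happen together. A minimal sketch of one way to express that, assuming a `dnf shell` script is an acceptable vehicle: the helper name `build_dnf_shell_script` is hypothetical, and this is not necessarily how the fix was implemented in openshift-ansible. The rpm path is taken from the log in this report. The code only builds the script text and does not run dnf.

```python
from typing import Sequence


def build_dnf_shell_script(remove_spec: str, rpm_paths: Sequence[str]) -> str:
    """Return a dnf shell script that queues a removal and an install,
    then runs both as one transaction (hypothetical helper)."""
    if not rpm_paths:
        # A bare removal is the step this report ties the connection loss to,
        # so refuse to generate one without a replacement package.
        raise ValueError("refusing to remove a package without a replacement rpm")
    lines = [
        f"remove {remove_spec}",
        "install " + " ".join(rpm_paths),
        "run",  # resolve and execute the queued removal and install together
    ]
    return "\n".join(lines) + "\n"


script = build_dnf_shell_script(
    "openvswitch",
    ["/tmp/openshift-ansible-packages/openvswitch2.17-2.17.0-106.el8fdp.x86_64.rpm"],
)
print(script)
```

The resulting text could be saved to a file and passed to `dnf -y shell <file>`. Whether this avoids the outage on a given host has not been verified here.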

      Description of problem:

      A RHEL worker sometimes loses its connection during an upgrade. The loss occurs when the openvswitch rpm package is removed, as shown below, but it does not happen every time.
      
      In the logs, two RHEL workers were upgraded: one (IP 10.0.176.221) succeeded, but the other (10.0.177.38) failed.
      
       https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp4-rhel-scaleup-runner/26003/consoleFull
      
      
      TASK [openshift_node : Find all downloaded rpms] *******************************
      ok: [10.0.177.38] => {"changed": false, "examined": 2, "files": [{"atime": 1694159818.368708, "ctime": 1694159818.4047084, "dev": 64515, "gid": 0, "gr_name": "root", "inode": 168357075, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": true, "issock": false, "isuid": false, "mode": "0644", "mtime": 1694159818.3677077, "nlink": 1, "path": "/tmp/openshift-ansible-packages/openvswitch2.17-2.17.0-106.el8fdp.x86_64.rpm", "pw_name": "root", "rgrp": true, "roth": true, "rusr": true, "size": 6866096, "uid": 0, "wgrp": false, "woth": false, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}, {"atime": 1694159822.372777, "ctime": 1694159822.377777, "dev": 64515, "gid": 0, "gr_name": "root", "inode": 168357076, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": true, "issock": false, "isuid": false, "mode": "0644", "mtime": 1694159822.372777, "nlink": 1, "path": "/tmp/openshift-ansible-packages/policycoreutils-python-utils-2.9-24.el8.noarch.rpm", "pw_name": "root", "rgrp": true, "roth": true, "rusr": true, "size": 259768, "uid": 0, "wgrp": false, "woth": false, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}], "matched": 2, "msg": "All paths examined", "skipped_paths": {}}
      
      TASK [openshift_node : Setting list of rpms] ***********************************
      ok: [10.0.177.38] => {"ansible_facts": {"rpm_list": ["/tmp/openshift-ansible-packages/openvswitch2.17-2.17.0-106.el8fdp.x86_64.rpm", "/tmp/openshift-ansible-packages/policycoreutils-python-utils-2.9-24.el8.noarch.rpm"]}, "changed": false}
      
      TASK [openshift_node : Remove known conflicts] *********************************
      changed: [10.0.177.38] => (item=openvswitch) => {"ansible_loop_var": "item", "changed": true, "item": "openvswitch", "msg": "", "rc": 0, "results": ["Removed: openvswitch-selinux-extra-policy-1.0-31.el8fdp.noarch", "Removed: openvswitch2.17-2.17.0-106.el8fdp.x86_64"]}
      
      TASK [openshift_node : Install downloaded packages] ****************************
      fatal: [10.0.177.38]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'rhel-8-for-x86_64-appstream-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried", "rc": 1, "results": []}
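
When the full console log is long, it can help to pull out the last task result for each host. The following is a minimal sketch, assuming the log uses the `TASK [...]` and `status: [host]` line shapes seen in the excerpt above. The function name `last_status_per_host` is hypothetical, and the JSON payloads in the sample are shortened to `{...}` for brevity.

```python
import re
from typing import Dict, Tuple

# Task headers look like: TASK [openshift_node : Remove known conflicts] ****
TASK_RE = re.compile(r"^\s*TASK \[(.+?)\]")
# Result lines look like: ok: [10.0.177.38] => {...}
# or: fatal: [10.0.177.38]: FAILED! => {...}
RESULT_RE = re.compile(r"^\s*(ok|changed|fatal|failed|skipping): \[([^\]]+)\]")


def last_status_per_host(log_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each host to the (task name, status) of its last result line."""
    current_task = ""
    last: Dict[str, Tuple[str, str]] = {}
    for line in log_text.splitlines():
        task = TASK_RE.match(line)
        if task:
            current_task = task.group(1)
            continue
        result = RESULT_RE.match(line)
        if result:
            status, host = result.groups()
            last[host] = (current_task, status)
    return last


sample = """\
TASK [openshift_node : Remove known conflicts] ****
changed: [10.0.177.38] => (item=openvswitch) => {...}

TASK [openshift_node : Install downloaded packages] ****
fatal: [10.0.177.38]: FAILED! => {...}
"""
summary = last_status_per_host(sample)
print(summary)
```

For the excerpt in this report, the summary points at the install task as the last step reached on 10.0.177.38, immediately after the removal task reported `changed`.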

      Version-Release number of selected component (if applicable):

      4.11

      How reproducible:

      not always

      Steps to Reproduce:

      1. Upgrade cluster with RHEL8 from 4.11.48-x86_64 to 4.11.0-0.nightly-2023-09-05-134659

      Actual results:

       

      Expected results:

       

      Additional info:

      Because the worker cannot be accessed, no logs can be collected for now. We will look for other ways to collect them if this happens again; e.g. it is unclear whether the worker can be rolled back by rebooting it.

            rh-ee-bbarbach Brent Barbachem
            zzhao1@redhat.com Zhanqi Zhao
            Gaoyun Pei Gaoyun Pei