OCPBUGS-19871

[RHEL] Host lost connection during upgrade for RHEL worker


      This is a clone of issue OCPBUGS-18703. The following is the description of the original issue:

      Description of problem:

      Sometimes a RHEL worker loses its connection during upgrade. It happens while the openvswitch RPM package is being removed, as shown below, but it does not always happen.
      
      See the logs: two RHEL workers were upgraded; one (IP 10.0.176.221) succeeded, but the other (10.0.177.38) failed.
      
       https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp4-rhel-scaleup-runner/26003/consoleFull
      
      
      TASK [openshift_node : Find all downloaded rpms] *******************************
      ok: [10.0.177.38] => {"changed": false, "examined": 2, "files": [{"atime": 1694159818.368708, "ctime": 1694159818.4047084, "dev": 64515, "gid": 0, "gr_name": "root", "inode": 168357075, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": true, "issock": false, "isuid": false, "mode": "0644", "mtime": 1694159818.3677077, "nlink": 1, "path": "/tmp/openshift-ansible-packages/openvswitch2.17-2.17.0-106.el8fdp.x86_64.rpm", "pw_name": "root", "rgrp": true, "roth": true, "rusr": true, "size": 6866096, "uid": 0, "wgrp": false, "woth": false, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}, {"atime": 1694159822.372777, "ctime": 1694159822.377777, "dev": 64515, "gid": 0, "gr_name": "root", "inode": 168357076, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": true, "issock": false, "isuid": false, "mode": "0644", "mtime": 1694159822.372777, "nlink": 1, "path": "/tmp/openshift-ansible-packages/policycoreutils-python-utils-2.9-24.el8.noarch.rpm", "pw_name": "root", "rgrp": true, "roth": true, "rusr": true, "size": 259768, "uid": 0, "wgrp": false, "woth": false, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}], "matched": 2, "msg": "All paths examined", "skipped_paths": {}}
      
      TASK [openshift_node : Setting list of rpms] ***********************************
      ok: [10.0.177.38] => {"ansible_facts": {"rpm_list": ["/tmp/openshift-ansible-packages/openvswitch2.17-2.17.0-106.el8fdp.x86_64.rpm", "/tmp/openshift-ansible-packages/policycoreutils-python-utils-2.9-24.el8.noarch.rpm"]}, "changed": false}
      
      TASK [openshift_node : Remove known conflicts] *********************************
      changed: [10.0.177.38] => (item=openvswitch) => {"ansible_loop_var": "item", "changed": true, "item": "openvswitch", "msg": "", "rc": 0, "results": ["Removed: openvswitch-selinux-extra-policy-1.0-31.el8fdp.noarch", "Removed: openvswitch2.17-2.17.0-106.el8fdp.x86_64"]}
      
      TASK [openshift_node : Install downloaded packages] ****************************
      fatal: [10.0.177.38]: FAILED! => {"changed": false, "msg": "Failed to download metadata for repo 'rhel-8-for-x86_64-appstream-rpms': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried", "rc": 1, "results": []}
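      
      Note the sequence above: "Remove known conflicts" erases openvswitch in its own transaction, and the following "Install downloaded packages" task then fails before the replacement RPM can be installed, leaving the node without openvswitch and, apparently, without network connectivity. A minimal sketch of a more defensive sequence, assuming the RPMs staged in /tmp/openshift-ansible-packages can resolve the conflicts themselves (this is an illustration, not the actual openshift_node role logic):
      
      # Sketch: do the removal and the install as one dnf transaction, so that a
      # repo-metadata or download failure aborts the whole operation and the old
      # openvswitch package stays installed (assumes --allowerasing can resolve
      # the same conflicts the separate "Remove known conflicts" task handles).
      dnf install -y --allowerasing /tmp/openshift-ansible-packages/*.rpm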

      Version-Release number of selected component (if applicable):

      4.11

      How reproducible:

      Not always; the failure is intermittent.

      Steps to Reproduce:

      1. Upgrade a cluster with RHEL8 workers from 4.11.48-x86_64 to 4.11.0-0.nightly-2023-09-05-134659 (an example invocation follows below).
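      
      For reference, RHEL worker upgrades are driven by the openshift-ansible upgrade playbook; a typical invocation looks like the following (the inventory path is a placeholder):
      
      # Run from a checkout of openshift-ansible on the playbook host;
      # /path/to/inventory/hosts stands in for the scaleup inventory file.
      cd openshift-ansible
      ansible-playbook -i /path/to/inventory/hosts playbooks/upgrade.yml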
      

      Actual results:

      The failed worker (10.0.177.38) lost its connection after openvswitch was removed: the "Install downloaded packages" task failed with "Failed to download metadata for repo 'rhel-8-for-x86_64-appstream-rpms'", leaving the node unreachable.

      Expected results:

      The upgrade completes and the RHEL workers stay reachable; a package download failure should not leave a node without openvswitch.

      Additional info:

      Since the worker cannot be accessed, no logs can be collected for now. We are trying to find other ways to gather information if this happens again; e.g., it is not clear whether the node can be recovered by rebooting it.
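      
      If out-of-band console access to the node is available, one possible (unverified) recovery path is to reinstall openvswitch from the RPMs already staged on the node, assuming the staged packages and their dependencies are still installable:
      
      # Hypothetical recovery from a serial/VM console (not verified on this node);
      # if the repo mirrors are still unreachable, dnf may fail again on the
      # removed openvswitch-selinux-extra-policy dependency.
      dnf install -y /tmp/openshift-ansible-packages/openvswitch2.17-*.rpm
      systemctl restart openvswitch
      journalctl -u openvswitch --no-pager | tail -n 50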
