
afterburn-hostname service fails when networkType is OVN-Kubernetes


    • -
    • Low
    • No
    • ShiftStack Sprint 249, ShiftStack Sprint 250, ShiftStack Sprint 251, ShiftStack Sprint 252, ShiftStack Sprint 253
    • 5
    • False
      * Previously, the Afterburn service that sets the hostname on nodes timed out while waiting for the metadata service to become available, causing issues when deploying with OVN-Kubernetes. Now, the Afterburn service waits longer for the metadata service to become available, which resolves these timeouts. (link:https://issues.redhat.com/browse/OCPBUGS-11936[*OCPBUGS-11936*])
    • Bug Fix
    • Done

      Description of problem:

      While deploying a cluster with OVNKubernetes, or while applying a cloud-provider-config change, every OCP node ends up with a failed systemd unit:

      $  oc debug -q node/ostest-h9vbm-master-0 -- chroot  /host sudo systemctl list-units --failed
        UNIT                       LOAD   ACTIVE SUB    DESCRIPTION
      ● afterburn-hostname.service loaded failed failed Afterburn Hostname

      LOAD   = Reflects whether the unit definition was properly loaded.
      ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
      SUB    = The low-level unit activation state, values depend on unit type.
      1 loaded units listed.
      
      $ oc debug -q node/ostest-h9vbm-master-0 -- chroot  /host sudo systemctl status afterburn-hostname
      × afterburn-hostname.service - Afterburn Hostname
           Loaded: loaded (/etc/systemd/system/afterburn-hostname.service; enabled; preset: disabled)
           Active: failed (Result: exit-code) since Tue 2023-04-18 11:48:35 UTC; 2h 26min ago
         Main PID: 1309 (code=exited, status=123)
              CPU: 148ms

      Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     1: maximum number of retries (10) reached
      Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     2: failed to fetch
      Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     3: error sending request for url (http://169.254.169.254/latest/meta-data/hostname): error trying to connect: tcp connect error: Network is unreachable (os error 101)
      Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     4: error trying to connect: tcp connect error: Network is unreachable (os error 101)
      Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     5: tcp connect error: Network is unreachable (os error 101)
      Apr 18 11:48:35 ostest-h9vbm-master-0 openstack-afterburn-hostname[1314]:     6: Network is unreachable (os error 101)
      Apr 18 11:48:35 ostest-h9vbm-master-0 hostnamectl[2494]: Too few arguments.
      Apr 18 11:48:35 ostest-h9vbm-master-0 systemd[1]: afterburn-hostname.service: Main process exited, code=exited, status=123/n/a
      Apr 18 11:48:35 ostest-h9vbm-master-0 systemd[1]: afterburn-hostname.service: Failed with result 'exit-code'.
      Apr 18 11:48:35 ostest-h9vbm-master-0 systemd[1]: Failed to start Afterburn Hostname.
      
      
      $ oc debug -q node/ostest-h9vbm-worker-0-fkxdr -- chroot  /host sudo systemctl list-units --failed
        UNIT                       LOAD   ACTIVE SUB    DESCRIPTION
      ● afterburn-hostname.service loaded failed failed Afterburn Hostname

      LOAD   = Reflects whether the unit definition was properly loaded.
      ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
      SUB    = The low-level unit activation state, values depend on unit type.
      1 loaded units listed.
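
The per-node checks above can be repeated across the whole cluster. The following is a minimal sketch, not part of the original report: the function name `failed_afterburn_nodes` and the overridable `OC` variable are hypothetical, and it assumes that `oc debug` propagates the exit code of the command it runs on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every node whose afterburn-hostname unit is failed.
# OC can be overridden (assumption, for testing without a cluster).
OC="${OC:-oc}"

failed_afterburn_nodes() {
  local node
  # `oc get nodes -o name` prints one "node/<name>" entry per line.
  for node in $("$OC" get nodes -o name); do
    # `systemctl is-failed` exits 0 when the unit is in the failed state.
    if "$OC" debug -q "$node" -- chroot /host systemctl is-failed --quiet afterburn-hostname.service; then
      echo "$node"
    fi
  done
}
```

Calling `failed_afterburn_nodes` with no arguments prints the affected nodes, one per line, in the same `node/<name>` form used by the commands in this report.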
      

      Once the installation or the config change is done, restarting the service resolves the issue:

      $ oc debug -q node/ostest-h9vbm-worker-0-fkxdr -- chroot  /host sudo systemctl restart afterburn-hostname
      
      $ oc debug -q node/ostest-h9vbm-worker-0-fkxdr -- chroot  /host sudo systemctl status afterburn-hostname
      ○ afterburn-hostname.service - Afterburn Hostname
           Loaded: loaded (/etc/systemd/system/afterburn-hostname.service; enabled; preset: disabled)
           Active: inactive (dead) since Tue 2023-04-18 14:14:40 UTC; 9s ago
          Process: 171875 ExecStart=/usr/local/bin/openstack-afterburn-hostname (code=exited, status=0/SUCCESS)
         Main PID: 171875 (code=exited, status=0/SUCCESS)
              CPU: 119ms

      Apr 18 14:14:32 ostest-h9vbm-worker-0-fkxdr systemd[1]: Starting Afterburn Hostname...
      Apr 18 14:14:39 ostest-h9vbm-worker-0-fkxdr openstack-afterburn-hostname[171876]: Apr 18 14:14:39.521 WARN failed to locate config-drive, using the metadata service API instead
      Apr 18 14:14:39 ostest-h9vbm-worker-0-fkxdr openstack-afterburn-hostname[171876]: Apr 18 14:14:39.583 INFO Fetching http://169.254.169.254/latest/meta-data/hostname: Attempt #1
      Apr 18 14:14:40 ostest-h9vbm-worker-0-fkxdr openstack-afterburn-hostname[171876]: Apr 18 14:14:40.237 INFO Fetch successful
      Apr 18 14:14:40 ostest-h9vbm-worker-0-fkxdr openstack-afterburn-hostname[171876]: Apr 18 14:14:40.237 INFO wrote hostname ostest-h9vbm-worker-0-fkxdr to /dev/stdout
      Apr 18 14:14:40 ostest-h9vbm-worker-0-fkxdr systemd[1]: afterburn-hostname.service: Deactivated successfully.
      Apr 18 14:14:40 ostest-h9vbm-worker-0-fkxdr systemd[1]: Finished Afterburn Hostname.
      error: non-zero exit code from debug container
      [stack@undercloud-0 ~]$ oc debug -q node/ostest-h9vbm-master-0 -- chroot  /host sudo systemctl status afterburn-hostname
      × afterburn-hostname.service - Afterburn Hostname
           Loaded: loaded (/etc/systemd/system/afterburn-hostname.service; enabled; preset: disabled)
           Active: failed (Result: exit-code) since Tue 2023-04-18 11:48:35 UTC; 2h 26min ago
         Main PID: 1309 (code=exited, status=123)
              CPU: 148ms
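
The restart workaround can be applied to several nodes in one pass. This is a sketch under assumptions rather than a supported procedure: `restart_afterburn_hostname` and the overridable `OC` variable are hypothetical names, and the restart command is the one shown in this report, without `sudo`, on the assumption that the debug pod already runs as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart afterburn-hostname on each node passed as an argument.
# OC can be overridden (assumption, for testing without a cluster).
OC="${OC:-oc}"

restart_afterburn_hostname() {
  local node rc=0
  for node in "$@"; do
    if "$OC" debug -q "$node" -- chroot /host systemctl restart afterburn-hostname.service; then
      echo "restarted: $node"
    else
      # Keep going, but remember that at least one node failed.
      echo "restart failed: $node" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `restart_afterburn_hostname node/ostest-h9vbm-master-0 node/ostest-h9vbm-worker-0-fkxdr` restarts the unit on both nodes named in this report and returns non-zero if either restart fails.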

      Version-Release number of selected component (if applicable):

      Observed on 4.13.0-0.nightly-2023-04-13-171034 and 4.12.13

      How reproducible:

      Always

      Additional info:

      Adding more retries, or spacing them out over a longer period, may help resolve this. With OVN-K the network appears to take longer to become ready, so with the current configuration the retries (10) are exhausted before the network is reachable.
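
To illustrate what spacing the retries out could look like, here is a minimal sketch of a retry loop with exponential backoff. It is not the Afterburn implementation and not the fix that was shipped; `retry_with_backoff` and its parameters are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds, doubling the delay after each failed attempt.
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_with_backoff 10 1 curl -fsS http://169.254.169.254/latest/meta-data/hostname` keeps the same 10 attempts seen in the log but sleeps 1, 2, 4, ... 256 seconds between them, for up to 511 seconds of waiting in total, which gives the network more time to come up than a short fixed interval would.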
      
      A must-gather link is provided in a private comment.

            sfinucan@redhat.com Stephen Finucane
            rlobillo Ramón Lobillo
            Yaakov Khodorkovski Yaakov Khodorkovski