  1. OpenShift Bugs
  2. OCPBUGS-4890

[Baremetal Workers][OSP 16.2] Worker machines stuck in the 'Provisioning' phase when using OpenShiftSDN

    • Critical
    • None
    • ShiftStack Sprint 229, ShiftStack Sprint 230
    • 2
    • Rejected
    • False

      Description of problem:

      After https://bugzilla.redhat.com/show_bug.cgi?id=2083120 was fixed, we added downstream (D/S) CI periodic jobs that install OCP with Baremetal Workers on top of OSP 16.2.
      
      In the CI runs, we observed that installing OCP with Baremetal Workers (BMWs) passes with the OVNKubernetes network type but fails with OpenShiftSDN.
      With OpenShiftSDN, the worker machines are stuck in the 'Provisioning' phase, and the OSP instances are never created.
      
      The machine-controller and OSP controller logs show:
      ```
      W1201 09:46:59.887172       1 controller.go:374] ostest-hsxw9-worker-0-rvh92: failed to create machine: error creating Openstack instance: error creating Openstack instance: Post "https://10.46.44.75:13774/v2.1/servers": EOF
      
      E1201 09:46:59.887307       1 controller.go:326]  "msg"="Reconciler error" "error"="error creating Openstack instance: error creating Openstack instance: Post \"https://10.46.44.75:13774/v2.1/servers\": EOF" "controller"="machine-controller" "name"="ostest-hsxw9-worker-0-rvh92" "namespace"="openshift-machine-api" "object"={"name":"ostest-hsxw9-worker-0-rvh92","namespace":"openshift-machine-api"} "reconcileID"="f5db6738-b949-4909-b198-990ba309f3fe" 
      ```
      However, curling the OpenStack API from the pod works, so basic connectivity is fine.
      
      Michał Dulko debugged the environment where the issue was reproduced and found that:
      * The setup suffers from an MTU mismatch in OSP 16.2 but not in 16.1.6. Reducing the MTU on the VM interface from 1500 to 1400 makes the requests work again.
      
      * Setting MTU to 1400 on a master VM (using 'ip link set dev ens3 mtu 1400') helped, and the Baremetal workers came up.
      
      * Any OpenStack API request that is long enough hits the problem, which is why short requests such as the curl check still succeed.
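
The findings above are consistent with oversized packets being silently dropped somewhere on the path. The following Python sketch illustrates that reasoning only; it is not taken from the debugging session. It assumes IPv4 and TCP headers of 20 bytes each with no options, ignores TLS framing, and uses a hypothetical path MTU of 1450 (the real path MTU is not stated in this report). The function names are made up for illustration.

```python
# Illustrative model: which requests survive a path whose MTU is lower
# than the sender's interface MTU, assuming oversized packets are dropped.

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(interface_mtu: int) -> int:
    """Largest TCP payload per packet for a given interface MTU."""
    return interface_mtu - IPV4_HEADER - TCP_HEADER


def largest_packet(payload_len: int, interface_mtu: int) -> int:
    """Size of the biggest IP packet used to send payload_len bytes."""
    return min(payload_len, tcp_mss(interface_mtu)) + IPV4_HEADER + TCP_HEADER


def passes(payload_len: int, interface_mtu: int, path_mtu: int) -> bool:
    """True if every packet of the request fits within the path MTU."""
    return largest_packet(payload_len, interface_mtu) <= path_mtu


PATH_MTU = 1450  # hypothetical value, chosen only for the example

if __name__ == "__main__":
    for payload, mtu in [(300, 1500), (4000, 1500), (4000, 1400)]:
        result = "ok" if passes(payload, mtu, PATH_MTU) else "dropped"
        print(f"payload={payload} interface_mtu={mtu}: {result}")
```

Under these assumptions, a short request fits in a single small packet regardless of the interface MTU, while a long request body produces full-size 1500-byte packets that exceed the path MTU until the interface MTU is lowered to 1400.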

      Version-Release number of selected component (if applicable):

      OCP 4.10 (and above) with OpenShiftSDN and Baremetal Workers on top of OSP 16.2 (RHOS-16.2-RHEL-8-20221124.n.1).
      
      Note: The exact same environment and configuration works fine with OVNKubernetes, and the issue does not reproduce.

      How reproducible:

      Always

      Steps to Reproduce:

      Run the openshift installer with OpenShiftSDN network type and Baremetal Workers on top of OSP 16.2.
      

      Actual results:

      Worker machines stuck in the 'Provisioning' phase 

      Expected results:

      Worker machines reach the 'Running' phase

      Additional info:

      * The issue does not reproduce with OCP 4.10, 4.11, and 4.12 using OpenShiftSDN on top of OSP 16.1.6.
      * The issue does not reproduce with OCP 4.12 using OVN-Kubernetes on top of OSP 16.2.

              mdulko Michał Dulko (Inactive)
              rhn-support-imatza Itay Matza
              Jon Uriarte Jon Uriarte
              Red Hat Employee