[OCPBUGS-38468] MicroShift doesn't auto detect the MTU value at start up

    • Sprint: uShift Sprint 263, uShift Sprint 264, uShift Sprint 265, uShift Sprint 266

      Description of problem:

      When MicroShift starts in a VM running on OpenShift Virtualization, OVN on the host cluster uses part of the MTU for encapsulation, so the VM's interface ends up with a smaller MTU. MicroShift does not detect this smaller value, so the MTU has to be specified manually in /etc/microshift/ovn.yaml.
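The MTU shrinkage described above comes down to simple arithmetic. The following is a minimal sketch, assuming the host cluster's overlay reserves 100 bytes for Geneve encapsulation; that figure, the constant `GENEVE_OVERHEAD`, and the function name `guest_mtu` are assumptions made for illustration and are not taken from this report.

```python
# Assumed Geneve encapsulation overhead in bytes; adjust for your environment.
GENEVE_OVERHEAD = 100


def guest_mtu(hardware_mtu: int, overhead: int = GENEVE_OVERHEAD) -> int:
    """Return the MTU left for a VM after the host overlay takes its share."""
    if hardware_mtu <= overhead:
        raise ValueError("hardware MTU must be larger than the overlay overhead")
    return hardware_mtu - overhead


if __name__ == "__main__":
    # Under the assumed overhead, a 1500-byte physical network leaves less
    # than 1500 bytes for the VM interface.
    print(guest_mtu(1500))
```

Under these assumptions the VM interface ends up below 1500, which is why an overlay MTU of 1500 inside the VM cannot work without manual configuration.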

      Version-Release number of selected component (if applicable):

      4.15 & 4.16    

      How reproducible:

      Always, when starting MicroShift on a VM in OpenShift Virtualization

      Steps to Reproduce:

          1. Create a RHEL VM in OpenShift Virtualization.
          2. Try to start MicroShift on it.
          3. Check for pods in CrashLoopBackOff and Pending state.

       

      Alternative reproduction steps (can be performed on any setup):

      - Change the MTU on the external-facing interface:
         nmcli c modify enp1s0 802-3-ethernet.mtu 1300 && nmcli c up enp1s0
      - Delete the ovnkube-master pod so that it is recreated:
         oc delete pod/ovnkube-master-kpff2 -n openshift-ovn-kubernetes
      - Check for pods in CrashLoopBackOff and Pending state.

      Actual results:

      The ovnkube-master pod in the openshift-ovn-kubernetes namespace goes into CrashLoopBackOff, and the ingress, kube-system, storage, and service-ca pods stay Pending.

      Expected results:

      MicroShift auto-detects the MTU value required by the network infrastructure and applies it during startup, so that all pods reach the Running state.

      Additional info:

      โ— microshift.service - MicroShift
           Loaded: loaded (/usr/lib/systemd/system/microshift.service; enabled; prese>
           Active: active (running) since Thu 2024-04-11 04:29:04 EDT; 3min 52s ago
         Main PID: 18912 (microshift)
            Tasks: 12 (limit: 48492)
           Memory: 310.2M
              CPU: 31.698s
           CGroup: /system.slice/microshift.service
                   └─18912 microshift run
      
      Apr 11 04:35:30 edge microshift[18912]: kubelet E0411 04:35:30.603473   18912 kubelet.go:2869] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
      Apr 11 04:35:31 edge microshift[18912]: kubelet E0411 04:35:31.620672   18912 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"ovnkube-master\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=ovnkube-master pod=ovnkube-master-ql5k7_openshift-ovn-kubernetes (97a1a2cf-e653-4cf8-bafb-b0188488ac3d)\"" pod="openshift-ovn-kubernetes/ovnkube-master-ql5k7" podUID="97a1a2cf-e653-4cf8-bafb-b0188488ac3d"
      Apr 11 04:35:33 edge microshift[18912]: kube-controller-manager I0411 04:35:33.812683   18912 node_lifecycle_controller.go:785] "Node is NotReady. Adding it to the Taint queue" node="edge" timeStamp="2024-04-11 04:35:33.812595792 -0400 EDT m=+457.202568567"
      
      ----------------------------------------------------------------------
      [cloud-user@edge ~]$ oc get pods -A
      NAMESPACE                  NAME                                       READY   STATUS             RESTARTS      AGE
      kube-system                csi-snapshot-controller-6686957bb9-vz8tv   0/1     Pending            0             17m
      kube-system                csi-snapshot-webhook-64455cd68b-47vjj      0/1     Pending            0             17m
      openshift-dns              node-resolver-247rt                        1/1     Running            0             17m
      openshift-ingress          router-default-65757846cd-4chhm            0/1     Pending            0             17m
      openshift-ovn-kubernetes   ovnkube-master-ql5k7                       3/4     CrashLoopBackOff   8 (23s ago)   17m
      openshift-ovn-kubernetes   ovnkube-node-6w4p8                         1/1     Running            1 (16m ago)   17m
      openshift-service-ca       service-ca-6dbd7c5ddc-2pqpj                0/1     Pending            0             17m
      openshift-storage          topolvm-controller-597486954b-9ffz9        0/5     Pending            0             17m
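The "check pods" step can be done by eye, but a small filter makes the failing pods stand out. This is a minimal sketch that parses text in the layout of the `oc get pods -A` output above; the function name `unhealthy_pods` is hypothetical, and the sample rows are copied from this report.

```python
from typing import List, Tuple


def unhealthy_pods(table: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, name, status) for every pod whose STATUS is not Running."""
    result = []
    for line in table.strip().splitlines():
        fields = line.split()
        # Skip blank or malformed lines and the header row.
        if len(fields) < 4 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status = fields[:4]
        if status != "Running":
            result.append((namespace, name, status))
    return result


SAMPLE = """\
NAMESPACE                  NAME                                       READY   STATUS             RESTARTS      AGE
kube-system                csi-snapshot-controller-6686957bb9-vz8tv   0/1     Pending            0             17m
openshift-dns              node-resolver-247rt                        1/1     Running            0             17m
openshift-ovn-kubernetes   ovnkube-master-ql5k7                       3/4     CrashLoopBackOff   8 (23s ago)   17m
"""

if __name__ == "__main__":
    for namespace, name, status in unhealthy_pods(SAMPLE):
        print(f"{namespace}/{name}: {status}")
```

Note that this simple filter also reports pods in states such as Completed, which may be perfectly healthy.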


            Steps I used to verify the bug

             

            Modify the interface to set MTU to 1300
            sudo nmcli c modify ens3 802-3-ethernet.mtu 1300 && sudo nmcli c up ens3

             

            Delete pod

            oc delete pod/ovnkube-master-XXXX -n openshift-ovn-kubernetes

             

            Restart MicroShift

            sudo systemctl restart microshift

             

            Check MTU size

            ip address show dev ens3
            2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qdisc fq_codel state UP group default qlen 1000
                link/ether 52:54:00:83:60:be brd ff:ff:ff:ff:ff:ff
                altname enp0s3
                inet 10.1.235.177/24 brd 10.1.235.255 scope global dynamic noprefixroute ens3
                   valid_lft 282sec preferred_lft 282sec
                inet6 2620:52:0:1eb:5054:ff:fe83:60be/64 scope global dynamic noprefixroute 
                   valid_lft 2591858sec preferred_lft 604658sec
                inet6 fe80::5054:ff:fe83:60be/64 scope link noprefixroute 
                   valid_lft forever preferred_lft forever

             

            Check content of the config-map to see the MTU value

            [redhat@microshift-base-1995 ~]$ oc get configmap/ovnkube-config -n openshift-ovn-kubernetes -oyaml | grep mtu 
            
            mtu="1300"
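The two checks above (interface MTU and config map MTU) can be compared mechanically. This is a minimal sketch that only parses text shaped like the outputs shown in this comment; the function names `interface_mtu` and `configmap_mtu` are hypothetical.

```python
import re
from typing import Optional


def interface_mtu(ip_output: str) -> Optional[int]:
    """Extract the MTU from `ip address show dev <iface>` output."""
    match = re.search(r"\bmtu (\d+)\b", ip_output)
    return int(match.group(1)) if match else None


def configmap_mtu(configmap_text: str) -> Optional[int]:
    """Extract the value from a line such as mtu="1300" in the config map."""
    match = re.search(r'^\s*mtu="?(\d+)"?\s*$', configmap_text, re.MULTILINE)
    return int(match.group(1)) if match else None


IP_OUTPUT = "2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qdisc fq_codel state UP group default qlen 1000"
CONFIGMAP_LINE = 'mtu="1300"'

if __name__ == "__main__":
    nic, overlay = interface_mtu(IP_OUTPUT), configmap_mtu(CONFIGMAP_LINE)
    print("match" if nic == overlay else f"mismatch: nic={nic} overlay={overlay}")
```

When the two values agree, as in this verification run, the overlay MTU has followed the interface MTU.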

             

            Check status of pods

            [redhat@microshift-base-1995 ~]$ oc get pods -A
            NAMESPACE                              NAME                                       READY   STATUS    RESTARTS        AGE
            kube-system                            csi-snapshot-controller-66d6bb45dc-4nw9g   1/1     Running   0               2d19h
            openshift-dns                          dns-default-28qsg                          2/2     Running   0               2d19h
            openshift-dns                          node-resolver-wd6jh                        1/1     Running   0               2d19h
            openshift-ingress                      router-default-d9ddddbd4-mw55q             1/1     Running   0               2d19h
            openshift-multus                       dhcp-daemon-5ptd7                          1/1     Running   0               2d19h
            openshift-multus                       multus-tbxbq                               1/1     Running   0               2d19h
            openshift-operator-lifecycle-manager   catalog-operator-8568cd4fbb-x4ddw          1/1     Running   0               2d19h
            openshift-operator-lifecycle-manager   olm-operator-644fd66cb7-gwdpj              1/1     Running   0               2d19h
            openshift-ovn-kubernetes               ovnkube-master-rb9t9                       4/4     Running   32 (47m ago)    3h
            openshift-ovn-kubernetes               ovnkube-node-gqbjc                         1/1     Running   3 (3h ago)      2d19h
            openshift-service-ca                   service-ca-7df9c675f4-cmqv2                1/1     Running   0               2d19h
            openshift-storage                      lvms-operator-d6f9c9d4-6jwf7               1/1     Running   0               2d19h
            openshift-storage                      vg-manager-zp4ql                           1/1     Running   1 (2d19h ago)   2d19h

             

             

             

            Douglas Hensel added the comment above.

            Evgeny Slutsky added a comment - - edited

            MicroShift creates configmap/ovnkube-config in the openshift-ovn-kubernetes namespace for the OVN pods.
            Can you please verify whether this config map was updated with the correct MTU?
            If not, restarting MicroShift should help.


            I followed the alternative reproduction steps above while trying to verify this bug. I ran into an issue when restarting the ovnkube-master-XXXX pod in the openshift-ovn-kubernetes namespace: it goes into CrashLoopBackOff.

            The MTU was changed to 1300 using nmcli.

            [redhat@microshift-base-1995 ~]$ oc logs pods/ovnkube-master-kzrhp -n openshift-ovn-kubernetes -c ovnkube-master | tail
            I0124 22:13:49.984518   19256 handler.go:219] Removed *v1.Node event handler 2
            I0124 22:13:49.984533   19256 obj_retry.go:432] Stop channel got triggered: will stop retrying failed objects of type *v1.NetworkPolicy
            I0124 22:13:49.984541   19256 obj_retry.go:432] Stop channel got triggered: will stop retrying failed objects of type *v1.Pod
            I0124 22:13:49.984552   19256 services_controller.go:183] Shutting down controller ovn-lb-controller for network=default
            I0124 22:13:49.984566   19256 obj_retry.go:432] Stop channel got triggered: will stop retrying failed objects of type *v1.Node
            I0124 22:13:49.984574   19256 obj_retry.go:432] Stop channel got triggered: will stop retrying failed objects of type *v1.Namespace
            I0124 22:13:49.984592   19256 handler.go:219] Removed *v1.Node event handler 3
            I0124 22:13:49.984601   19256 handler.go:219] Removed *v1.Node event handler 4
            I0124 22:13:49.984661   19256 ovnkube.go:599] Stopped ovnkube
            F0124 22:13:49.984675   19256 ovnkube.go:137] failed to run ovnkube: failed to start node network controller: failed to start default node network controller: MTU (1300) of network interface ens3 is too small for specified overlay MTU (1500) 
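The fatal log line implies a comparison between the interface MTU and the configured overlay MTU. The following is a sketch of that kind of check, not the ovn-kubernetes source; the function name `validate_overlay_mtu` is hypothetical, and the rule that the interface MTU only needs to be at least the overlay MTU is an assumption inferred from the message.

```python
def validate_overlay_mtu(iface: str, iface_mtu: int, overlay_mtu: int) -> None:
    """Raise ValueError when the interface cannot carry the overlay MTU.

    Assumption: the interface MTU only has to be at least the overlay MTU.
    Multi-node overlays would also need headroom for encapsulation.
    """
    if iface_mtu < overlay_mtu:
        raise ValueError(
            f"MTU ({iface_mtu}) of network interface {iface} is too small "
            f"for specified overlay MTU ({overlay_mtu})"
        )


if __name__ == "__main__":
    validate_overlay_mtu("ens3", 1300, 1300)  # passes: the values agree
    try:
        validate_overlay_mtu("ens3", 1300, 1500)  # the situation in the log
    except ValueError as err:
        print(err)
```

In the failing run the overlay MTU was still 1500 while ens3 had been lowered to 1300, so a check of this shape rejects the configuration.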

            SOS report

            Douglas Hensel added the comment above.

            Hi eslutsky,

            Bugs should not be moved to Verified without first providing a Release Note Type("Bug Fix" or "No Doc Update") and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

            OpenShift Jira Bot added the comment above.

            I found this doc, https://github.com/pliurh/microshift/blob/main/docs/contributor/network/default_cni_plugin.md, with a note:
            "When mtu is not provided, it is set to the default route MTU."
            pliurh, is it possible that the doc is not aligned with the latest changes made here?
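To make "the default route MTU" concrete, here is a minimal sketch of how the interface behind the IPv4 default route could be found on Linux. It is not MicroShift's implementation (another comment in this issue says the code reads br-ex instead). The function names are hypothetical, and the sample routing table is made-up illustration data in the format of /proc/net/route.

```python
from typing import Optional


def default_route_iface(route_table: str) -> Optional[str]:
    """Return the interface of the lowest-metric IPv4 default route.

    `route_table` is text in the format of /proc/net/route on Linux.
    """
    best = None  # (metric, iface)
    for line in route_table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, destination = fields[0], fields[1]
        metric, mask = int(fields[6]), fields[7]
        # A default route has an all-zero destination and netmask.
        if destination == "00000000" and mask == "00000000":
            if best is None or metric < best[0]:
                best = (metric, iface)
    return best[1] if best else None


def read_mtu(iface: str) -> int:
    """Read an interface MTU from sysfs (Linux only)."""
    with open(f"/sys/class/net/{iface}/mtu") as handle:
        return int(handle.read().strip())


# Made-up sample data: one default route and one connected /24 route on ens3.
SAMPLE = """\
Iface Destination Gateway  Flags RefCnt Use Metric Mask     MTU Window IRTT
ens3  00000000    01EB010A 0003  0      0   100    00000000 0   0      0
ens3  00EB010A    00000000 0001  0      0   100    00FFFFFF 0   0      0
"""

if __name__ == "__main__":
    print(default_route_iface(SAMPLE))
```

On a real host, passing the contents of /proc/net/route to `default_route_iface` and the result to `read_mtu` would give the value the doc note refers to.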

             

            Evgeny Slutsky added the comment above.

            Evgeny Slutsky added a comment - - edited

            When no MTU is configured, MicroShift takes its MTU from the br-ex bridge (if it exists); otherwise it defaults to 1500.

            There is no code that takes the MTU from other NICs.
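The selection order described in this comment can be written down in a few lines. This is a minimal sketch of that order only; the function name `resolve_mtu` and its parameters are hypothetical and do not correspond to MicroShift's actual code.

```python
from typing import Optional

DEFAULT_MTU = 1500  # fallback value stated in the comment above


def resolve_mtu(configured: Optional[int], br_ex_mtu: Optional[int]) -> int:
    """Pick the MTU: explicit configuration first, then br-ex, then the default."""
    if configured is not None:
        return configured
    if br_ex_mtu is not None:
        return br_ex_mtu
    return DEFAULT_MTU


if __name__ == "__main__":
    # No configuration and no br-ex bridge: the default is used, even if the
    # external NIC actually has a smaller MTU.
    print(resolve_mtu(None, None))
```

The last case is the one this bug is about: with neither a configured value nor a br-ex bridge, the MTU of the external NIC never enters the decision.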

             


              eslutsky Evgeny Slutsky
              dialvare Diego Alvarez Ponce
              John George John George