OpenShift Bugs / OCPBUGS-7445

MTU migration configuration is cleaned up prematurely while in progress


    • Moderate
    • No
    • SDN Sprint 233
    • 1
    • False
    • Cause: MTU migration configuration is cleaned up prematurely while a migration is in progress.

      Consequence: An MTU migration might fail while in progress, preventing it from completing. On failure, either a node is tainted with "ovn.k8s.org/mtu-too-small" or control plane network pods fail to start, complaining about invalid MTU values. The workaround is to reverse the migration steps already performed and manually reboot the affected node to restore the system to its state before the migration.

      Fix: MTU migration configuration is preserved while a migration is in progress.

      Result: MTU migration completes successfully.
    • Bug Fix

      This is a clone of issue OCPBUGS-7207. The following is the description of the original issue:

      At some point during mtu-migration development, a configuration file was generated at /etc/cno/mtu-migration/config and served as a flag telling configure-ovs that a migration procedure was in progress. When that file was missing, configure-ovs assumed the migration procedure was over and cleaned up the migration state on its behalf.

      That later changed, and /etc/cno/mtu-migration/config is now never written. As a result, configure-ovs removes the mtu-migration information while the procedure is still in progress and ends up using incorrect MTU values. This either causes nodes to be tainted with "ovn.k8s.org/mtu-too-small", which blocks the procedure itself, or causes network disruption until the procedure is over.

      However, this was not a problem for the CI job, which does not follow the migration procedure as documented in order to save the limited time available to run CI jobs. The CI job merges two steps of the procedure into one, so there is never a reboot while the procedure is in progress, which hides this issue.
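
      The failure mode described above can be sketched as a small simulation. This is only an illustrative model, not the configure-ovs code: the flag path comes from this issue, while the `state` file, the function names, and the assumption that the check runs once per reboot are hypothetical stand-ins chosen for the example.

```python
import tempfile
from pathlib import Path

# Flag path taken from the issue text, relative to a sandbox root.
FLAG = Path("etc/cno/mtu-migration/config")
# Hypothetical stand-in for the stored "mtu-migration information".
STATE = Path("etc/cno/mtu-migration/state")


def start_migration(root: Path) -> None:
    """Store migration data; like the regressed behavior, never write FLAG."""
    state = root / STATE
    state.parent.mkdir(parents=True, exist_ok=True)
    state.write_text("in-progress\n")


def configure_ovs_on_boot(root: Path) -> bool:
    """Simplified model of the check: a missing FLAG is read as 'migration over'.

    Returns True if migration data was cleaned up.
    """
    if (root / FLAG).exists():
        return False
    state = root / STATE
    if state.exists():
        state.unlink()
        return True
    return False


def run(reboot_mid_procedure: bool) -> bool:
    """Return True if the migration data survives until the last step."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        start_migration(root)
        if reboot_mid_procedure:
            # Assumption: the check runs when the node reboots.
            configure_ovs_on_boot(root)
        return (root / STATE).exists()


if __name__ == "__main__":
    print("documented procedure (reboot mid-way):", run(True))
    print("CI shortcut (no reboot mid-way):", run(False))
```

      In this model the migration data is lost only when a reboot happens mid-procedure, which matches why the merged CI steps never exercised the problem.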

      QE probably did not detect this either, for the same reason as CI.

              jcaamano@redhat.com Jaime Caamaño Ruiz
              openshift-crt-jira-prow OpenShift Prow Bot
              Anurag Saxena Anurag Saxena