Red Hat OpenStack Services on OpenShift / OSPRH-6326

Update ovs container image without disrupting gateway datapath

    • Committed
    • No Docs Impact
    • Proposed
    • Release Note Not Required
    • RHOSO18Beta waived: Upgrades: update + data plane disruption + OVN
    • Neutron Sprint 97
    • Important
    • Networking; Neutron

      Right now, when a new ovs container image is rolled out, all OVS pods restart; on startup, the new vswitchd process flushes all kernel flows. This disrupts the gateway datapath until ovn-controller reconnects to vswitchd and reinstalls its flows.
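      For illustration, the disruption window can be seen by watching the flow table on the gateway node (a hypothetical check; br-int as the integration bridge is an assumption):

      ```bash
      # Flow count collapses when the new vswitchd flushes the tables,
      # then recovers once ovn-controller reconnects and reinstalls flows.
      watch -n1 'ovs-ofctl dump-flows br-int | wc -l'
      ```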


      To avoid this, we can adopt the flow-restore-wait option for vswitchd. Upstream, it is implemented as the `reload` action of the vswitchd systemd service unit. Since we don't use systemd units in the podified environment, we need to reimplement it in our service stop/startup scripts.
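      For reference, the knob itself is just an other_config key on the Open_vSwitch record; while it is set to true, a freshly started vswitchd neither flushes nor expires the flows it finds installed. A minimal sketch of the toggle:

      ```bash
      # Set before (re)starting vswitchd; --no-wait is needed because vswitchd
      # is not running yet to acknowledge the change (ovsdb-server must be up).
      ovs-vsctl --no-wait set Open_vSwitch . other_config:flow-restore-wait="true"

      # ... start vswitchd and replay the saved flows here ...

      # Removing the key tells vswitchd that flow restoration is complete.
      ovs-vsctl remove Open_vSwitch . other_config flow-restore-wait
      ```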

      In PreStop:

      • dump flows to a file on a PVC (see the sketch below).
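      Something along these lines could work for the hook; the PVC mount point is an assumption, and we lean on the ovs-save helper shipped with OVS:

      ```bash
      #!/bin/bash
      # Hypothetical PreStop hook; FLOW_BACKUP_DIR is an assumed PVC mount point.
      set -eu
      FLOW_BACKUP_DIR=${FLOW_BACKUP_DIR:-/var/lib/openvswitch/flows}
      mkdir -p "$FLOW_BACKUP_DIR"

      # ovs-save emits a shell script that reinstalls the dumped flow tables.
      # Its temporary dump files must also survive the restart, so keep them on
      # the PVC too (assumes the helper honors TMPDIR, as recent OVS does).
      export TMPDIR="$FLOW_BACKUP_DIR"
      /usr/share/openvswitch/scripts/ovs-save save-flows $(ovs-vsctl list-br) \
          > "$FLOW_BACKUP_DIR/restore-flows.sh"
      ```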


      In PreStart (the full sequence is sketched after the list):

      • set flow-restore-wait=true;
      • start vswitchd;
      • once vswitchd is up:
        • restore flows;
        • set flow-restore-wait=false to allow ovn-controller to reconnect.
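      A sketch of the whole startup sequence, under the same assumptions (ovsdb-server already running, paths illustrative):

      ```bash
      #!/bin/bash
      # Hypothetical startup wrapper implementing the flow-restore-wait dance.
      set -eu
      FLOW_BACKUP_DIR=${FLOW_BACKUP_DIR:-/var/lib/openvswitch/flows}

      # 1. Tell the soon-to-start vswitchd not to flush or expire existing flows.
      ovs-vsctl --no-wait set Open_vSwitch . other_config:flow-restore-wait="true"

      # 2. Start vswitchd; --detach returns once the daemon has initialized.
      ovs-vswitchd --pidfile --detach

      # 3. Replay the flows saved by the PreStop hook, if any.
      if [ -f "$FLOW_BACKUP_DIR/restore-flows.sh" ]; then
          sh "$FLOW_BACKUP_DIR/restore-flows.sh"
          rm -f "$FLOW_BACKUP_DIR/restore-flows.sh"
      fi

      # 4. Clear the flag so ovn-controller can reconnect and take over.
      ovs-vsctl remove Open_vSwitch . other_config flow-restore-wait
      ```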


      When backing up flows, consider how long the dump and restore may take, and whether the default pod startup / liveness timeouts are long enough that vswitchd is never force-killed before it has started. (Alternatively, consider modifying the liveness checks to monitor flow restore progress; one possible probe is sketched below.)
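      One possible shape for such a check (a hypothetical exec probe; it treats the container as not started while flow-restore-wait is still set):

      ```bash
      #!/bin/bash
      # Hypothetical startup/liveness probe command.
      # Still restoring flows: report failure so the kubelet keeps waiting.
      state=$(ovs-vsctl --no-wait get Open_vSwitch . \
              other_config:flow-restore-wait 2>/dev/null | tr -d '"')
      if [ "$state" = "true" ]; then
          exit 1
      fi
      # Healthy only if vswitchd answers on its control socket.
      exec ovs-appctl version >/dev/null
      ```

      Whichever form the check takes, the startup probe budget (failureThreshold × periodSeconds) has to exceed the worst-case restore time.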

            Elvira Garcia (egarciar@redhat.com)
            Ihar Hrachyshka (ihrachys)
            Bharath M V
            rhos-dfg-networking-squad-neutron