Red Hat OpenStack Services on OpenShift / OSPRH-26793

IPv6 address format change during adoption triggers dataplane outage

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • os-net-config
    • Important

      Originally reported at https://issues.redhat.com/browse/OSPRH-25865 while preparing the IPv6 adoption job. The IPv6 address format change triggered interface recreation and caused a dataplane outage. More details are in the Slack thread https://redhat-internal.slack.com/archives/C046JULBVJ7/p1769165943803179?thread_ts=1768907288.181099&cid=C046JULBVJ7

      To Reproduce

      Steps to reproduce the behavior:

      1. Run an adoption where the IPv6 address is the same before and after (os-net-config runs with the ifcfg provider) but its format differs, i.e. the full form (2620:00cf:00cf:aaaa:0000:0000:0000:0064/64) versus the compressed form (2620:cf:cf:aaaa::64/64). The interfaces are recreated, which causes a dataplane outage. Since the addresses are the same (even ip addr show displays the compressed format), the interface restart should be avoided when only the address format changes.
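
The equivalence of the two forms can be checked with Python's standard ipaddress module. This is only an illustration of the point above, not code from os-net-config; the helper name same_interface_address is hypothetical.

```python
import ipaddress


def same_interface_address(a: str, b: str) -> bool:
    """Return True if two 'address/prefix' strings denote the same address and prefix."""
    return ipaddress.ip_interface(a) == ipaddress.ip_interface(b)


# The two spellings from the reproduction step.
full = "2620:00cf:00cf:aaaa:0000:0000:0000:0064/64"
compressed = "2620:cf:cf:aaaa::64/64"

print(same_interface_address(full, compressed))
# The parsed object always renders in the compressed form.
print(ipaddress.ip_interface(full).with_prefixlen)
```

A plain string comparison of the two values reports a difference, while the parsed comparison does not.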

      Expected behavior

      • Interfaces should not be recreated if only the address format changes.

      Bug impact

      • If the address format changes, the dataplane is down for the time it takes os-net-config to run and ovn-controller to be set up during the EDPM node adoption stage.

      Known workaround

      • Keep the addresses the same. The documentation already states that addresses must be kept the same, but for IPv6 the address formatting must also be kept the same. This is what CI currently does (the 17.1 format was changed to match 18, i.e. the compressed one).
      • Hardcoding the IP addresses in their pre-adoption format in the edpm_network_config_template of the dataplane CR avoids this, but it is not scalable and is error-prone.
      • In practice, the format cannot be changed to the compressed one (as used in 18.0) on an already deployed 17.1 cloud, since that should also trigger the outage. One option that could avoid it is to update the /etc/sysconfig/network-scripts/ifcfg* files to the desired compressed format; the os-net-config run should then treat this as a no-op and avoid the restart when the format changes during the adoption run.

       

      This ticket is to handle this within os-net-config: if only the address format changes, interfaces should not be recreated or restarted.
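
One possible shape for such a check, as a sketch only: canonicalize every IP-looking string in the old and new parsed configs before comparing them. This is not os-net-config's actual code; canonical and only_format_differs are hypothetical names, and the dictionaries below are a simplified stand-in for an interface entry.

```python
import ipaddress


def canonical(value):
    """Recursively rewrite IP address strings in a parsed config to canonical form."""
    if isinstance(value, dict):
        return {k: canonical(v) for k, v in value.items()}
    if isinstance(value, list):
        return [canonical(v) for v in value]
    if isinstance(value, str):
        try:
            if "/" in value:
                return ipaddress.ip_interface(value).with_prefixlen
            return str(ipaddress.ip_address(value))
        except ValueError:
            return value  # not an IP address, keep as-is
    return value


def only_format_differs(old, new) -> bool:
    """True when the configs differ textually but are equal once canonicalized."""
    return old != new and canonical(old) == canonical(new)


old = {"name": "nic2", "addresses": [
    {"ip_netmask": "2620:00cf:00cf:aaaa:0000:0000:0000:0064/64"}]}
new = {"name": "nic2", "addresses": [
    {"ip_netmask": "2620:cf:cf:aaaa::64/64"}]}

print(only_format_differs(old, new))
```

A comparison of this kind would still report a real change when the address value itself differs, which is the behavior wanted for genuine reconfiguration.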

      This will require backporting the fix to 17.1, as that version of os-net-config is used during adoption.

       

      Additional context

      Example diff that triggered the outage:

      # diff /etc/os-net-config/tripleo_config.yaml /etc/os-net-config/config.yaml 
      0a1
      > ---
      6,9c7,8
      <   dns_servers:
      <   - 2620:00cf:00cf:aaaa:0000:0000:0000:0001
      <   - 2620:00cf:00cf:aaaa:0000:0000:0000:0001
      <   domain: []
      ---
      >   dns_servers: ['2620:cf:cf:aaaa::50']
      >   domain: ['example.com', 'ctlplane.example.com', 'internalapi.example.com', 'storage.example.com', 'tenant.example.com']
      11c10
      <   - ip_netmask: 2620:00cf:00cf:aaaa:0000:0000:0000:0064/64
      ---
      >   - ip_netmask: 2620:cf:cf:aaaa::64/64
      19d17
      <   # internalapi
      24c22,23
      <     - ip_netmask: 2620:00cf:00cf:bbbb:0000:0000:0000:0064/64
      ---
      >     - ip_netmask:
      >         2620:cf:cf:bbbb::64/64
      26d24
      <   # storage
      31c29,30
      <     - ip_netmask: 2620:00cf:00cf:cccc:0000:0000:0000:0064/64
      ---
      >     - ip_netmask:
      >         2620:cf:cf:cccc::64/64
      33d31
      <   # tenant
      38c36,37
      <     - ip_netmask: 2620:00cf:00cf:eeee:0000:0000:0000:0064/64
      ---
      >     - ip_netmask:
      >         2620:cf:cf:eeee::64/64

              arn1@redhat.com Abhiram R N
              ykarel@redhat.com Yatin Karel
              rhos-dfg-nfv