OpenShift SDN / SDN-2119

change ovnkube-master DB startup to be compatible with HyperShift, etc


    • Type: Story
    • Resolution: Obsolete
    • OCPSTRAT-112 - Define & Implement HyperShift Network Topology & Component Locality

      In HyperShift, most things that would get deployed to masters in "normal" OCP get deployed to workers instead. (There is some possibility that some HyperShift users may have "infra nodes" for this stuff, but that's not a requirement.)

      This is incompatible with how we currently deploy the OVN databases, which looks like:

      1. CNO figures out the IPs of all of the masters
      2. CNO writes out an ovnkube-master DaemonSet that hardcodes those IPs in multiple places
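      Concretely, the rendered manifest ends up with the master IPs baked in, along these lines (a sketch, not the actual CNO template; the variable name and addresses are placeholders):

```yaml
      containers:
      - name: nbdb
        env:
        # Peer list fixed at render time; if the set of masters changes,
        # CNO has to re-render and roll out a new DaemonSet.
        - name: OVN_NB_DB_LIST    # hypothetical variable name
          value: "tcp:10.0.0.11:6641,tcp:10.0.0.12:6641,tcp:10.0.0.13:6641"
```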

      If we want to be able to deploy to workers (or infra nodes), we can't just deploy to every node of the right type the way we do with masters, because there might be more or fewer than 3 of them. But if we just create a Deployment with "replicas: 3", then we don't know ahead of time which nodes it will land on, so we can't write the right IPs out into the Deployment.

      The fix for this is to change how DB IP detection works. Instead of CNO figuring out the IPs before writing the Deployment, we need to change the wrapper scripts to figure them out themselves (e.g., by looking something up in the kube DB). The wrapper would also need to keep monitoring and restart the DB if the peer IPs changed.
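      A minimal sketch of such a self-discovering wrapper, assuming the peer addresses can be read from an Endpoints object that CNO (or a headless Service) keeps up to date; the object name, namespace, and restart step are all hypothetical:

```shell
#!/bin/sh
# Sketch of a self-discovering DB wrapper. Instead of receiving peer
# IPs from the rendered manifest, it looks them up itself and reacts
# when they change.

# Normalize a whitespace-separated IP list into a sorted, de-duplicated,
# comma-separated form so two lookups can be compared as plain strings.
normalize_peers() {
    printf '%s\n' "$@" | sort -u | paste -sd, -
}

# Look up the current peer IPs; here via a hypothetical Endpoints
# object named "ovnkube-db".
lookup_peers() {
    kubectl -n openshift-ovn-kubernetes get endpoints ovnkube-db \
        -o jsonpath='{.subsets[*].addresses[*].ip}'
}

monitor() {
    current=""
    while true; do
        latest=$(normalize_peers $(lookup_peers))
        if [ -n "$latest" ] && [ "$latest" != "$current" ]; then
            current=$latest
            # (Re)start the DB with the new peer list; details elided.
            echo "peer set changed, restarting DB with: $current"
        fi
        sleep 10
    done
}

# Only start monitoring when explicitly invoked; sourcing this file
# just defines the functions.
if [ "${1:-}" = "monitor" ]; then
    monitor
fi
```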

      This would be compatible with either running the DB on infra nodes or on worker nodes (or with running it on the masters in the HyperShift management cluster, which is another possibility that was discussed, but which is also architecturally different enough from "regular OCP" that the current DaemonSet wouldn't work).


      (I guess another possibility would be to pass the DB IPs via a ConfigMap rather than via environment variables: CNO could update the ConfigMap after all of the DB pods have started up, and the DB pods would wait for the ConfigMap to contain values. But the first way seems better to me?)
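      That ConfigMap variant would look something like this (hypothetical names; the peer value would start out empty and be filled in by CNO once the DB pods have IPs):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ovnkube-db-peers        # hypothetical name
  namespace: openshift-ovn-kubernetes
data:
  # Written by CNO after all of the DB pods have started up; until
  # then the wrapper scripts in the DB pods just poll and wait.
  nb-peers: "tcp:10.0.128.4:6641,tcp:10.0.128.7:6641,tcp:10.0.129.2:6641"
```

      The DB wrapper would poll this object (or a volume mount of it) until the value is non-empty before starting the database.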

            Assignee: Unassigned
            Reporter: Dan Winship (dwinship@redhat.com)