[CORENET-3309] change ovnkube-master DB startup to be compatible with HyperShift, etc

Type: Story
Resolution: Obsolete
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- hypershift
- migrated-from-sdn

Blocked:
False
Ready:
False
Epic Link:
CORENET-2972
BZ Doc Type:
If docs needed, set a value
BZ requires_doc_text:
Unset
Dev Approval:
Not Set
Docs Approval:
Not Set
PM Approval:
Not Set
QE Approval:
Not Set
QEStatus:
ToDo
Release Note Text:
undefined
Solution:
Untriaged
Support Scope:
Not Supported
[QE] Why QE missed?:
---
BZ Keywords:
- Unset

WSJF:
0

Root Cause:
Untriaged

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

In HyperShift, most things that would get deployed to masters in "normal" OCP get deployed to workers instead. (Some HyperShift users may have "infra nodes" for these components, but that's not a requirement.)

This is incompatible with how we currently deploy the OVN databases:

- CNO figures out the IPs of all of the masters.
- CNO writes out an ovnkube-master DaemonSet that hardcodes those IPs in multiple places.
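To make the problem concrete, here is a minimal Python sketch of what hardcoding the IPs amounts to: the master IPs are joined into OVN database address strings once, when the manifest is generated. The function and key names are hypothetical, the ports are OVN's defaults (6641 northbound, 6642 southbound), and the `ssl:IP:port` format is an assumption, not a copy of what CNO actually renders.

```python
# Hypothetical sketch: render_db_addresses and the OVN_NB_DB / OVN_SB_DB keys are
# illustrative names, not taken from CNO. Ports 6641 (northbound) and 6642
# (southbound) are OVN's defaults; the "ssl:IP:port" format is an assumption.


def render_db_addresses(master_ips: list[str], port: int, scheme: str = "ssl") -> str:
    """Join the known master IPs into one comma-separated OVN DB address list."""
    if not master_ips:
        raise ValueError("need at least one master IP")
    return ",".join(f"{scheme}:{ip}:{port}" for ip in master_ips)


# The IPs are resolved once, at render time, so these strings end up frozen into
# the DaemonSet spec; a DB pod scheduled onto some other node would not be listed.
masters = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
env = {
    "OVN_NB_DB": render_db_addresses(masters, 6641),
    "OVN_SB_DB": render_db_addresses(masters, 6642),
}
print(env["OVN_NB_DB"])
# prints: ssl:192.0.2.10:6641,ssl:192.0.2.11:6641,ssl:192.0.2.12:6641
```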

If we want to deploy to workers (or infra nodes), we can't just deploy to every node of the right type as we do with masters, because there might be more or fewer than 3 of them. But if we instead create a Deployment with "replicas: 3", we don't know ahead of time which nodes its pods will be scheduled to, so we can't write the right IPs into the Deployment.

The fix is to change how DB IP detection works. Instead of CNO figuring out the IPs before writing the Deployment, the wrapper scripts need to figure them out themselves (e.g. by looking something up via the Kubernetes API). The scripts would also need to keep monitoring the peer IPs and restart the DB if they change.
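A minimal Python sketch of the runtime detection described above, assuming a simple polling loop. `PeerWatcher`, `lookup_peers`, and `restart_db` are hypothetical names; in a real wrapper, `lookup_peers` would presumably query the Kubernetes API (for example, for the IPs of the pods backing the DB), and a watch could replace polling. The sketch only shows the control flow: restart when the peer *set* changes, not merely when the lookup returns it in a different order.

```python
import time
from typing import Callable, Iterable, Optional


class PeerWatcher:
    """Hypothetical wrapper logic: discover DB peer IPs at runtime and restart
    the DB whenever the set of peers changes.

    lookup_peers stands in for a Kubernetes API query (an assumption; the issue
    does not specify what is looked up). restart_db receives the sorted peer list.
    """

    def __init__(self, lookup_peers: Callable[[], Iterable[str]],
                 restart_db: Callable[[list[str]], None]) -> None:
        self._lookup = lookup_peers
        self._restart = restart_db
        self.current: Optional[frozenset[str]] = None

    def poll_once(self) -> bool:
        """Look up peers once; (re)start the DB if the set changed. Returns True on restart."""
        peers = frozenset(self._lookup())
        if peers == self.current:
            return False
        self.current = peers
        # Sort so the DB is always started with a deterministic peer order.
        self._restart(sorted(peers))
        return True

    def run(self, interval: float, iterations: int) -> None:
        """Poll a fixed number of times (a real wrapper would loop forever)."""
        for _ in range(iterations):
            self.poll_once()
            time.sleep(interval)


# Usage: simulate three lookups; the second returns the same set in a new order.
observed = iter([
    ["192.0.2.1", "192.0.2.2", "192.0.2.3"],
    ["192.0.2.3", "192.0.2.2", "192.0.2.1"],  # same peers, different order
    ["192.0.2.1", "192.0.2.2", "192.0.2.4"],  # one DB pod rescheduled elsewhere
])
restarts: list[list[str]] = []
watcher = PeerWatcher(lambda: next(observed), restarts.append)
watcher.run(interval=0, iterations=3)
print(restarts)
# prints: [['192.0.2.1', '192.0.2.2', '192.0.2.3'], ['192.0.2.1', '192.0.2.2', '192.0.2.4']]
```

Comparing sets rather than lists avoids needless DB restarts when the lookup merely returns the same peers in a different order.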

This would be compatible with running the DB on either infra nodes or worker nodes. It would also work with running the DB on the masters of the HyperShift management cluster, another option that was discussed; that setup is architecturally different enough from "regular OCP" that the current DaemonSet wouldn't work there either.

(I guess another possibility would be to pass the DB IPs via a ConfigMap rather than via environment variables: CNO would update the ConfigMap after all of the DB pods have started up, and the DB pods would wait until the ConfigMap contains values. But the other approach seems better to me?)
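The ConfigMap alternative from the aside amounts to a bounded wait loop on the DB pod side. This is a hypothetical Python sketch: the `db-ips` key, the comma-separated format, and the expected count of 3 are assumptions, and `read_configmap` stands in for however the pod would actually read the ConfigMap (for example, a mounted volume or an API call).

```python
import time
from typing import Callable, Mapping


def wait_for_db_ips(read_configmap: Callable[[], Mapping[str, str]],
                    key: str = "db-ips",
                    expected: int = 3,
                    timeout: float = 300.0,
                    interval: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> list[str]:
    """Block until the (hypothetical) ConfigMap key lists `expected` DB IPs, then return them.

    Raises TimeoutError if CNO has not filled in enough IPs before the deadline.
    """
    deadline = clock() + timeout
    while True:
        raw = read_configmap().get(key, "")
        # Ignore empty entries so "" or a trailing comma doesn't count as an IP.
        ips = [ip.strip() for ip in raw.split(",") if ip.strip()]
        if len(ips) >= expected:
            return ips
        if clock() >= deadline:
            raise TimeoutError(f"ConfigMap key {key!r} had {len(ips)} of {expected} IPs")
        sleep(interval)


# Usage: simulate CNO filling in the ConfigMap over successive reads.
snapshots = iter([{}, {"db-ips": "192.0.2.1"}, {"db-ips": "192.0.2.1,192.0.2.2,192.0.2.3"}])
ips = wait_for_db_ips(lambda: next(snapshots), sleep=lambda s: None)
print(ips)
# prints: ['192.0.2.1', '192.0.2.2', '192.0.2.3']
```

Injecting `sleep` and `clock` lets the wait be exercised without real delays; a real pod would just use the defaults.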

Links to:

openshift/cluster-network-operator#1158: Bug 1987019: Support external control plane topology

Dan Winship added a comment - 2022/05/23 4:18 PM

nope, agreed, we did something else and this is irrelevant now

Dan Winship added a comment - 2022/05/23 4:18 PM nope, agreed, we did something else and this is irrelevant now

Casey Callendrello (Inactive) added a comment - 2022/05/16 7:35 PM

Hmm. I think this issue can be closed, given that we aligned on using StatefulSets for HyperShift-style deployments.

dwinship@redhat.com - any work captured in this epic that you think needs to be preserved?

Casey Callendrello (Inactive) added a comment - 2022/05/16 7:35 PM Hmm. I think this issue can be closed, given that we aligned on using StatefulSets for Hypershift-style deployments. dwinship@redhat.com - any work captured in this epic that you think needs to be preserved?

Assignee: Unassigned

Reporter: Dan Winship

Votes: 0

Watchers: 2

Created: 2021/08/14 3:16 PM

Updated: 2025/03/30 8:09 PM

Resolved: 2022/05/23 4:19 PM