OpenShift Hosted Control Plane / HOSTEDCP-192

Enable control plane deployment restart on demand



Service providers need a way to restart control plane pods on demand (see https://github.com/openshift/hypershift/issues/236). When control plane pods reach a bad state, it should be possible to detect that through liveness probes and let the system restart them automatically. Any bad state that is not detectable this way is a bug that should be fixed. In practice, however, there will be times when a restart of the pods is needed to clear bad state.

When kubectl restarts a deployment via `kubectl rollout restart`, the command simply adds an annotation (`kubectl.kubernetes.io/restartedAt`) to the deployment's pod template, which triggers a new rollout:
https://github.com/kubernetes/kubectl/blob/47df52af297ea787c44f3d4d7da11e7e4e0d83a8/pkg/polymorphichelpers/objectrestarter.go#L44-L52
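
For reference, the same effect can be reproduced with a strategic-merge patch. The following client-go sketch is illustrative only (the function name and wiring are assumptions, not kubectl or HyperShift code):

```go
package example

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// restartDeployment mimics `kubectl rollout restart deployment/NAME`: it sets the
// kubectl.kubernetes.io/restartedAt annotation on the deployment's pod template,
// which changes the pod template and triggers a rolling restart of its pods.
func restartDeployment(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	patch := fmt.Sprintf(
		`{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":%q}}}}}`,
		time.Now().Format(time.RFC3339),
	)
	_, err := client.AppsV1().Deployments(namespace).Patch(
		ctx, name, types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{},
	)
	return err
}
```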

For HyperShift, we can do something similar in one of two ways:

1) We allow a `hypershift.openshift.io/restartedAt` annotation on the `HostedCluster` to be propagated to the `HostedControlPlane` and then to each control plane deployment's pod template. We can then expose a command in the hypershift CLI analogous to the kubectl one, `hypershift restart hostedcluster NAME -n NAMESPACE`, that populates this annotation.

2) We add a field to the spec of `HostedCluster` that allows us to either a) specify a `restartedAt` value, or b) specify an integer that can be incremented, resulting in a new `restartedAt` value that we store in status.

Given that this is not necessarily part of the spec of a HostedCluster, my preference would be to go with option 1.
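
To make option 1 concrete, here is a minimal sketch of the propagation step. Only the annotation key comes from the proposal above; the helper name and surrounding wiring are hypothetical:

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
)

// Annotation key proposed in option 1.
const restartedAtAnnotation = "hypershift.openshift.io/restartedAt"

// propagateRestartAnnotation stamps the restartedAt value (already copied from the
// HostedCluster onto the HostedControlPlane) into a control plane deployment's pod
// template. Changing the pod template forces a rolling restart of that deployment.
func propagateRestartAnnotation(restartedAt string, deployment *appsv1.Deployment) {
	if restartedAt == "" {
		return
	}
	if deployment.Spec.Template.ObjectMeta.Annotations == nil {
		deployment.Spec.Template.ObjectMeta.Annotations = map[string]string{}
	}
	deployment.Spec.Template.ObjectMeta.Annotations[restartedAtAnnotation] = restartedAt
}
```

The CLI command would then only need to set the same annotation on the HostedCluster with the current timestamp; no API schema change is required, which is the main argument for option 1 over option 2.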
