Feature
Resolution: Unresolved
Priority: Major
BU Product Work
100% To Do, 0% In Progress, 0% Done
Feature Overview (aka. Goal Summary)
Rolling out new versions of the HyperShift Operator or of Hosted Control Plane components, such as HyperShift's Control Plane Operator, will no longer carry the possibility of triggering a Node rollout that can affect customer workloads running on those nodes.
Goals (aka. expected user outcomes)
The exhaustive list of causes for customer NodePool rollouts will be:
- Direct scaling of the NodePool up or down by the customer
- A customer change to the Hosted Cluster or NodePool configuration that is documented to incur a rollout
Customers will have visibility into pending rollouts so that they can trigger a rollout of their affected NodePools at their earliest convenience
Requirements (aka. Acceptance Criteria):
- Observability:
- It must be possible to account for all NodePools with pending rollouts
- It must be possible to identify all the Hosted Clusters that have NodePools with pending rollouts
- It must be possible for a customer to see that a NodePool has a pending rollout (see the condition sketch after this list)
- Kubernetes expectations on resource reconciliation must be upheld
- Queued rollouts must survive HyperShift restarts
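As one illustration of how these observability requirements could be met, the sketch below models a pending rollout as a standard Kubernetes status condition on the NodePool. The condition type "PendingRollout", its reason, and its message format are assumptions for this example, not a committed HyperShift API; a true condition of this kind would also make the first two requirements addressable by listing NodePools and filtering on it.

```go
package main

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical condition: the type, reason, and message format are
	// invented for illustration and are not part of the shipped HyperShift API.
	pendingRollout := metav1.Condition{
		Type:               "PendingRollout",
		Status:             metav1.ConditionTrue,
		Reason:             "ConfigHashMismatch",
		Message:            "NodePool targets config hash abc123; nodes currently run def456",
		LastTransitionTime: metav1.NewTime(time.Now()),
		ObservedGeneration: 7,
	}
	fmt.Printf("%+v\n", pendingRollout)
}
```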
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, provide the OCPSTRAT for the future to-be-supported configuration as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | Managed (ROSA and ARO) |
Classic (standalone cluster) | No |
Hosted control planes | Yes |
Multi node, Compact (three node), or Single node (SNO), or all | All supported Managed Hosted Control Plane topologies and configurations |
Connected / Restricted Network | All supported Managed Hosted Control Plane topologies and configurations |
Architectures, e.g. x86_x64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | All supported Managed Hosted Control Plane topologies and configurations |
Operator compatibility | N/A |
Backport needed (list applicable versions) | All supported Managed Hosted Control Plane releases |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | Yes. Console representation of NodePools/MachinePools should indicate pending rollouts and allow them to be triggered |
Other (please specify) |
Use Cases (Optional):
- Managed OpenShift fix requires a Node rollout to be applied
Questions to Answer (Optional):
- rh-ee-brcox
- Should we explicitly call out the OCP versions in Backport needed (list applicable versions)?
- asegurap1@redhat.com: Depends on what the supported OCP versions are in Managed OpenShift by the time this feature is delivered
- Is there a missing Goal of this feature, to force a rollout at a particular date? My line of thinking is what about CVE issues on the NodePool OCP version - are we giving them a warning like "hey you have a pending rollout because of a CVE; if you don't update the nodes yourself, we will on such & such date"?
- asegurap1@redhat.com: Date-based rollouts are out of scope (see the Out of Scope section).
- jparrill@redhat.com
- What are the expectations for a regular customer NodePool upgrade? Will the change be applied directly, or queued after the last requested change?
- asegurap1@redhat.com: Combined single rollout.
- Does this only apply to NodePool changes, or would it also affect CP upgrades (thinking of OVN changes that could also affect the data plane)?
- asegurap1@redhat.com: CP upgrades that would trigger Nodepool rollouts are in scope. OVN changes should only apply if CNO or its daemonsets are going to cause reboots
- How will the customer trigger the pending rollouts? Will an alert fire in the hosted cluster console?
- asegurap1@redhat.com: I guess there are multiple options, like scaling down and up, or adding some API to NodePool
- I assume we will use a new status condition to reflect the current queue of pending rollouts; is that the case?
- asegurap1@redhat.com: That's a good guess. Hopefully we can represent everything we want with it, or we constrain ourselves to what it can express
- With "Queued rollouts must survive HyperShift restarts"... What kind of details we wanna store there (“there” should be the place to persist the changes queued), the order, the number of rollouts, the destination Hashes, more info…?
- What’s the expectations on a regular customer nodePool upgrade? The change will be directly applied or queued following the last requested change?
-
-
- asegurap1@redhat.com: I'll leave that as an open question to refine
I'll If there are more than one change pending, we asume there will be more than one reboot?
- asegurap1@redhat.com: I'll leave that as an open question to refine
-
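As a starting point for the open persistence question above, here is a minimal sketch of one possible approach: storing the pending rollout target on the NodePool object itself (shown here as an annotation) so it survives operator restarts. The annotation key and the single-target-hash model are assumptions, not a settled design.

```go
package main

import "fmt"

// pendingRolloutAnnotation is a hypothetical key; where and how the queued
// state is persisted is exactly the open question above.
const pendingRolloutAnnotation = "hypershift.example.com/pending-rollout-target"

// recordPendingTarget stores the latest pending target on the object's
// annotations, replacing any previously queued one. Because the state lives
// on the API object rather than in operator memory, it survives HyperShift
// restarts, and successive pending changes collapse into one combined rollout
// to the newest target (matching the "combined single rollout" answer above).
func recordPendingTarget(annotations map[string]string, targetHash string) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[pendingRolloutAnnotation] = targetHash
	return annotations
}

func main() {
	a := recordPendingTarget(nil, "def456")
	a = recordPendingTarget(a, "ghi789") // a second pending change replaces the first
	fmt.Println(a)
}
```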
Out of Scope
- Maintenance windows
- Queuing of rollouts on user actions (as that does not meet Kubernetes reconciliation expectations and is better addressed either at the Cluster Service API level or, better yet, on the customer automation side).
- Forced rollouts of pending updates on a certain date. That is something that should be handled at the Cluster Service level if there is a desire to provide it.
Background
Past incidents in which fixes to ignition generation resulted in rollouts that were unexpected by the customer and impacted workloads
Customer Considerations
There should be an easy way to see queued updates, understand their content, and trigger them
Documentation Considerations
SOPs for the observability above (a sketch of the accounting logic follows this list)
ROSA documentation for queued updates
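To make the SOP concrete, the sketch below shows the kind of accounting it could codify: given per-NodePool pending-rollout state (however it is surfaced, e.g. via the hypothetical condition sketched under Requirements), report every NodePool and Hosted Cluster with pending rollouts. The nodePoolInfo shape is invented for this example.

```go
package main

import "fmt"

// nodePoolInfo is an invented shape for this example; in practice the
// PendingRollout flag would be derived from whatever the operator exposes.
type nodePoolInfo struct {
	HostedCluster  string
	Name           string
	PendingRollout bool
}

// pendingByCluster answers both accounting requirements at once: all NodePools
// with pending rollouts, grouped by the Hosted Cluster that owns them.
func pendingByCluster(pools []nodePoolInfo) map[string][]string {
	out := map[string][]string{}
	for _, p := range pools {
		if p.PendingRollout {
			out[p.HostedCluster] = append(out[p.HostedCluster], p.Name)
		}
	}
	return out
}

func main() {
	pools := []nodePoolInfo{
		{HostedCluster: "prod-east", Name: "workers-a", PendingRollout: true},
		{HostedCluster: "prod-east", Name: "workers-b", PendingRollout: false},
		{HostedCluster: "prod-west", Name: "workers-a", PendingRollout: true},
	}
	for hc, nps := range pendingByCluster(pools) {
		fmt.Printf("hosted cluster %s: pending rollouts in NodePools %v\n", hc, nps)
	}
}
```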
Interoperability Considerations
ROSA/HCP and ARO/HCP