Feature
Resolution: Unresolved
Major
ACM 2.15.0
Product / Portfolio Work
False
False
Not Selected
Feature Overview
Provide preventative documentation and a process to help customers avoid breaking hub-spoke connectivity when they change the CA bundle on the OCP cluster that the hub runs on.
Goals
This Feature covers creating documentation and knowledge base (KB) content that helps customers prevent the issue before it happens, by increasing their understanding of the risk and giving them clear direction on how to manage it.
If a customer changes the CA bundle on the OCP cluster running an RHACM hub, connectivity with the RHACM agents on its managed clusters breaks. This condition is very hard to recover from.
In consultation with Engineering, we feel it is better to PREVENT this from happening through customer education. The challenge is that the change is made on OCP and, in many cases, no consideration is given to the RHACM Operator running on it. We do have some docs on this, but they live in the RHACM space and are often not consulted before the change is made on OCP.
So the goal is to provide a simple, clear piece of information on this subject and ensure it is linked from both the OCP and RHACM docs.
I'd also like to see additional material created, perhaps an interactive demo or a very short video, linked into the material for customers to follow.
We have been investigating this issue in the following Jira issues, but for the moment we feel we need to work on the preventative solution and see whether it significantly reduces the issue:
- ACM-9288 ACM update the certificate on the managed-clusters automatically after renew a custom apiserver certificate
- ACM-16322 RFE - Generate the kubeconfig after api or ingress change
- ACM-14928 Optimize the configuration process for hub API server certificate changes
Requirements
TBC, but likely:
- Documentation review
- New doc created?
- Updates to OCP docs
- Updates to RHACM docs
- KB
- Interactive Demo?
(Optional) Use Cases
Directly from leyan@redhat.com in Engineering (thank you!), who explains it better than I could:
This scenario is much more challenging. Because, in some cases, agents with the old CA bundle lose connectivity immediately after the kube-apiserver certificate changes, it is technically impossible to provide a solution that always works once the change has been made.
It is therefore more reasonable to prevent the issue from occurring rather than trying to develop a universally reliable solution. If users read the ACM documentation before changing the kube-apiserver certificate, most issues can be avoided. However, in practice, users often only encounter problems after making the change, which results in many managed clusters entering an unknown state.
This is a significant challenge for us because changing the kube-apiserver certificate is perceived by users as a normal OCP configuration change. There is no compelling reason for them to consult ACM documentation before performing this action, which makes proactive mitigation difficult.
Questions to answer
- ...
Out of Scope
- …
Background, and strategic fit
This Section: What does the person writing code, testing, or documenting need to know? What context can be provided to frame this feature?
Assumptions
- ...
Customer Considerations
- ...
Documentation Considerations
Questions to be addressed:
- What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc.)?
- Does this feature have a doc impact?
- New Content, Updates to existing content, Release Note, or No Doc Impact
- If unsure and no Technical Writer is available, please contact Content Strategy.
- What concepts do customers need to understand to be successful in [action]?
- How do we expect customers will use the feature? For what purpose(s)?
- What reference material might a customer want/need to complete [action]?
- Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
- What is the doc impact (New Content, Updates to existing content, or Release Note)?