- Feature
- Resolution: Unresolved
- Major
- BU Product Work
- 50% To Do, 0% In Progress, 50% Done
Feature Overview (aka. Goal Summary)
As a customer of self-managed OpenShift, or an SRE managing a fleet of OpenShift clusters, I should be able to determine the progress and state of an OCP upgrade and be alerted only if the cluster is unable to progress.
Goals (aka. expected user outcomes)
Cluster administrators should be able to monitor the progress of each component during an upgrade through metrics.
Alerts should be triggered only if a component cannot upgrade within a reasonable amount of time.
Requirements (aka. Acceptance Criteria):
- Determine a set amount of time that must pass before an upgrade triggers a failure
- Operators should not alert during upgrades unless there is an action required by a cluster administrator
- Cluster administrators should be able to determine when each component has completed its upgrade
- All components should emit metrics about their upgrade state
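One possible reading of the alerting requirements above can be sketched as a small decision function. This is only an illustration: the function name `should_alert`, its parameters, and the 90-minute `UPGRADE_GRACE_PERIOD` are all hypothetical, since the ticket asks for the threshold to be determined and does not fix a value.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold. The ticket only requires that some fixed
# amount of time be chosen; 90 minutes is a placeholder, not a decision.
UPGRADE_GRACE_PERIOD = timedelta(minutes=90)


def should_alert(upgrade_started, component_done, action_required, now,
                 grace_period=UPGRADE_GRACE_PERIOD):
    """Return True only when a cluster administrator needs to act.

    upgrade_started: datetime at which the component began upgrading
    component_done:  True once the component has completed its upgrade
    action_required: True if the component reports that an admin must intervene
    now:             current datetime (passed in so the logic is testable)
    """
    if component_done:
        # A finished component never alerts for this upgrade.
        return False
    if action_required:
        # Admin intervention is needed regardless of elapsed time.
        return True
    # Otherwise stay quiet until the grace period has been exceeded.
    return now - upgrade_started > grace_period


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # 30 minutes into the upgrade, no action required: no alert.
    print(should_alert(start, False, False, start + timedelta(minutes=30)))
```

Passing `now` explicitly keeps the rule free of hidden clock reads, which makes it easy to check against recorded upgrade timelines.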
Notes: https://docs.google.com/document/d/1W90q9lqUinQgUbAOSCXhLHnGHgEARjzl6BBZxxR1Qzo/edit
Use Cases (Optional):
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Questions to Answer (Optional):
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
Out of Scope
High-level list of items that are out of scope. Initial completion during Refinement status.
Background
In OpenShift, you can determine if an update has failed by inspecting the status of the cluster version and the update process.
oc get clusterversion | grep -i failing
- Check Cluster Version Status
oc get clusterversion
Run this command to see the current version of your cluster and the status of any ongoing update. It provides a summary of the cluster version and update status.
- Detailed Status of the Update
oc describe clusterversion
This gives detailed output, including conditions that indicate the health and status of the update process. Look for conditions like Progressing, Available, or Failing. A Failing condition with status True indicates that there has been an issue with the update.
- Check Cluster Operator Status
oc get clusteroperator
This shows the status of each operator in the cluster. Operators that remain Degraded or Progressing for an extended period might indicate issues that could impact the update.
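The manual checks above could also be scripted. The sketch below assumes the resources have been exported as JSON (for example with `oc get clusterversion version -o json` and `oc get clusteroperator -o json`) and parsed into dictionaries with `json.loads`. The helper names are hypothetical, and the sketch only looks at the current condition status; it does not measure how long an operator has been Degraded or Progressing.

```python
def conditions_by_type(resource):
    """Map each condition type to its status string ("True", "False", "Unknown")."""
    conditions = resource.get("status", {}).get("conditions", [])
    return {c["type"]: c["status"] for c in conditions}


def update_is_failing(cluster_version):
    """True if the ClusterVersion resource reports Failing=True."""
    return conditions_by_type(cluster_version).get("Failing") == "True"


def operators_needing_attention(operator_list):
    """Names of operators currently reporting Degraded=True or Progressing=True.

    operator_list is the parsed List object returned for all cluster
    operators, with the individual operators under the "items" key.
    """
    flagged = []
    for operator in operator_list.get("items", []):
        conds = conditions_by_type(operator)
        if conds.get("Degraded") == "True" or conds.get("Progressing") == "True":
            flagged.append(operator["metadata"]["name"])
    return flagged
```

A Progressing operator is expected during a healthy upgrade, so the second helper is a starting point for inspection rather than a failure signal on its own.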
Customer Considerations
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Documentation Considerations
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Interoperability Considerations
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
- depends on
  - TRT-1578 Ensure all HA components are not degraded by design during upgrades (New)
  - OTA-700 Ensure availability of all HA components during upgrades (Closed)
  - OCPBUGS-9133 ClusterVersion Failing=True and Available=False should trigger alerts (Closed)
- is related to
  - OCPSTRAT-1823 [GA] 'oc adm upgrade status' command and status API (New)
  - OCPSTRAT-1356 'oc adm upgrade status' command improvements - Tech Preview (Release Pending)
  - OCPSTRAT-648 (Tech Preview) 'oc adm upgrade status' command (Closed)
- relates to
  - OCPSTRAT-835 Improve upgrades - Reduce False Positives status from operators (Closed)