-
Feature Request
-
Resolution: Done
1. Proposed title of this feature request
Provide tooling for collecting profiling data from nodes in OpenShift
2. What is the nature and description of the request?
Node-level performance issues, particularly high softirq usage caused by large numbers of iptables chains, can be difficult to troubleshoot.
At this stage, Support uses a script called 'monitor.sh' that is deployed into a customer's cluster to capture runtime metrics of the nodes. Customers follow a KCS that involves copying bash scripts into files and pushing these to ConfigMaps. A DaemonSet is then manually created to run these bash scripts. The approach is very prone to user error and regularly requires help from Support engineers.
The KCS that is followed is:
https://access.redhat.com/solutions/5343671
I believe OpenShift should provide tooling for collecting performance metrics. The Prometheus metrics that are scraped rarely include the fine-grained information that the Networking teams need to triage their issues.
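As an illustration of the fine-grained data in question, the following is a minimal sketch of sampling per-type softirq counts from `/proc/softirqs` on a node. It is not the 'monitor.sh' script from the KCS and not part of any OpenShift tooling; the function names `parse_softirqs`, `softirq_delta`, and `read_softirqs` are hypothetical and chosen only for this example.

```python
from typing import Dict, List


def parse_softirqs(text: str) -> Dict[str, List[int]]:
    """Parse the content of /proc/softirqs into {softirq_name: [count per CPU]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    counts: Dict[str, List[int]] = {}
    for line in lines[1:]:  # the first non-empty line is the CPU header row
        name, _, rest = line.partition(":")
        counts[name.strip()] = [int(v) for v in rest.split()]
    return counts


def softirq_delta(before: Dict[str, List[int]],
                  after: Dict[str, List[int]]) -> Dict[str, int]:
    """Increase per softirq type between two snapshots, summed over all CPUs."""
    return {
        name: sum(after[name]) - sum(before.get(name, []))
        for name in after
    }


def read_softirqs(path: str = "/proc/softirqs") -> Dict[str, List[int]]:
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_softirqs(f.read())
```

Calling `read_softirqs()` twice a few seconds apart and passing both results to `softirq_delta` gives the number of softirqs of each type (for example `NET_RX`) handled in that interval, which is the kind of figure that is hard to obtain from the default scraped metrics.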
3. Why does the customer need this? (List the business requirements here)
Collecting performance data is error-prone and time-consuming for support engineers. These metrics are normally required only during high-severity performance cases, which can involve significant downtime or reduced capacity for customers. Dedicated tooling for this would greatly assist support engineers.
4. List any affected packages or components.
Prometheus, RHCOS, OC tool
- is related to
-
OCPSTRAT-818 Collecting performance / profiling data for nodes in OpenShift
- New
- relates to
-
CFE-912 Node Observability implement RFE-2052
- Closed
-
CFE-453 NodeObservabilityOperator - TechPreview V2
- Closed
- links to