Type: Feature Request
Resolution: Done
Priority: Undefined
Fix Version/s: openshift-4.12
Affects Version/s: None
Component/s: Metering
Labels:

Target Version: None
Activity Type: Product / Portfolio Work
Status Summary: None
Blocked: False
Blocked Reason: None
Products: None
Hierarchy Progress Bar: None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
PX Impact Score:
PX Impact Range: None
PX Priority Data:
PX Technical Impact: None
PX Technical Impact Notes: None
PX Scheduling Request: None

1. Proposed title of this feature request
Provide tooling for collecting profiling data from nodes in OpenShift

2. What is the nature and description of the request?
Nodes with node-level performance issues, particularly high softirq usage caused by large numbers of iptables chains, can be difficult to troubleshoot.

At this stage, Support uses a script called 'monitor.sh', which is deployed into a customer's cluster to capture runtime metrics of the nodes. Customers follow a KCS that involves copying bash scripts into files and pushing them to ConfigMaps. A DaemonSet is then created manually to run these scripts. The approach is very prone to user error and regularly requires help from Support engineers.

The KCS that is followed is as below:
https://access.redhat.com/solutions/5343671

I believe OpenShift should provide tooling for collecting node performance metrics. The Prometheus metrics that are scraped rarely include the fine-grained information that the Networking teams need to triage their issues.
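
To illustrate the kind of fine-grained, node-level data this request has in mind, here is a minimal sketch that computes per-type softirq counts from the Linux /proc/softirqs file over a short interval. It is not the 'monitor.sh' script from the KCS and not a proposed implementation; the function names (parse_softirqs, softirq_delta, sample_softirq_delta) and the one-second default interval are assumptions made for illustration only.

```python
import time
from typing import Dict


def parse_softirqs(text: str) -> Dict[str, int]:
    """Sum the per-CPU counters for each softirq type.

    `text` is expected to look like the content of /proc/softirqs:
    a header row of CPU names, then one row per softirq type.
    """
    totals: Dict[str, int] = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the CPU header row
        name, _, counts = line.partition(":")
        if not counts.strip():
            continue  # ignore malformed or empty rows
        totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def softirq_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return how much each softirq counter grew between two samples."""
    return {name: after[name] - before.get(name, 0) for name in after}


def sample_softirq_delta(path: str = "/proc/softirqs",
                         interval: float = 1.0) -> Dict[str, int]:
    """Read the counters twice, `interval` seconds apart (hypothetical helper)."""
    with open(path) as f:
        before = parse_softirqs(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_softirqs(f.read())
    return softirq_delta(before, after)
```

On a Linux node, calling sample_softirq_delta() and looking at entries such as NET_RX and NET_TX would show which softirq types are growing fastest during the interval. Data at this granularity is the sort of information the request describes as missing from the scraped Prometheus metrics.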

3. Why does the customer need this? (List the business requirements here)
Collecting performance data is error-prone and time-consuming for Support engineers. This data is normally only required during high-severity performance cases, which can involve large amounts of downtime or reduced capacity for customers. Tooling around this would greatly assist Support engineers.

4. List any affected packages or components.
Prometheus, RHCOS, OC tool

is related to

OCPSTRAT-818 Collecting performance / profiling data for nodes in OpenShift

relates to

CFE-912 Node Observability implement RFE-2052