OCP Technical Release Team / TRT-366 Track operator watch requests more accurately with less maintenance / TRT-418

Provide TRT with script and instructions on how to process audit log data and calculate percentiles


    • Type: Sub-task
    • Resolution: Done
    • Priority: Undefined

      Idea: calculate percentiles as part of gathering artifacts from a cluster

      Options:

      1. create a new step in the CI operator step registry to process the audit-logs.tar.gz archive produced by the gather-audit-logs step
      2. process the audit logs right after the logs are pulled by must-gather: https://github.com/openshift/release/blob/f113ad4a7bd6c6b5597901b2be6d38186982a0da/ci-operator/step-registry/gather/audit-logs/gather-audit-logs-commands.sh#L31
      3. extend https://github.com/openshift/must-gather/blob/b0f5083ca043c77bcc1b285d43afcd6a30386799/collection-scripts/gather_audit_logs to process raw audit logs before they are archived
      4. create a new step which invokes oc adm node-logs for the openshift-apiserver and kube-apiserver paths independently of must-gather

      Option 1 has the advantage of creating a separate step that can be maintained independently of other steps. On the other hand, the step needs to wait until the gather-audit-logs step has finished. Also, the audit-logs.tar.gz archive and all the individual kube-apiserver and openshift-apiserver archives need to be extracted.
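
      For illustration, a minimal extraction sketch for option 1; the inner layout of audit-logs.tar.gz (per-node, per-apiserver gzipped audit logs) is an assumption based on what the gather-audit-logs step collects:

      # Assumed layout: audit-logs.tar.gz wraps individually gzipped
      # kube-apiserver and openshift-apiserver audit logs per node.
      mkdir -p audit-logs
      tar -xzf audit-logs.tar.gz -C audit-logs
      # Unpack every nested archive; each resulting *.log file then holds
      # one JSON audit event per line, ready for further processing.
      find audit-logs -name '*.gz' -exec gunzip {} +
      find audit-logs -name '*.log'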

      Option 2 saves the step of extracting audit-logs.tar.gz and does not require a new step. On the other hand, all the individual kube-apiserver and openshift-apiserver archives still need to be extracted.

      Option 3 can work directly with all the individual kube-apiserver and openshift-apiserver audit logs. There are two sub-options:

      1. intercept the command that pulls the audit logs in https://github.com/openshift/must-gather/blob/b0f5083ca043c77bcc1b285d43afcd6a30386799/collection-scripts/gather_audit_logs#L47 so that it also pipes the raw audit log lines into a new binary/script for further processing (see the sketch after this list). This sub-option does not require pulling the kube-apiserver and openshift-apiserver audit logs twice
      2. run oc adm node-logs one more time, only for the kube-apiserver and openshift-apiserver audit logs, and process them there (the audit logs are pulled twice)
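
      A minimal sketch of sub-option 1; process-audit-stats is a hypothetical new helper, NODE stands for whatever node the surrounding loop in gather_audit_logs iterates over, and the exact node-logs/gzip invocation in that script differs, so this only illustrates the tee/pipe pattern:

      # Tee the raw audit lines into the stats filter while they are being
      # archived, so each audit log is pulled from the node only once.
      oc adm node-logs "$NODE" --path=kube-apiserver/audit.log \
        | tee >(./process-audit-stats > "audit-stats-${NODE}-kube-apiserver.json") \
        | gzip > "kube-apiserver-${NODE}-audit.log.gz"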

      Option 4 has the advantage of creating a separate step that can be maintained independently of other steps, and the step can be invoked at any point; there is no need to change the must-gather collection scripts. The disadvantage of this option is that the kube-apiserver and openshift-apiserver audit logs get pulled twice.

      The advantage of option 4 over option 1 is the reduced need to store all audit logs on disk. Additionally, only a fraction of the audit logs is processed further (verb=watch, username ending with "operator", stage=ResponseComplete, etc.). Testing of the overall solution is also simplified, as only a running cluster is required. A rough estimate of the additionally pulled kube-apiserver/openshift-apiserver logs is around 2 GB.
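
      A rough sketch of what the option 4 step could do, assuming cluster-admin credentials in the step's pod and ignoring rotated audit log files; ARTIFACT_DIR and the output file names are illustrative:

      #!/bin/bash
      set -euo pipefail
      # Pull the current audit log of every control-plane node and keep only
      # the events the stats need: watch requests from operator service
      # accounts that reached the ResponseComplete stage.
      for node in $(oc get nodes -l node-role.kubernetes.io/master -o name | cut -d/ -f2); do
        for apiserver in kube-apiserver openshift-apiserver; do
          oc adm node-logs "$node" --path="${apiserver}/audit.log" \
            | jq -c 'select(.verb == "watch" and .stage == "ResponseComplete"
                            and ((.user.username // "") | endswith("operator")))' \
            > "${ARTIFACT_DIR:-.}/${node}-${apiserver}-operator-watch.json"
        done
      done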

      The overall workflow:

      1. for each relevant CI job, calculate the maximal number of watch requests per operator across all possible 60-minute-long buckets (produced by the gather-audit-log-stats step-registry step; a simplified sketch follows this list)
      2. upload the produced per-operator stats into a BigQuery database
      3. have the TRT dashboard calculate percentiles of TRT's choosing
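
      A minimal sketch of step 1 over the filtered events from the option 4 sketch above, simplified to fixed clock-hour buckets instead of every possible 60-minute window (the real gather-audit-log-stats step would slide the window across event timestamps):

      # Count watch requests per operator username and clock hour, then keep
      # the maximum over all hours; prints "username max_requests_in_one_hour".
      jq -r '[.user.username, .requestReceivedTimestamp[0:13]] | @tsv' \
          "${ARTIFACT_DIR:-.}"/*-operator-watch.json \
        | sort | uniq -c \
        | awk '{ if ($1 > max[$2]) max[$2] = $1 } END { for (op in max) print op, max[op] }'

      Once those per-job, per-operator maxima are in BigQuery (step 2), the step 3 percentiles reduce to a single aggregate query on the dashboard side (e.g. BigQuery's APPROX_QUANTILES).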

              jchaloup@redhat.com Jan Chaloupka
              rhn-engineering-dgoodwin Devan Goodwin