Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.13
Component/s: Monitoring
Labels:
None

Test Coverage:

-
Severity:
Low
Regression:
None
Epic Link:
MGMT-13418
Blocked:
False
Blocked Reason:
None
Release Note Text:
N/A
Release Note Type:
Bug Fix
Release Note Status:
Done
Target Version:

4.13.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

As part of the effort to improve the installation time of single-node OpenShift, we've noticed that the monitoring operator takes a long* time to finish installing.

It's hard for me to tell exactly what the monitoring operator is waiting for, but it becoming happy (as far as clusteroperator conditions are concerned) always seems to coincide with the operator finally noticing and reconciling** the 2 additional certificates inside extension-apiserver-authentication that are added by the apiserver operator.

Usually this "realization" happens minutes after the two certs are added. Ideally we'd like to cut that time down, because those minutes sometimes make the monitoring operator the last one to roll out.

*"Long" here means on the order of just a few minutes, which is not a lot, but it adds up. This ticket is one in a series of tickets we're opening for many other components.

**The "marker" I use to know when this happened is the moment the monitoring operator, among other things, replaces the old prometheus-adapter-<hash_x> secret, which contains just the original certs of extension-apiserver-authentication, with a new prometheus-adapter-<hash_y> secret that also contains the 2 new certs.
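
To put a number on that delay, the marker above can be turned into a small calculation. The sketch below is only an illustration and is not taken from the attached logs. It assumes you have already collected two things by hand: the time the 2 new certs were added, and the name and creationTimestamp of each secret in the monitoring namespace (for example from the JSON output of `oc get secrets`). The function name `reconcile_delay` and all sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple

# Kubernetes serializes metadata.creationTimestamp as RFC 3339 in UTC,
# e.g. "2023-02-13T13:07:30Z".
K8S_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_k8s_time(value: str) -> datetime:
    """Parse a Kubernetes-style UTC timestamp into an aware datetime."""
    return datetime.strptime(value, K8S_TIME_FORMAT).replace(tzinfo=timezone.utc)


def reconcile_delay(
    certs_added_at: str,
    secrets: Iterable[Tuple[str, str]],
    prefix: str = "prometheus-adapter-",
) -> Optional[timedelta]:
    """Return the time between the certs being added and the first
    prometheus-adapter-<hash> secret created at or after that moment.

    `secrets` is an iterable of (name, creationTimestamp) pairs.
    Returns None if no matching secret was created after the certs were added.
    """
    added = parse_k8s_time(certs_added_at)
    created_after = [
        parse_k8s_time(created)
        for name, created in secrets
        if name.startswith(prefix) and parse_k8s_time(created) >= added
    ]
    if not created_after:
        return None
    return min(created_after) - added


if __name__ == "__main__":
    # Made-up values for illustration only (hypothetical names and times).
    observed = [
        ("prometheus-adapter-hash-x", "2023-02-13T13:00:00Z"),
        ("prometheus-adapter-hash-y", "2023-02-13T13:07:30Z"),
    ]
    print(reconcile_delay("2023-02-13T13:03:00Z", observed))
```

A `None` result means the replacement secret had not been created yet when the data was collected, which is itself a sign that reconciliation is still pending.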

Version-Release number of selected component (if applicable):

OCP 4.13 nightly

How reproducible:

100%

Steps to Reproduce:

1. Install single-node OpenShift

Actual results:

The monitoring operator takes minutes to reconcile extension-apiserver-authentication after the 2 new certs are added.

Expected results:

The monitoring operator reconciles extension-apiserver-authentication immediately after the 2 new certs are added.

Additional info:

Originally I suspected this might be due to API server downtime (which is a property of SNO), but the issue doesn't seem to correlate with API downtime.

Attachments:

cmo_errors.log (8 kB, 2023/02/14 3:54 PM)
kaso.log (78 kB, 2023/02/14 3:55 PM)
my.tar.gz (9.87 MB, 2023/02/13 4:32 PM)

links to

openshift/cluster-monitoring-operator#1900: OCPBUGS-7391: wait for service CA secrets

Assignee: Simon Pasquier
Reporter: Omer Tuchfeld
QA Contact: Junqi Zhao
Votes: 0
Watchers: 9
Created: 2023/02/13 1:47 PM
Updated: 2023/11/15 8:00 AM
Resolved: 2023/05/17 10:39 PM
