OpenShift Service Mesh / OSSM-6703

Remove metrics port 8686 because it always serves empty metrics


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • Versions: OSSM 2.6.0, OSSM 2.5.2
    • Customer Impact: None

      The operator's metrics setup is confusing, and parts of it are broken.

      1. The operator serves two different sets of metrics on two different ports (8383 and 8686).
      Port 8383 serves the usual metrics:

      controller_runtime_reconcile_time_seconds_bucket{controller="servicemeshcontrolplane-controller",le="0.005"} 12
      process_cpu_seconds_total 2.11
      rest_client_request_latency_seconds_bucket{url="https://10.217.4.1:443/%7Bprefix%7D",verb="GET",le="0.016"} 311
      workqueue_adds_total{name="servicemeshcontrolplane-controller"} 20
      ...
      

      Port 8686 only serves some CRD-related metrics:

      # HELP servicemeshcontrolplane_info Information about the ServiceMeshControlPlane custom resource.
      # TYPE servicemeshcontrolplane_info gauge
      # HELP servicemeshmember_info Information about the ServiceMeshMember custom resource.
      # TYPE servicemeshmember_info gauge
      # HELP servicemeshmemberroll_info Information about the ServiceMeshMemberRoll custom resource.
      # TYPE servicemeshmemberroll_info gauge
      # HELP servicemeshcontrolplane_info Information about the ServiceMeshControlPlane custom resource.
      # TYPE servicemeshcontrolplane_info gauge
      

      It's not clear why the metrics are split across two different endpoints.

      2. The CRD-related metrics are always empty, because the component producing these metrics watches for CRs only in the operator's own namespace, where no CRs (SMCP, SMMR, SMM) ever exist. It should instead watch all namespaces.

      3. For each of the two sets of metrics, we create a separate kube client instead of reusing the global kube client that's used by the controllers. Because of this:

      • The operator's memory usage and the number of API watches are higher than necessary (each client maintains its own local cache, so the operator keeps multiple copies of the same resources in memory)
      • On startup, the operator performs API discovery multiple times. API discovery is expensive because it requires many API requests, and the number grows with the number of APIs registered in the API server (large production clusters with many operators installed register many APIs). This both slows down operator startup and puts unnecessary load on the API server.

      Ideally, we should:

      • serve all metrics on a single endpoint
      • fix the empty CRD metrics
      • use a single kube client

            Assignee: Marko Luksa (mluksa@redhat.com)
            Reporter: Marko Luksa (mluksa@redhat.com)
