
ClusterResourceOverride Operator creating high quantity of clusterresourceoverride-token secrets (57k) in the OCP cluster

    • Important
      * Previously, when the Cluster Resource Override Operator failed to run its operand controller, the Operator attempted to re-run the controller. Each re-run operation generated a new set of secrets that eventually constrained cluster namespace resources. With this release, the service account for a cluster now includes annotations that prevent the Operator from creating additional secrets when a secret already exists for the cluster. (link:https://issues.redhat.com/browse/OCPBUGS-44351[*OCPBUGS-44351*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-42718. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-15462. The following is the description of the original issue:

      Description of problem:

      The ClusterResourceOverride Operator is constantly creating a large number of secrets in etcd, severely degrading usability of the cluster. The memory spike caused by these requests drove roughly 60 containers on the master nodes into OOM-killed exits.
      
      After the upgrade from 4.10 to 4.12, we noticed that the etcd data for the cluster was growing rapidly: it increased from 500 MB to 2.5 GB in two days.
      
      When we inspected the etcd objects, we found that the ClusterResourceOverride Operator was creating a huge number of secrets with no identifiable cause:
      
      
      Old report of objects in the cluster: 
      
      [NUMBER OF OBJECTS IN ETCD] 
      273139 events
        38353 secrets <<<<<<<
      
      When we listed the secrets with the oc command line, we saw a count of about 1.5k secrets.
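
      The gap between the etcd report and the oc listing can be tracked with a quick count of the leaked token secrets. A minimal sketch, assuming the secrets follow the clusterresourceoverride-token- naming seen in the API server log lines later in this report (the helper name is ours):

```shell
# Count leaked service-account token secrets in a secret name listing
# (sketch; the clusterresourceoverride-token- prefix is taken from the
# kube-apiserver log lines quoted in this report).
count_token_secrets() {
  # Reads one secret name per line on stdin; prints how many match.
  grep -c '^clusterresourceoverride-token-' || true
}

# On a live cluster this would be fed from something like:
#   oc get secrets -n openshift-clusterresourceoverride \
#     -o custom-columns=NAME:.metadata.name --no-headers | count_token_secrets
```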
      
      
      We can observe on the master nodes that the high utilization comes from the API server:
      
      
      Top MEM-using processes: 
          USER      PID    %CPU  %MEM  VSZ-MiB  RSS-MiB  TTY    STAT   START  TIME  COMMAND  
          root      52188  133   38.9  26447    18792    ?      -      10:36  3:05  kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --
      
      
      We can also observe these messages in the kube-apiserver logs:
      
      /kube-apiserver/kube-apiserver/logs/previous.log:2023-06-22T07:11:17.036606087Z I0622 07:11:17.036542      18 trace.go:205] Trace[1639443765]: "Get" url:/api/v1/namespaces/openshift-clusterresourceoverride/secrets/clusterresourceoverride-token-xxx,user-agent:kube-controller-manager/v1.25.8+37a9a08 (linux/amd64) kubernetes/df8a1b9/system:serviceaccount:kube-system:generic-garbage-collector,audit-id:aaef554b-06dd-4801-94e8-601b5ce75a10,client:10.127.206.24,accept:application/vnd.kubernetes.protobuf;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json,protocol:HTTP/2.0 (22-Jun-2023 07:10:21.424) (total time: 55611ms):
      
      kube-apiserver/kube-apiserver/logs/previous.log:2023-06-22T07:11:17.037224152Z E0622 07:11:17.036738      18 timeout.go:141] post-timeout activity - time-elapsed: 1.417764462s, GET "/api/v1/namespaces/openshift-clusterresourceoverride/secrets/clusterresourceoverride-token-nmzvt" result: <nil>
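
      The trace lines above can be mined for the slow token-secret GETs and their latencies. A sketch that relies on the `"Get" url:...,... (total time: NNNms)` shape of those lines (the function name is ours):

```shell
# Extract the request URL and total latency from kube-apiserver trace
# lines like the ones above (sketch; assumes the
# `url:...,` and `(total time: NNNms)` fields shown in this report).
slow_secret_gets() {
  grep '/secrets/clusterresourceoverride-token-' \
    | sed -n 's/.*url:\([^,]*\),.*(total time: \([0-9]*ms\)).*/\1 \2/p'
}
```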
      
      
      
      

       

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      A high quantity of secrets in etcd.

      Expected results:

      The Operator stops issuing requests that create additional secrets.

      Additional info:

      In this case, the customer disabled the operator.
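
      For clusters that have already accumulated tens of thousands of these secrets, deleting them one API call at a time would add yet more load. A hedged cleanup sketch (the helper, batch size, and dry-run echo are ours, not from the ticket):

```shell
# Delete leaked token secrets in batches rather than one call per
# secret (sketch; prints the commands instead of running them; drop
# the leading `echo` to actually delete).
delete_in_batches() {
  # Reads secret names on stdin; one oc invocation per 50 names.
  xargs -n 50 echo oc delete secret -n openshift-clusterresourceoverride
}

# Live-cluster usage would look like:
#   oc get secrets -n openshift-clusterresourceoverride -o name \
#     | grep 'clusterresourceoverride-token-' | sed 's|^secret/||' \
#     | delete_in_batches
```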


            Errata Tool added a comment:

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Moderate: OpenShift Container Platform 4.16.23 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:9615


            Aditi Sahay added a comment (edited):

            I have verified this on 4.16 CRO. Here are the logs:

            $ oc get clusterversion
            NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
            version   4.16.0-0.nightly-2024-11-08-100216   True        False         112m    Cluster version is 4.16.0-0.nightly-2024-11-08-100216
            $ oc get csv
            NAME                                                    DISPLAY                            VERSION               REPLACES   PHASE
            clusterresourceoverride-operator.v4.16.0-202411081636   ClusterResourceOverride Operator   4.16.0-202411081636              Succeeded
            
            
            

            Before killing the root PID:

            $ oc get pods -n clusterresourceoverride-operator 
            NAME                                                READY   STATUS    RESTARTS   AGE
            clusterresourceoverride-6cwqf                       1/1     Running   0          3m46s
            clusterresourceoverride-dwh2c                       1/1     Running   0          3m46s
            clusterresourceoverride-hpq4w                       1/1     Running   0          3m46s
            clusterresourceoverride-operator-7d99d5c447-jkzwd   1/1     Running   0          4m28s
            
            

            After running: while true; do echo "-- killing pods--"; for i in `oc get pods --no-headers | cut -d' ' -f1 | grep clusterresourceoverride | grep -vi operator`; do oc exec $i -- /bin/sh -c "kill 1" ; done ; done

            $ oc get pods -n clusterresourceoverride-operator
            NAME                                                READY   STATUS             RESTARTS      AGE
            clusterresourceoverride-6cwqf                       0/1     CrashLoopBackOff   2 (26s ago)   5m40s
            clusterresourceoverride-dwh2c                       0/1     CrashLoopBackOff   2 (22s ago)   5m40s
            clusterresourceoverride-hpq4w                       0/1     CrashLoopBackOff   2 (18s ago)   5m40s
            clusterresourceoverride-operator-7d99d5c447-jkzwd   1/1     Running            0             6m22s
            $ oc logs -f clusterresourceoverride-operator-7d99d5c447-jkzwd
            W1111 08:55:09.198480       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
            I1111 08:55:09.198837       1 start.go:100] [operator] configuration - name=clusterresourceoverride namespace=clusterresourceoverride-operator operand-image=registry.redhat.io/openshift4/ose-clusterresourceoverride-rhel9@sha256:662cfd046e7e00bbee1db4bfa6f850dff375fa4680e3427e6bdfd0d77f9c7c5a operand-version=1.0.0
            I1111 08:55:09.198861       1 start.go:101] [operator] starting
            I1111 08:55:09.213624       1 reflector.go:351] Caches populated for *v1.Service from k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229
            I1111 08:55:09.214435       1 reflector.go:351] Caches populated for *v1.MutatingWebhookConfiguration from k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229
            I1111 08:55:09.216456       1 reflector.go:351] Caches populated for *v1.Pod from k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229
            I1111 08:55:09.216496       1 reflector.go:351] Caches populated for *v1.Secret from k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229
            I1111 08:55:09.216840       1 reflector.go:351] Caches populated for *v1.DaemonSet from k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229
            I1111 08:55:09.217275       1 reflector.go:351] Caches populated for *v1.Deployment from k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229
            I1111 08:55:09.217336       1 reflector.go:351] Caches populated for *v1.ServiceAccount from k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229
            I1111 08:55:09.218118       1 reflector.go:351] Caches populated for *v1.ConfigMap from k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229
            I1111 08:55:09.300037       1 runner.go:39] [controller] name=clusterresourceoverride starting informer
            I1111 08:55:09.300150       1 runner.go:42] [controller] name=clusterresourceoverride waiting for informer cache to sync
            I1111 08:55:09.315858       1 reflector.go:351] Caches populated for *v1.ClusterResourceOverride from k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229
            I1111 08:55:09.400850       1 runner.go:52] [controller] name=clusterresourceoverride started 1 worker(s)
            I1111 08:55:09.400969       1 runner.go:54] [controller] name=clusterresourceoverride waiting 
            I1111 08:55:09.400883       1 worker.go:22] [controller] name=clusterresourceoverride starting to process work item(s)
            I1111 08:55:09.400994       1 run.go:98] operator is waiting for controller(s) to be done
            I1111 08:55:09.401002       1 start.go:110] [operator] operator is running, waiting for the operator to be done.
            I1111 08:55:43.228004       1 configuration.go:60] key=cluster resource=*v1.ConfigMap/clusterresourceoverride-configuration successfully created
            I1111 08:55:43.228027       1 configuration.go:75] key=cluster resource=*v1.ConfigMap/clusterresourceoverride-configuration configuration has drifted
            I1111 08:55:43.247064       1 configuration.go:95] key=cluster resource=*v1.ConfigMap/clusterresourceoverride-configuration resource-version=62006 setting object reference
            I1111 08:55:43.270600       1 cert_generation.go:121] key=cluster resource=*v1.Secret/server-serving-cert-clusterresourceoverride successfully ensured
            I1111 08:55:43.270619       1 cert_generation.go:124] key=cluster resource=*v1.ConfigMap/clusterresourceoverride-service-serving successfully ensured
            I1111 08:55:43.270630       1 cert_generation.go:134] key=cluster resource=*v1.Secret/server-serving-cert-clusterresourceoverride resource-version=62007 setting object reference
            I1111 08:55:43.270634       1 cert_generation.go:137] key=cluster resource=*v1.Secret/server-serving-cert-clusterresourceoverride is original sync
            I1111 08:55:43.270643       1 cert_generation.go:146] key=cluster resource=*v1.ConfigMap/clusterresourceoverride-service-serving resource-version=62008 setting object reference
            I1111 08:55:43.270648       1 cert_generation.go:149] key=cluster resource=*v1.ConfigMap/clusterresourceoverride-service-serving is original sync
            I1111 08:55:43.270667       1 cert_ready.go:63] key=cluster cert check passed
            I1111 08:55:43.281081       1 deploy.go:159] key=cluster ensured RBAC resource clusterresourceoverride
            I1111 08:55:43.380579       1 deploy.go:159] key=cluster ensured RBAC resource extension-server-authentication-reader-clusterresourceoverride
            I1111 08:55:43.454986       1 deploy.go:159] key=cluster ensured RBAC resource system:clusterresourceoverride-requester
            I1111 08:55:43.534026       1 deploy.go:159] key=cluster ensured RBAC resource default-aggregated-apiserver-clusterresourceoverride
            I1111 08:55:43.625343       1 deploy.go:159] key=cluster ensured RBAC resource default-aggregated-apiserver-clusterresourceoverride
            I1111 08:55:43.720137       1 deploy.go:159] key=cluster ensured RBAC resource auth-delegator-clusterresourceoverride
            I1111 08:55:43.912606       1 deploy.go:159] key=cluster ensured RBAC resource clusterresourceoverride-scc-hostnetwork-use
            I1111 08:55:44.111179       1 deploy.go:159] key=cluster ensured RBAC resource clusterresourceoverride-scc-hostnetwork-use
            I1111 08:55:44.284298       1 deploy.go:159] key=cluster ensured RBAC resource clusterresourceoverride-anonymous-access
            I1111 08:55:44.484893       1 deploy.go:159] key=cluster ensured RBAC resource clusterresourceoverride-anonymous-access
            I1111 08:55:44.620839       1 deploy.go:83] key=cluster resource=*v1.DaemonSet/clusterresourceoverride successfully ensured
            I1111 08:55:44.620936       1 deploy.go:97] key=cluster resource=*v1.DaemonSet/clusterresourceoverride resource-version=62033 setting object reference
            I1111 08:55:44.620977       1 deployment_ready.go:37] key=cluster resource=clusterresourceoverride deployment is not ready
            E1111 08:55:44.631382       1 worker.go:67] error syncing '/cluster': waiting for daemonset spec update name=clusterresourceoverride, requeuing
            I1111 08:55:44.631517       1 configuration.go:70] key=cluster resource=*v1.ConfigMap/clusterresourceoverride-configuration is in sync
            I1111 08:55:44.631532       1 cert_generation.go:137] key=cluster resource=*v1.Secret/server-serving-cert-clusterresourceoverride is original sync
            I1111 08:55:44.631538       1 cert_generation.go:149] key=cluster resource=*v1.ConfigMap/clusterresourceoverride-service-serving is original sync
            I1111 08:55:44.631551       1 cert_ready.go:63] key=cluster cert check passed
            
            

            No new secrets have been created:

            $ oc get secrets
            NAME                                               TYPE                      DATA   AGE
            builder-dockercfg-jgpkd                            kubernetes.io/dockercfg   1      9m4s
            clusterresourceoverride-dockercfg-x6nhz            kubernetes.io/dockercfg   1      7m53s
            clusterresourceoverride-operator-dockercfg-vh54f   kubernetes.io/dockercfg   1      8m34s
            default-dockercfg-7wlkn                            kubernetes.io/dockercfg   1      9m4s
            deployer-dockercfg-sg844                           kubernetes.io/dockercfg   1      9m4s
            server-serving-cert-clusterresourceoverride        kubernetes.io/tls         2      7m53s
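
            A before/after comparison of saved secret listings makes this "no new secrets" check mechanical. A sketch (the helper name is ours; feed it snapshots of the listing taken before and after the crash-loop test):

```shell
# Report how many clusterresourceoverride token secrets were added
# between two saved `oc get secrets` snapshots (sketch).
token_secret_delta() {
  # $1 = listing taken before the test, $2 = listing taken after.
  before=$(grep -c 'clusterresourceoverride-token-' "$1" || true)
  after=$(grep -c 'clusterresourceoverride-token-' "$2" || true)
  echo $((after - before))   # 0 means no new token secrets leaked
}
```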
            
            


              joelsmith.redhat Joel Smith
              openshift-crt-jira-prow OpenShift Prow Bot
              Aditi Sahay Aditi Sahay