Bug
Resolution: Done-Errata
Critical
None
4.13.z, 4.12.0
This is a clone of issue OCPBUGS-15462. The following is the description of the original issue:
—
Description of problem:
The ClusterResourceOverride operator is constantly creating a high quantity of secrets in etcd, to the point of crippling the cluster. Due to the memory spike caused by these requests, ~60 containers on the master node were terminated by the OOM killer.

After the upgrade from 4.10 to 4.12, we noticed that the number of etcd objects in the cluster was growing rapidly: the etcd database grew from 500MB to 2.5GB in 2 days. When we checked the etcd objects, we identified that the ClusterResourceOverride operator was creating a huge quantity of secrets with no identified cause.

Old report of objects in the cluster:

[NUMBER OF OBJECTS IN ETCD]
273139 events
38353 secrets <<<<<<<

When we list the secrets with the oc command line, we only get roughly 1.5k secrets back.

On the master we can observe that the high memory utilization is coming from the API server:

Top MEM-using processes:
USER PID %CPU %MEM VSZ-MiB RSS-MiB TTY STAT START TIME COMMAND
root 52188 133 38.9 26447 18792 ? - 10:36 3:05 kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --

We can also observe these messages in the kube-apiserver logs:

/kube-apiserver/kube-apiserver/logs/previous.log:2023-06-22T07:11:17.036606087Z I0622 07:11:17.036542 18 trace.go:205] Trace[1639443765]: "Get" url:/api/v1/namespaces/openshift-clusterresourceoverride/secrets/clusterresourceoverride-token-xxx,user-agent:kube-controller-manager/v1.25.8+37a9a08 (linux/amd64) kubernetes/df8a1b9/system:serviceaccount:kube-system:generic-garbage-collector,audit-id:aaef554b-06dd-4801-94e8-601b5ce75a10,client:10.127.206.24,accept:application/vnd.kubernetes.protobuf;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json,protocol:HTTP/2.0 (22-Jun-2023 07:10:21.424) (total time: 55611ms):
kube-apiserver/kube-apiserver/logs/previous.log:2023-06-22T07:11:17.037224152Z E0622 07:11:17.036738 18 timeout.go:141] post-timeout activity - time-elapsed: 1.417764462s, GET "/api/v1/namespaces/openshift-clusterresourceoverride/secrets/clusterresourceoverride-token-nmzvt" result: <nil>
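For reference, a minimal diagnostic sketch along the lines of the checks described above. The namespace comes from the log lines; counting etcd keys per resource type is only one common way to see which object kind is growing, and it assumes the etcdctl environment inside the etcd pod is already configured (as it is by default on OpenShift):

# Count the secrets the API server reports in the operator namespace.
oc get secrets -n openshift-clusterresourceoverride --no-headers | wc -l

# From a shell inside one of the etcd pods (oc rsh -n openshift-etcd <etcd-pod>),
# break down stored keys by resource type; secrets and events should stand out.
etcdctl get /kubernetes.io --prefix --keys-only | awk -F/ '{print $3}' | sort | uniq -c | sort -rn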
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
A high quantity of secrets accumulates in etcd
Expected results:
The operator stops issuing the requests that create new secrets
Additional info:
In this case, the customer disabled the operator.
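For illustration only, a hedged sketch of how the accumulated token secrets could be enumerated, and after careful review removed, once the operator is disabled. The grep pattern is inferred from the secret names in the logs above; this is not a documented remediation, and the actual fix is tracked by this bug:

# List the accumulated service-account token secrets by name.
oc get secrets -n openshift-clusterresourceoverride -o name | grep 'clusterresourceoverride-token-'

# After verifying the list, the stale copies could be deleted in bulk.
oc get secrets -n openshift-clusterresourceoverride -o name \
  | grep 'clusterresourceoverride-token-' \
  | xargs oc delete -n openshift-clusterresourceoverride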
blocks: OCPBUGS-44351 ClusterResourceOverride Operator creating high quantity of clusterresourceoverride-token secrets (57k) in the OCP cluster (Closed)
clones: OCPBUGS-15462 ClusterResourceOverride Operator creating high quantity of clusterresourceoverride-token secrets (57k) in the OCP cluster (Verified)
is blocked by: OCPBUGS-15462 ClusterResourceOverride Operator creating high quantity of clusterresourceoverride-token secrets (57k) in the OCP cluster (Verified)
is cloned by: OCPBUGS-44351 ClusterResourceOverride Operator creating high quantity of clusterresourceoverride-token secrets (57k) in the OCP cluster (Closed)
links to: RHBA-2024:9610 OpenShift Container Platform 4.17.z bug fix update