OCPBUGS-15462

ClusterResourceOverride Operator creating high quantity of clusterresourceoverride-token secrets (57k) in the OCP cluster

      Description of problem:

      The ClusterResourceOverride operator is constantly creating a large number of secrets in the cluster's etcd, which is severely degrading cluster usability. Because of the memory spike caused by these requests, ~60 containers on the master nodes are being terminated by the OOM killer.
      
      After the upgrade from 4.10 to 4.12, we noticed that the number of etcd objects in the cluster was growing rapidly: the etcd database increased from 500 MB to 2.5 GB in two days.
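
      For reference, the database size can be confirmed from one of the etcd member pods in the openshift-etcd namespace (the pod name below is a placeholder):

          $ oc rsh -n openshift-etcd etcd-<master-node>
          sh-4.4# etcdctl endpoint status -w table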
      
      When we inspected the etcd objects, we identified that the ClusterResourceOverride operator was creating a huge number of secrets with no obvious cause:
      
      
      Old report of objects in the cluster: 
      
      [NUMBER OF OBJECTS IN ETCD] 
      273139 events
        38353 secrets <<<<<<<
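
      One way to reproduce a per-resource count like the one above (a sketch, assuming access to an etcd member pod; the pod name is a placeholder) is to group the etcd keys by resource type; the kube-apiserver also exposes similar counts through the apiserver_storage_objects metric:

          $ oc rsh -n openshift-etcd etcd-<master-node>
          sh-4.4# etcdctl get / --prefix --keys-only | grep -v '^$' | awk -F'/' '{print $3}' | sort | uniq -c | sort -rn | head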
      
      When we listed the secrets with the oc command line, we only got back ~1.5k secrets.
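
      For comparison, the counts visible through the API can be taken with, for example (namespace taken from the kube-apiserver traces below):

          $ oc get secrets -A --no-headers | wc -l
          $ oc get secrets -n openshift-clusterresourceoverride --no-headers | wc -l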
      
      
      We can observe on the master nodes that the high memory utilization is coming from the API server:
      
      
      Top MEM-using processes: 
          USER      PID    %CPU  %MEM  VSZ-MiB  RSS-MiB  TTY    STAT   START  TIME  COMMAND  
          root      52188  133   38.9  26447    18792    ?      -      10:36  3:05  kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --
      
      
      We can also observe these messages in the kube-apiserver logs:
      
      /kube-apiserver/kube-apiserver/logs/previous.log:2023-06-22T07:11:17.036606087Z I0622 07:11:17.036542      18 trace.go:205] Trace[1639443765]: "Get" url:/api/v1/namespaces/openshift-clusterresourceoverride/secrets/clusterresourceoverride-token-xxx,user-agent:kube-controller-manager/v1.25.8+37a9a08 (linux/amd64) kubernetes/df8a1b9/system:serviceaccount:kube-system:generic-garbage-collector,audit-id:aaef554b-06dd-4801-94e8-601b5ce75a10,client:10.127.206.24,accept:application/vnd.kubernetes.protobuf;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json;as=PartialObjectMetadata;g=meta.k8s.io;v=v1,application/json,protocol:HTTP/2.0 (22-Jun-2023 07:10:21.424) (total time: 55611ms):
      
      kube-apiserver/kube-apiserver/logs/previous.log:2023-06-22T07:11:17.037224152Z E0622 07:11:17.036738      18 timeout.go:141] post-timeout activity - time-elapsed: 1.417764462s, GET "/api/v1/namespaces/openshift-clusterresourceoverride/secrets/clusterresourceoverride-token-nmzvt" result: <nil>
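
      The traces above only show the garbage collector reading the token secrets; to see which client is creating them, the kube-apiserver audit log can be filtered (a sketch, assuming audit logging is enabled; the node name is a placeholder):

          $ oc adm node-logs <master-node> --path=kube-apiserver/audit.log | grep '"verb":"create"' | grep clusterresourceoverride-token | head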
      
      
      
      

       

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      A high quantity of secrets accumulates in etcd.

      Expected results:

      The operator should stop creating additional clusterresourceoverride-token secrets.

      Additional info:

      As a workaround, the customer disabled the operator in this case.
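
      The exact steps used are not recorded here; as a rough sketch, disabling the operator typically means deleting the ClusterResourceOverride custom resource (named "cluster" in the operator documentation) and, if desired, uninstalling the operator through OLM:

          $ oc delete clusterresourceoverride cluster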
