
      Hello team,

      With the release of COO 1.1 we are getting a few cases where the observability-operator and perses-operator pods are getting OOMKilled.

      The default memory limit for the observability operator is 150Mi, whereas for the perses operator it is 128Mi.

      I would suggest setting the default limit for these two operator pods to at least 500Mi.
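
      For anyone triaging these cases, a minimal sketch of how to confirm the OOMKills; the namespace and pod name are assumptions and should be adjusted to the actual install:

      % oc -n openshift-operators get pods | grep -E 'observability-operator|perses-operator'
      # prints "OOMKilled" if the container was killed for exceeding its memory limit
      % oc -n openshift-operators get pod <operator-pod-name> \
          -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'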

       

            [COO-784] Increase memory limit for COO and perses pods

            Mark Stalpinski added a comment -

            Also ran into this today and completed the workaround, so good right now.

            Jan Fajerski added a comment -

            rhn-support-pripatil We are still analyzing the specifics; more memory profiles are always welcome.

            COO 1.1 ships significantly more features, including a new operator (Perses). This requires additional resources the operators must keep track of via Watches. Currently we suspect the increased memory requirements stem from that aspect.

            Prithviraj Patil added a comment -

            Hello team / jfajersk@redhat.com,

            One of my customers is also experiencing the same issue.
            I have one query, could you please answer it:

            The COO was running fine before this update with 150Mi. With this new update, the memory jump from 150Mi to 512Mi is nearly 4x. It seems more like an issue with this update than a resource constraint.

            So could you please confirm what changed with this new update?

            Regards,
            Prithviraj Patil

            Hongyan Li added a comment -

            Test passed with the PR:

            % oc -n coo get deployment perses-operator  -oyaml | grep -A6 resources:      
                    resources:
                      limits:
                        cpu: 500m
                        memory: 512Mi
                      requests:
                        cpu: 100m
                        memory: 128Mi
            % oc -n coo get deployment observability-operator -oyaml | grep -A6 resources: 
                    resources:
                      limits:
                        cpu: 400m
                        memory: 512Mi
                      requests:
                        cpu: 100m
                        memory: 256Mi 


            Hongyan Li added a comment -

            RCA:
            QE clusters usually have 3 master nodes and 3 worker nodes, and I have never seen the issue there. COO is a multi-namespace operator, so I suspect the affected cluster environment has more namespaces, which has an effect on the OOM of the COO pods. The Performance QE team has multi-node cluster environments which may have more namespaces; this scenario may be covered by them.
            The issue is seen on a cluster which has 29 nodes.

            Sonigra Saurab added a comment -

            Changes made. I see why you recommended adding it at the subscription level: even if the operator auto-upgrades, that config stays and the customer need not make the changes again in the new CSV.

            Sonigra Saurab added a comment -

            Jan, each of the components has a different limit and request for CPU & memory; if I add the changes directly to the sub, it takes those values as the default request and limit.

            But I get your point. I think a request of 50m CPU and 150Mi memory plus a limit of 500m CPU and 512Mi memory at the sub level should be good.
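
            A minimal sketch of the subscription-level change being discussed, using the values Sonigra proposed; the subscription name and namespace are assumptions and should be adjusted to the actual install:

            % oc -n openshift-operators patch subscription cluster-observability-operator \
                --type merge -p '{"spec":{"config":{"resources":{
                  "requests":{"cpu":"50m","memory":"150Mi"},
                  "limits":{"cpu":"500m","memory":"512Mi"}}}}}'
            # OLM re-renders the operator Deployments from the Subscription config,
            # so the values survive operator auto-upgrades to a new CSV.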

            Jan Fajerski added a comment -

            Regarding the KCS: is there a benefit to adjusting the CSV over setting this in the subscription, as OLM documents? https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md#resources

            Sonigra Saurab added a comment -

            KCS

            Sonigra Saurab added a comment -

            I have asked the customer to set the limit to 512Mi on a temporary basis and see if the same problem still happens.
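
            A minimal sketch of that temporary workaround via the installed CSV; the CSV name and the deployment/container indexes are assumptions and must be checked against the cluster, and OLM will revert the change on the next upgrade, which is why the subscription-level config is preferred longer term:

            % oc -n openshift-operators get csv | grep observability-operator
            % oc -n openshift-operators patch csv <observability-operator-csv-name> --type json -p '
              [{"op": "replace",
                "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory",
                "value": "512Mi"}]'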
