  1. OpenShift Bugs
  2. OCPBUGS-76329

etcd 2x-3x memory usage regression between etcd 3.5 and 3.6 - scales with cluster size

    • Bug
    • Resolution: Unresolved
    • 4.21.z
    • Etcd
    • Important

      Description of problem:

      A memory usage regression was identified during ROSA 4.21 Performance and Scale testing when comparing OCP 4.21 with 4.20 nightly builds. The regression is also observed among 4.21 nightlies from around Nov 20, when etcd was updated from 3.5 to 3.6.

      The regression scales with cluster size: the increase in max RSS grows from +15.6% at 24 nodes to +162% at 250 nodes.

      24-node cluster dashboard screenshot

      120-node cluster dashboard screenshot

      250-node cluster dashboard screenshot

      ROSA 250-node cluster dashboard screenshot

      Cluster Size   etcd 3.5 Avg RSS   etcd 3.6 Avg RSS   Avg Change   etcd 3.5 Max RSS   etcd 3.6 Max RSS   Max Change
      24 nodes       359 MiB            360 MiB            +0.3%        468 MiB            541 MiB            +15.6%
      120 nodes      550 MiB            601 MiB            +9.3%        893 MiB            1.24 GiB           +42.2%
      250 nodes      682 MiB            974 MiB            +42.8%       1.13 GiB           2.96 GiB           +162%
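
      The change columns can be re-derived from the RSS values. The following is a minimal sketch, not the tooling used for this report: the helper names `to_mib` and `pct_change` are hypothetical, and 1 GiB is taken as 1024 MiB.

```python
# Unit factors relative to MiB; the report mixes MiB and GiB values.
UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def to_mib(value: str) -> float:
    """Convert a string such as '1.24 GiB' to a float number of MiB."""
    number, unit = value.split()
    return float(number) * UNIT_TO_MIB[unit]


def pct_change(before: str, after: str) -> float:
    """Relative change from `before` to `after`, in percent."""
    b, a = to_mib(before), to_mib(after)
    return (a - b) / b * 100.0


# Max RSS values copied from the table in this report (etcd 3.5, etcd 3.6).
max_rss = {
    24: ("468 MiB", "541 MiB"),
    120: ("893 MiB", "1.24 GiB"),
    250: ("1.13 GiB", "2.96 GiB"),
}

for nodes, (old, new) in max_rss.items():
    print(f"{nodes} nodes: {pct_change(old, new):+.1f}%")
```

      Because the inputs are the rounded values from the table, the results agree with the reported changes only to within rounding (for example, the 250-node case comes out just under +162%).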

       

      Version-Release number of selected component (if applicable):

      4.21    

       

      How reproducible:

      Always - reproducible across multiple cluster sizes (24, 120, 250 nodes) and deployment types (self-managed and ROSA)    

       

      Steps to Reproduce:

          1. Deploy an OCP cluster with etcd 3.5 (e.g., a 4.20 nightly).
          2. Run the cluster-density-v2 workload using kube-burner.
          3. Record etcd RSS memory usage (average and max).
          4. Deploy an OCP cluster with etcd 3.6 (e.g., a 4.21 nightly).
          5. Run the same cluster-density-v2 workload.
          6. Compare etcd RSS memory usage and observe a significant increase, especially in max RSS.
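
      Steps 3 and 6 amount to reducing each run's RSS samples to an average and a maximum, then comparing the two runs. The sketch below assumes the samples are already available as byte values, for instance exported from a metric such as `process_resident_memory_bytes` for the etcd pods. That source is an assumption, since this report does not state how RSS was collected. The function names and the sample numbers are hypothetical.

```python
from statistics import mean


def summarize_rss(samples_bytes):
    """Return (average, max) RSS in MiB for a list of byte-valued samples."""
    if not samples_bytes:
        raise ValueError("no RSS samples recorded")
    mib = [s / (1024 * 1024) for s in samples_bytes]
    return mean(mib), max(mib)


def compare_runs(baseline_bytes, candidate_bytes):
    """Percent change in average and max RSS from baseline to candidate."""
    base_avg, base_max = summarize_rss(baseline_bytes)
    cand_avg, cand_max = summarize_rss(candidate_bytes)
    return {
        "avg_change_pct": (cand_avg - base_avg) / base_avg * 100.0,
        "max_change_pct": (cand_max - base_max) / base_max * 100.0,
    }


# Hypothetical samples (bytes), NOT measurements from this report.
MIB = 1024 * 1024
etcd_35 = [300 * MIB, 400 * MIB, 500 * MIB]
etcd_36 = [300 * MIB, 450 * MIB, 750 * MIB]
print(compare_runs(etcd_35, etcd_36))
```

      Reporting both figures matters here because a short-lived spike can raise max RSS sharply while leaving the average nearly unchanged.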
      
      
      Test Environment:
      Platform: AWS
      SDN: OVNKubernetes
      Test workload: cluster-density-v2

      Actual results:

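      etcd 3.6 shows a significant memory usage regression compared to etcd 3.5. At 250 nodes, average RSS rises from 682 MiB to 974 MiB (+42.8%) and max RSS rises from 1.13 GiB to 2.96 GiB (+162%).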

      Expected results:

      etcd 3.6 memory usage should be comparable to etcd 3.5, with no significant regression.    

      Additional info:

              Dean West (dwest@redhat.com)
              Marius Cornea (mcornea@redhat.com)
              Ge Liu