Knative Serving / SRVKS-385

Unexpected scaling of revisions during an upgrade


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • 1.3.0

      When a load test runs during an upgrade, the revision is unexpectedly scaled down and then back up.

      Steps to reproduce:

      • spin up 40 clients sending requests to a Knative service
      • wait for a steady state in which the revision is scaled up (to 7 pods in my case; I waited two minutes before the next step)
      • start an upgrade of Serverless from 1.2.0 to 1.3.0
      • when the new knative-serving pods (activator, autoscaler, etc.) start to come up, the Knative service is scaled down to 1 pod and then back up to 7 pods
      • when the upgrade finishes, run for 3 more minutes before stopping the test
      • no requests fail during the upgrade, but I expect throughput to drop because of the scaling

      Output of the test, showing the times of various events (start of the test, timestamps of scaling up and down):

      === RUN   TestScaleRevisionByLoad
      === RUN   TestScaleRevisionByLoad/clients-040
      2020/01/09 05:25:40 Storing output in /tmp/tmp.pMot0YqncS/junit_TestScaleRevisionByLoad.xml
      --- PASS: TestScaleRevisionByLoad (439.64s)
          --- PASS: TestScaleRevisionByLoad/clients-040 (439.64s)
              scale_revision_by_load_test.go:95: Creating a new Service
              service.go:99: Creating a new Service scale-revision-by-load-clients-040-wzzvujwu.
              crd.go:36: resource {<nil> <nil> <*>{&TypeMeta{Kind:,APIVersion:,} &ObjectMeta{Name:scale-revision-by-load-clients-040-wzzvujwu,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,ManagedFields:[],} {0 <nil> <nil> <nil> <nil> {0 <nil> <nil> <*>&ObjectMeta{Name:,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{autoscaling.knative.dev/target: 10,},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,ManagedFields:[],}} {0 [{ {   <*>true <*>100 <nil>}}]}} {{0 <nil>} {<nil>   <nil> <nil>} { }}} <nil>}
              service.go:114: Waiting for Service "scale-revision-by-load-clients-040-wzzvujwu" to transition to Ready.
              service.go:119: Checking to ensure Service Status is populated for Ready service scale-revision-by-load-clients-040-wzzvujwu
              service.go:125: Getting latest objects Created by Service scale-revision-by-load-clients-040-wzzvujwu
              service.go:128: Successfully created Service scale-revision-by-load-clients-040-wzzvujwu
              scale_revision_by_load_test.go:117: Starting to probe http://scale-revision-by-load-clients-040-wzzvujwu.serving-tests.apps.ocf-rollup-16-rolling-upgrade-under-load.openshift-aws.rhocf-dev.com?timeout=10 at 2020-01-09 05:18:34.551150851 -0500 EST m=+14.117721490
              spoof.go:114: Spoofing scale-revision-by-load-clients-040-wzzvujwu.serving-tests.apps.ocf-rollup-16-rolling-upgrade-under-load.openshift-aws.rhocf-dev.com -> scale-revision-by-load-clients-040-wzzvujwu.serving-tests.apps.ocf-rollup-16-rolling-upgrade-under-load.openshift-aws.rhocf-dev.com
              scale_revision_by_load_test.go:128: Took 3.461312641s for the endpoint to start serving
              scale_revision_by_load_test.go:180: Starting test with 40 clients at 2020-01-09 05:18:38.713395157 -0500 EST m=+18.279965762
              scale_revision_by_load_test.go:192: Starting a new iteration at 2020-01-09 05:19:38.898675095 -0500 EST m=+78.465245726
              scale_revision_by_load_test.go:192: Starting a new iteration at 2020-01-09 05:20:39.088226312 -0500 EST m=+138.654796987
              scale_revision_by_load_test.go:192: Starting a new iteration at 2020-01-09 05:21:39.276378426 -0500 EST m=+198.842949088
              scale_revision_by_load_test.go:192: Starting a new iteration at 2020-01-09 05:22:39.471684501 -0500 EST m=+259.038255138
              scale_revision_by_load_test.go:192: Starting a new iteration at 2020-01-09 05:23:39.655325415 -0500 EST m=+319.221896078
              scale_revision_by_load_test.go:192: Starting a new iteration at 2020-01-09 05:24:39.840745196 -0500 EST m=+379.407315833
              scale_revision_by_load_test.go:189: Stopping the test at 2020-01-09 05:25:40.028359281 -0500 EST m=+439.594929937
              scale_revision_by_load_test.go:210: Scaled: 1 -> 2 in 12.558586128s
              scale_revision_by_load_test.go:210: Scaled: 1 -> 2 in 12.561343781s
              scale_revision_by_load_test.go:210: Scaled: 2 -> 3 in 12.670789911s
              scale_revision_by_load_test.go:210: Scaled: 2 -> 3 in 12.672826576s
              scale_revision_by_load_test.go:210: Scaled: 3 -> 4 in 13.40379106s
              scale_revision_by_load_test.go:210: Scaled: 3 -> 4 in 13.408014952s
              scale_revision_by_load_test.go:210: Scaled: 4 -> 5 in 13.544915938s
              scale_revision_by_load_test.go:210: Scaled: 4 -> 5 in 13.547616735s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 6 in 15.163705164s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 6 in 15.170309788s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 7 in 55.346693046s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 7 in 55.349204698s
              scale_revision_by_load_test.go:210: Scaled: 7 -> 6 in 1m45.805444646s
              scale_revision_by_load_test.go:210: Scaled: 7 -> 6 in 1m45.809223756s
              scale_revision_by_load_test.go:210: Scaled: 1 -> 2 in 2m54.568937967s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 5 in 3m2.956364851s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 5 in 3m2.961823921s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 6 in 3m11.616863299s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 6 in 3m11.619038509s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 5 in 3m11.699399727s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 5 in 3m11.747178964s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 6 in 3m11.957398847s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 6 in 3m11.960317139s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 5 in 3m11.999072661s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 5 in 3m12.004803719s
              scale_revision_by_load_test.go:210: Scaled: 2 -> 5 in 3m12.462871646s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 2 in 3m12.477462052s
              scale_revision_by_load_test.go:210: Scaled: 2 -> 5 in 3m12.598433158s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 2 in 3m12.648084754s
              scale_revision_by_load_test.go:210: Scaled: 2 -> 5 in 3m12.8665464s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 2 in 3m12.885652716s
              scale_revision_by_load_test.go:210: Scaled: 2 -> 5 in 3m13.1581027s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 6 in 3m13.308592094s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 6 in 3m13.326905387s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 5 in 3m13.359703133s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 5 in 3m13.363420512s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 2 in 3m13.506126698s
              scale_revision_by_load_test.go:210: Scaled: 2 -> 5 in 3m13.699722293s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 2 in 3m13.757708273s
              scale_revision_by_load_test.go:210: Scaled: 2 -> 5 in 3m13.957456228s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 2 in 3m14.094412904s
              scale_revision_by_load_test.go:210: Scaled: 2 -> 5 in 3m14.169924887s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 2 in 3m14.530633055s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 4 in 3m14.945161019s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 4 in 3m14.947050667s
              scale_revision_by_load_test.go:210: Scaled: 2 -> 3 in 3m19.242196563s
              scale_revision_by_load_test.go:210: Scaled: 4 -> 5 in 3m25.935264403s
              scale_revision_by_load_test.go:210: Scaled: 4 -> 5 in 3m25.937681967s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 4 in 3m25.987382028s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 4 in 3m25.989474604s
              scale_revision_by_load_test.go:210: Scaled: 4 -> 5 in 3m26.388550189s
              scale_revision_by_load_test.go:210: Scaled: 4 -> 5 in 3m26.392052692s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 4 in 3m26.448952067s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 4 in 3m26.459582207s
              scale_revision_by_load_test.go:210: Scaled: 4 -> 5 in 3m26.481135427s
              scale_revision_by_load_test.go:210: Scaled: 4 -> 5 in 3m26.483320437s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 6 in 3m35.233316325s
              scale_revision_by_load_test.go:210: Scaled: 5 -> 6 in 3m35.236036183s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 7 in 3m35.767769719s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 7 in 3m35.770105726s
              scale_revision_by_load_test.go:210: Scaled: 7 -> 6 in 3m35.808300252s
              scale_revision_by_load_test.go:210: Scaled: 7 -> 6 in 3m35.841738281s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 7 in 3m36.884020367s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 7 in 3m36.886222106s
              scale_revision_by_load_test.go:210: Scaled: 7 -> 6 in 3m36.932124777s
              scale_revision_by_load_test.go:210: Scaled: 7 -> 6 in 3m36.958364227s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 7 in 3m38.128178013s
              scale_revision_by_load_test.go:210: Scaled: 6 -> 7 in 3m38.130276128s
              scale_revision_by_load_test.go:210: Scaled: 3 -> 4 in 3m56.851517425s
              scale_revision_by_load_test.go:210: Scaled: 7 -> 6 in 4m28.479276794s
              scale_revision_by_load_test.go:210: Scaled: 7 -> 6 in 4m28.481205415s
      PASS
      ok  	knative.dev/serving/test/performance	440.731s
      

      Additional timestamps:

      Installplan approved at Thu Jan  9 05:20:26 EST 2020
      New CSV ready at        Thu Jan  9 05:21:11 EST 2020
      Upgrade finished at     Thu Jan  9 05:21:51 EST 2020
      

        1. pod_events.txt (13 kB, Martin Gencur)
        2. autoscale_pods_during_upgrade.txt (47 kB, Martin Gencur)

              Assignee: Unassigned
              Reporter: mgencur Martin Gencur (Inactive)