Project Quay / PROJQUAY-2243

Operator stuck during upgrade from 3.5.1 to 3.5.3


Details

    • Quay Enterprise

    Description

      The operator is stuck in the "running database migrations" phase:

      Name:         sfit-registry
      Namespace:    sfit-quay
      Labels:       <none>
      Annotations:  <none>
      API Version:  quay.redhat.com/v1
      Kind:         QuayRegistry
      ...
      Status:
        Conditions:
          Last Transition Time:            2021-06-10T14:47:14Z
          Last Update Time:                2021-06-10T14:47:14Z
          Message:                         running database migrations
          Reason:                          MigrationsInProgress
          Status:                          False
          Type:                            Available
      

      We noticed that the upgrade pod was never scaled back up, so we manually scaled it to 1, hoping that would release the lock. Unfortunately, because the upgrade pod runs all workers, it went into a loop scanning images (we have seen this behaviour with another client):

      namespacegcworker stdout | 2021-07-13 11:25:00,257 [93] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:01:00], next run at: 2021-07-13 11:26:00 UTC)" (scheduled at 2021-07-13 11:25:00.257427+00:00)
      namespacegcworker stdout | 2021-07-13 11:25:00,301 [93] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:01:00], next run at: 2021-07-13 11:26:00 UTC)" executed successfully
      securityscanningnotificationworker stdout | 2021-07-13 11:25:03,359 [108] [INFO] [apscheduler.executors.default] Running job "QueueWorker.run_watchdog (trigger: interval[0:01:00], next run at: 2021-07-13 11:26:03 UTC)" (scheduled at 2021-07-13 11:25:03.358899+00:00)
      securityscanningnotificationworker stdout | 2021-07-13 11:25:03,359 [108] [INFO] [apscheduler.executors.default] Job "QueueWorker.run_watchdog (trigger: interval[0:01:00], next run at: 2021-07-13 11:26:03 UTC)" executed successfully
      notificationworker stdout | 2021-07-13 11:25:06,068 [95] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2021-07-13 11:25:16 UTC)" (scheduled at 2021-07-13 11:25:06.067696+00:00)
      notificationworker stdout | 2021-07-13 11:25:06,109 [95] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:00:10], next run at: 2021-07-13 11:25:16 UTC)" executed successfully
      securityworker stdout | 2021-07-13 11:25:06,464 [109] [INFO] [apscheduler.executors.default] Running job "SecurityWorker._index_in_scanner (trigger: interval[0:00:30], next run at: 2021-07-13 11:25:36 UTC)" (scheduled at 2021-07-13 11:25:06.463601+00:00)
      securityworker stdout | 2021-07-13 11:25:06,534 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 150-186
      securityworker stdout | 2021-07-13 11:25:06,537 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 247-283
      securityworker stdout | 2021-07-13 11:25:06,540 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 192-228
      securityworker stdout | 2021-07-13 11:25:06,543 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 186-222
      securityworker stdout | 2021-07-13 11:25:06,545 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 228-264
      securityworker stdout | 2021-07-13 11:25:06,554 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 336-372
      gcworker stdout | 2021-07-13 11:25:06,559 [84] [INFO] [apscheduler.executors.default] Running job "GarbageCollectionWorker._garbage_collection_repos (trigger: interval[0:00:30], next run at: 2021-07-13 11:25:36 UTC)" (scheduled at 2021-07-13 11:25:06.558679+00:00)
      securityworker stdout | 2021-07-13 11:25:06,565 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 289-325
      securityworker stdout | 2021-07-13 11:25:06,568 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 325-361
      securityworker stdout | 2021-07-13 11:25:06,570 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 283-319
      securityworker stdout | 2021-07-13 11:25:06,573 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 71-107
      securityworker stdout | 2021-07-13 11:25:06,585 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 108-144
      securityworker stdout | 2021-07-13 11:25:06,588 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 28-64
      securityworker stdout | 2021-07-13 11:25:06,590 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 144-180
      securityworker stdout | 2021-07-13 11:25:06,593 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 107-143
      securityworker stdout | 2021-07-13 11:25:06,596 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 3-39
      securityworker stdout | 2021-07-13 11:25:06,598 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 372-393
      securityworker stdout | 2021-07-13 11:25:06,601 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 64-100
      securityworker stdout | 2021-07-13 11:25:06,601 [109] [INFO] [util.migrate.allocator] No more work
      securityworker stdout | 2021-07-13 11:25:06,604 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 239-275
      securityworker stdout | 2021-07-13 11:25:06,607 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 104-140
      securityworker stdout | 2021-07-13 11:25:06,610 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 166-202
      securityworker stdout | 2021-07-13 11:25:06,613 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 202-238
      securityworker stdout | 2021-07-13 11:25:06,616 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 30-66
      securityworker stdout | 2021-07-13 11:25:06,618 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 238-274
      securityworker stdout | 2021-07-13 11:25:06,621 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 67-103
      securityworker stdout | 2021-07-13 11:25:06,624 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 3-39
      securityworker stdout | 2021-07-13 11:25:06,628 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 140-176
      securityworker stdout | 2021-07-13 11:25:06,631 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 66-102
      securityworker stdout | 2021-07-13 11:25:06,634 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 324-360
      securityworker stdout | 2021-07-13 11:25:06,637 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 288-324
      securityworker stdout | 2021-07-13 11:25:06,640 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 103-139
      securityworker stdout | 2021-07-13 11:25:06,643 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 360-393
      securityworker stdout | 2021-07-13 11:25:06,646 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 275-311
      securityworker stdout | 2021-07-13 11:25:06,646 [109] [INFO] [util.migrate.allocator] No more work
      securityworker stdout | 2021-07-13 11:25:06,650 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 242-278
      gcworker stdout | 2021-07-13 11:25:06,650 [84] [INFO] [apscheduler.executors.default] Job "GarbageCollectionWorker._garbage_collection_repos (trigger: interval[0:00:30], next run at: 2021-07-13 11:25:36 UTC)" executed successfully
      securityworker stdout | 2021-07-13 11:25:06,652 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 284-320
      securityworker stdout | 2021-07-13 11:25:06,655 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 278-314
      securityworker stdout | 2021-07-13 11:25:06,658 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 106-142
      securityworker stdout | 2021-07-13 11:25:06,661 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 6-42
      securityworker stdout | 2021-07-13 11:25:06,664 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 3-39
      securityworker stdout | 2021-07-13 11:25:06,700 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 155-191
      securityworker stdout | 2021-07-13 11:25:06,707 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 52-88
      securityworker stdout | 2021-07-13 11:25:06,710 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 88-124
      securityworker stdout | 2021-07-13 11:25:06,724 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 42-78
      securityworker stdout | 2021-07-13 11:25:06,727 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 206-242
      securityworker stdout | 2021-07-13 11:25:06,730 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 191-227
      securityworker stdout | 2021-07-13 11:25:06,751 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 142-178
      securityworker stdout | 2021-07-13 11:25:06,755 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 348-384
      securityworker stdout | 2021-07-13 11:25:06,758 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 320-356
      securityworker stdout | 2021-07-13 11:25:06,761 [109] [INFO] [util.migrate.allocator] No candidates, marking entire block completed 384-393
      securityworker stdout | 2021-07-13 11:25:06,761 [109] [INFO] [util.migrate.allocator] No more work
      securityworker stdout | 2021-07-13 11:25:06,761 [109] [INFO] [apscheduler.executors.default] Job "SecurityWorker._index_in_scanner (trigger: interval[0:00:30], next run at: 2021-07-13 11:25:36 UTC)" executed successfully
      buildlogsarchiver stdout | 2021-07-13 11:25:07,256 [79] [INFO] [apscheduler.executors.default] Running job "ArchiveBuildLogsWorker._archive_redis_buildlogs (trigger: interval[0:00:30], next run at: 2021-07-13 11:25:37 UTC)" (scheduled at 2021-07-13 11:25:07.255435+00:00)
      buildlogsarchiver stdout | 2021-07-13 11:25:07,303 [79] [INFO] [apscheduler.executors.default] Job "ArchiveBuildLogsWorker._archive_redis_buildlogs (trigger: interval[0:00:30], next run at: 2021-07-13 11:25:37 UTC)" executed successfully
      exportactionlogsworker stdout | 2021-07-13 11:25:13,363 [83] [INFO] [apscheduler.executors.default] Running job "QueueWorker.run_watchdog (trigger: interval[0:01:00], next run at: 2021-07-13 11:26:13 UTC)" (scheduled at 2021-07-13 11:25:13.362870+00:00)
      exportactionlogsworker stdout | 2021-07-13 11:25:13,363 [83] [INFO] [apscheduler.executors.default] Job "QueueWorker.run_watchdog (trigger: interval[0:01:00], next run at: 2021-07-13 11:26:13 UTC)" executed successfully
      securityscanningnotificationworker stdout | 2021-07-13 11:25:13,856 [108] [INFO] [apscheduler.executors.default] Running job "QueueWorker.poll_queue (trigger: interval[0:01:00], next run at: 2021-07-13 11:26:13 UTC)" (scheduled at 2021-07-13 11:25:13.855426+00:00)
      securityscanningnotificationworker stdout | 2021-07-13 11:25:13,911 [108] [INFO] [apscheduler.executors.default] Job "QueueWorker.poll_queue (trigger: interval[0:01:00], next run at: 2021-07-13 11:26:13 UTC)" executed successfully
      gunicorn-web stdout | 2021-07-13 11:25:14,042 [244] [WARNING] [py.warnings] /usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py:997: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
      gunicorn-web stdout |   warnings.warn(
      nginx stdout | 127.0.0.1 () - - [13/Jul/2021:11:25:14 +0000] "GET /v1/_internal_ping HTTP/1.1" 200 4 "-" "python-requests/2.22.0" (0.002 162 0.001)
      gunicorn-registry stdout | 2021-07-13 11:25:14,045 [240] [INFO] [gunicorn.access] 127.0.0.1 - - [13/Jul/2021:11:25:14 +0000] "GET /v1/_internal_ping HTTP/1.1" 200 4 "-" "python-requests/2.22.0"
      gunicorn-web stdout | 2021-07-13 11:25:14,050 [244] [WARNING] [py.warnings] /usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py:997: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
      gunicorn-web stdout |   warnings.warn(
      nginx stdout | 127.0.0.1 () - - [13/Jul/2021:11:25:14 +0000] "GET /_internal_ping HTTP/1.1" 200 4 "-" "python-requests/2.22.0" (0.002 159 0.001)
      gunicorn-web stdout | 2021-07-13 11:25:14,052 [246] [INFO] [gunicorn.access] 127.0.0.1 - - [13/Jul/2021:11:25:14 +0000] "GET /_internal_ping HTTP/1.0" 200 4 "-" "python-requests/2.22.0"
      gunicorn-web stdout | 2021-07-13 11:25:14,053 [244] [INFO] [data.database] Connection pooling disabled for postgresql
      
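For reference, the manual rescaling described above was done roughly along these lines. The deployment name `sfit-registry-quay-app-upgrade` is an assumption derived from the registry name (the operator names the upgrade deployment `<registry>-quay-app-upgrade`); verify it against the actual cluster first:

```shell
# List the deployments the operator manages, to confirm the upgrade deployment's name
oc -n sfit-quay get deployments

# Scale the upgrade deployment back up to 1 (this is the step that started the worker loop)
oc -n sfit-quay scale deployment/sfit-registry-quay-app-upgrade --replicas=1

# ...and back down to 0 again afterwards
oc -n sfit-quay scale deployment/sfit-registry-quay-app-upgrade --replicas=0
```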

      Scaling the upgrade pod deployment back to 0 did nothing; the operator is still stuck with the same message:

      I0714 06:23:37.310435       1 request.go:655] Throttling request took 1.034055947s, request: GET:https://172.19.192.1:443/apis/template.openshift.io/v1?timeout=32s
      2021-07-14T06:23:39.967Z        INFO    controller-runtime.metrics      metrics server is starting to listen    {"addr": ":8080"}
      2021-07-14T06:23:39.971Z        INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
      2021-07-14T06:23:39.971Z        INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
      2021-07-14T06:23:39.971Z        INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
      2021-07-14T06:23:39.971Z        INFO    controller-runtime.injectors-warning    Injectors are deprecated, and will be removed in v0.10.x
      2021-07-14T06:23:39.971Z        INFO    setup   starting server on port 7071
      2021-07-14T06:23:39.971Z        INFO    setup   starting manager
      2021-07-14T06:23:39.971Z        INFO    controller-runtime.manager      starting metrics server {"path": "/metrics"}
      2021-07-14T06:23:39.972Z        INFO    controller-runtime.manager.controller.quayregistry      Starting EventSource    {"reconciler group": "quay.redhat.com", "reconciler kind": "QuayRegistry", "source": "kind source: /, Kind="}
      2021-07-14T06:23:39.972Z        INFO    controller-runtime.manager.controller.quayecosystem     Starting EventSource    {"reconciler group": "redhatcop.redhat.io", "reconciler kind": "QuayEcosystem", "source": "kind source: /, Kind="}
      2021-07-14T06:23:40.072Z        INFO    controller-runtime.manager.controller.quayregistry      Starting Controller     {"reconciler group": "quay.redhat.com", "reconciler kind": "QuayRegistry"}
      2021-07-14T06:23:40.072Z        INFO    controller-runtime.manager.controller.quayregistry      Starting workers        {"reconciler group": "quay.redhat.com", "reconciler kind": "QuayRegistry", "worker count": 1}
      2021-07-14T06:23:40.072Z        INFO    controller-runtime.manager.controller.quayecosystem     Starting Controller     {"reconciler group": "redhatcop.redhat.io", "reconciler kind": "QuayEcosystem"}
      2021-07-14T06:23:40.072Z        INFO    controller-runtime.manager.controller.quayecosystem     Starting workers        {"reconciler group": "redhatcop.redhat.io", "reconciler kind": "QuayEcosystem", "worker count": 1}
      2021-07-14T06:23:40.072Z        INFO    controllers.QuayRegistry        begin reconcile {"quayregistry": "sfit-quay/sfit-registry"}
      2021-07-14T06:23:40.072Z        INFO    controllers.QuayRegistry        migrations in progress, skipping reconcile      {"quayregistry": "sfit-quay/sfit-registry"}
      

      The question from the client is: can we somehow either remove the finalizers or make the operator progress manually? There should be no database migrations anyway, since this is an upgrade from 3.5.1 to 3.5.3 and there are no schema migrations between z-stream releases.
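A rough sketch of what "removing finalizers manually" would look like, offered as an assumption rather than a verified or supported procedure (the operator pod's label selector and namespace below are placeholders; removing finalizers can orphan resources, so this should be a last resort):

```shell
# Inspect the finalizers currently set on the QuayRegistry resource
oc -n sfit-quay get quayregistry sfit-registry -o jsonpath='{.metadata.finalizers}'

# Clear all finalizers on the resource (destructive; last resort only)
oc -n sfit-quay patch quayregistry sfit-registry --type=merge \
  -p '{"metadata":{"finalizers":null}}'

# Restart the operator pod to force a fresh reconcile of the QuayRegistry
oc -n <operator-namespace> delete pod -l name=quay-operator
```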


          People

            Assignee: Unassigned
            Reporter: Ivan Bazulic (rhn-support-ibazulic)
            Votes: 0
            Watchers: 2
