  1. Project Quay
  2. PROJQUAY-6578

Race condition during initial database setup

    • Quay Enterprise

      When a QuayRegistry is deployed, the operator immediately creates both the app upgrade job and the database deployment. In some cases the job starts populating the database before the database reports as ready, and the job is then killed mid-migration by the kubelet and restarted. Because the database is left partially populated, subsequent migration attempts fail. Example:

      ~/openshift-4/quay-reproducer# oc get pods
      NAME                                   READY   STATUS                  RESTARTS       AGE
      quay-clair-postgres-84999868bb-ll99c   1/1     Running                 0              4m38s
      quay-quay-app-upgrade-85x9x            0/1     CrashLoopBackOff        5 (90s ago)    4m39s
      quay-quay-database-6fcd4c4b5b-w7sw2    1/1     Running                 0              4m37s
      quay-quay-mirror-769458bbd-n5znm       0/1     Init:CrashLoopBackOff   5 (107s ago)   4m38s
      quay-quay-mirror-769458bbd-tkc8g       0/1     Init:CrashLoopBackOff   5 (111s ago)   4m38s
      quay-quay-redis-7f58874b5d-lk8qr       1/1     Running                 0              4m39s
      
      ~/openshift-4/quay-reproducer# oc logs quay-quay-app-upgrade-85x9x
      ...
      Entering migration mode to version: head
      21:00:18 INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
      21:00:18 INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
      21:00:18 INFO  [alembic.runtime.migration] Running upgrade e2894a3a3c19 -> 7a525c68eb13, Add OCI/App models.
      Traceback (most recent call last):
        File "/app/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
          self.dialect.do_execute(
        File "/app/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
          cursor.execute(statement, parameters)
      psycopg2.errors.DuplicateTable: relation "tagkind" already exists
      
      
      The above exception was the direct cause of the following exception:
      
      Traceback (most recent call last):
        File "/app/bin/alembic", line 8, in <module>
          sys.exit(main())
      ...
        File "/app/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
          raise exception
        File "/app/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
          self.dialect.do_execute(
        File "/app/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
          cursor.execute(statement, parameters)
      sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateTable) relation "tagkind" already exists
      
      [SQL:
      CREATE TABLE tagkind (
              id SERIAL NOT NULL,
              name VARCHAR(255) NOT NULL,
              CONSTRAINT pk_tagkind PRIMARY KEY (id)
      )
      
      ]
      (Background on this error at: https://sqlalche.me/e/14/f405)
      

      The only recourse is to restart the whole procedure from the beginning.
      The expected behavior is that the migration job does not start until the database reports as ready. The operator should create the job only after that point.
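
      The expected ordering can be sketched as a simple readiness gate. This is a minimal illustration only, not the operator's actual code: wait_until_ready, deploy, is_db_ready and create_upgrade_job are hypothetical names, and the timeout and polling interval are assumed values.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 300.0,   # assumed value, not taken from the report
    interval_s: float = 2.0,    # assumed value, not taken from the report
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if is_ready():
                return True
        except Exception:
            # Treat probe errors (e.g. connection refused) as "not ready yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)


def deploy(
    is_db_ready: Callable[[], bool],
    create_upgrade_job: Callable[[], None],
    **wait_kwargs,
) -> bool:
    """Create the upgrade job only after the database reports ready."""
    if not wait_until_ready(is_db_ready, **wait_kwargs):
        # The database never became ready, so no migration is started
        # and the schema is not left half-populated.
        return False
    create_upgrade_job()
    return True
```

      In this sketch is_db_ready stands in for whatever readiness signal is available (for example, the database deployment reporting ready replicas), and create_upgrade_job stands in for creating the migration job. The point is only the ordering: the job is never created while the readiness check is still failing.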

            Assignee: Unassigned
            Reporter: Ivan Bazulic (rhn-support-ibazulic)