Upgrading Quay from 3.3.4 to 3.4.x or 3.6.x does not complete: the pod quay-enterprise-quay-postgres-migration-xxxxxx remains with only 1 of its 2 containers in ready status:
~~~
quay-enterprise-quay-postgres-migration-xxxxxx   1/2   Running   0   8m6s

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: '2021-11-03T16:55:55Z'
    status: 'True'
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: '2021-11-03T16:53:37Z'
    message: 'containers with unready status: [quay-postgres-migration-cleanup]'
    reason: ContainersNotReady
    status: 'False'
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: '2021-11-03T16:53:37Z'
    message: 'containers with unready status: [quay-postgres-migration-cleanup]'
    reason: ContainersNotReady
    status: 'False'
    type: ContainersReady
  containerStatuses:
  - containerID: cri-o://8e7324eac11af69a911404170c43b805758f5d59efc97ccbc43d07bba16eb756
    image: registry.redhat.io/rhel8/postgresql-10@sha256:98ca35fdf08068b49216a35ed4e81507bf91c8babf30c92d5f200cbfb2df35ed
    imageID: registry.redhat.io/rhel8/postgresql-10@sha256:6d97b69c1dd606d5ca679bbae3d6e6a8073bb1c561fa8b0d78e37f0bcac84237
    lastState: {}
    name: quay-postgres-migration
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: '2021-11-03T16:55:55Z'
  - containerID: cri-o://67d375f710732a4f9afc4d35832cd9094fb08130d40a4a8450f590c668a72ccc
    image: registry.redhat.io/rhel8/postgresql-10@sha256:98ca35fdf08068b49216a35ed4e81507bf91c8babf30c92d5f200cbfb2df35ed
    imageID: registry.redhat.io/rhel8/postgresql-10@sha256:6d97b69c1dd606d5ca679bbae3d6e6a8073bb1c561fa8b0d78e37f0bcac84237
    lastState: {}
    name: quay-postgres-migration-cleanup
    ready: false
    restartCount: 0
    started: true
    state:
      running:
        startedAt: '2021-11-03T16:55:55Z'
~~~
The cleanup container in the postgres migration pod quay-enterprise-quay-postgres-migration starts at the same time as the quay-postgres-migration container.
The command that runs in the cleanup container is:
~~~
- command:
- /bin/bash
- -c
- sleep 20; rm -f /tmp/change-username.sql /tmp/check-user.sql; echo "ALTER ROLE
\"$OLD_DB_USERNAME\" RENAME TO \"$NEW_DB_USERNAME\"; ALTER DATABASE \"$OLD_DB_NAME\"
RENAME TO \"$NEW_DB_NAME\";" > /tmp/change-username.sql; echo "SELECT 1 FROM
pg_roles WHERE rolname = '$NEW_DB_USERNAME';" > /tmp/check-user.sql; psql -h
localhost -f /tmp/check-user.sql | grep -q 1 || psql -h localhost -f /tmp/change-username.sql;
sleep 600;
~~~
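The race exists because the script waits a fixed 20 seconds instead of polling the database. A more robust pattern (a sketch, not the shipped script) retries until the server actually answers, for example with `pg_isready -h localhost` as the probed command; the helper below is generic and is demonstrated with a trivially succeeding command:

```shell
#!/bin/bash
# wait_for CMD...: retry CMD every 2s, up to 30 attempts, until it succeeds.
# In the cleanup container the probed command would be `pg_isready -h localhost`
# (a hypothetical replacement for the fixed `sleep 20`).
wait_for() {
  local i
  for i in $(seq 1 30); do
    "$@" && return 0   # target answered; stop waiting
    sleep 2            # back off before the next attempt
  done
  echo "gave up waiting for: $*" >&2
  return 1
}

# Demonstration with a command that succeeds immediately:
wait_for true && echo "ready"
```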
In this case, PostgreSQL did not start within the first 20 seconds in the `quay-postgres-migration` container, and the cleanup container logs:
~~~
psql: could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
psql: could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
~~~
In the quay-postgres-migration container we see that the database does start later, but the script in the cleanup container waits 600 seconds before starting over.
The user reported that waiting 10 minutes does not resolve the issue: the cleanup container remains hung in this loop and the upgrade process never completes.
As a workaround, one can manually access the cleanup container and run the psql scripts; this changes the DB ownership and allows the upgrade process to complete.
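A minimal sketch of this first workaround, assuming `oc` access to the namespace; the pod name suffix is a placeholder for whatever `oc get pods` shows, and the script paths are the ones the cleanup command itself writes:

```shell
# Open a shell in the cleanup container (pod name suffix is a placeholder):
oc exec -it quay-enterprise-quay-postgres-migration-xxxxxx \
  -c quay-postgres-migration-cleanup -- /bin/bash

# Inside the container, run the same scripts the cleanup command generates,
# now that PostgreSQL is actually accepting connections:
psql -h localhost -f /tmp/check-user.sql | grep -q 1 \
  || psql -h localhost -f /tmp/change-username.sql
```

This is exactly the conditional from the cleanup command, re-run by hand once the database is up.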
Another workaround is to wait for the quay-postgres-migration container to start the database, then access the node where the pod is scheduled and stop the cleanup container so that the pod restarts it after the DB is up. This also worked, and the upgrade procedure completed.
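Since the nodes run CRI-O, this second workaround can be sketched with `crictl` from a debug shell on the node (`<node>` and `<container-id>` are placeholders):

```shell
# Get a shell on the node hosting the pod:
oc debug node/<node>

# Find the ID of the cleanup container, then stop it; the kubelet
# restarts it, and on the next run PostgreSQL is already up:
crictl ps --name quay-postgres-migration-cleanup -q
crictl stop <container-id>   # ID printed by the previous command
```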