Details

Type: Bug
Resolution: Obsolete
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: quay
Labels:
- triaged

Blocked: False
Ready: False
Product: Quay Enterprise
Release Note Text: Undefined
RICE Score: 0

SFDC Cases Links:
SFDC Cases Counter:

Description

Issue: The customer has two buckets, default (emea) and apacstorage, and apacstorage needs to be decommissioned. However, util/backfillreplication.py is unable to queue jobs for replication and gets killed after a while.

sh-4.2$ scl enable python27 bash
bash-4.2$ python -m util.backfillreplication
No handlers could be found for logger "util.config.provider.baseprovider"
data/secscan_model/__init__.py:22: DeprecationWarning: Call to deprecated class V2SecurityScanner. (Will be replaced by a V4 API security scanner soon)
 self._model = V2SecurityScanner(app, instance_keys, storage)
Enqueueing image storage fa0dee00-87e1-4148-8a71-4ccc59cde327 to be replicated
Enqueueing image storage e6c60f41-0aa5-4c59-9044-f67311d194a9 to be replicated
Enqueueing image storage 23e199b2-1eb5-4e74-9bfc-675c571d4bb2 to be replicated
Enqueueing image storage 9bc4c59a-646c-4180-8f8b-f605d87a4fd2 to be replicated
Killed

We added more debug steps and could see that it fails at a different repository every time, and that it tries to replicate layers from non-existent storage buckets:

 image vehicle-service-admin-center-gateway
 namespace %s 461
 locations %s set([])
 default location ['default', 'apacstorage']
 locations_required %s set(['default', 'apacstorage'])
 existing_locations set([u'default', u'europestorage', u'ndcstorage', u'cloudian', u'apacstorage'])
 locations_missing################# set([])
image legal-structure
 Killed
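
The debug output above can be read as set arithmetic. The sketch below is only an illustration: it assumes the script derives locations_missing as a set difference (the debug labels suggest this, but the script's code is not shown in this ticket), and the helper name and the "stale" check are hypothetical additions.

```python
def missing_and_stale(locations_required, existing_locations):
    """Return (locations still to replicate to, locations not in the required set).

    Assumption: 'missing' is required minus existing. The 'stale' set is a
    hypothetical extra check, not something the script is known to compute.
    """
    missing = locations_required - existing_locations
    stale = existing_locations - locations_required
    return missing, stale


# Values copied from the debug output for vehicle-service-admin-center-gateway.
required = {"default", "apacstorage"}
existing = {"default", "europestorage", "ndcstorage", "cloudian", "apacstorage"}

missing, stale = missing_and_stale(required, existing)
print("missing:", sorted(missing))
print("stale:", sorted(stale))
```

Under that assumption nothing is missing for this image, yet three of its recorded placements refer to buckets that are no longer configured.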

Root cause: the customer added and removed multiple storage locations in the past, some of the blobs were not copied over correctly, and images are still pointing to older S3 buckets.

qedb=# select * from imagestoragelocation;
 id | name
 ---+------------------
 1 | s3_us_east_1
 2 | s3_eu_west_1
 3 | s3_ap_southeast_1
 4 | s3_ap_southeast_2
 5 | s3_ap_northeast_1
 6 | s3_sa_east_1
 7 | local
 8 | s3_us_west_1
 9 | default
 10 | apacstorage
 11 | europestorage
 44 | ndcstorage
 45 | cloudian
 46 | oldbuck
 47 | ecs
 (15 rows)

The following numbers of records from removed buckets are still present in the DB:

#Cloudian
select count(*) from imagestorageplacement where storage_id in (select storage_id from imagestorageplacement group by storage_id having count(*)>1) and location_id=45;
 count
--------
 443233
(1 row)
#europestorage
qedb=# select count(*) from imagestorageplacement where storage_id in (select storage_id from imagestorageplacement group by storage_id having count(*)>1) and location_id=11;
 count
-------
 290
#apacstorage
qedb=# select count(*) from imagestorageplacement where storage_id in (select storage_id from imagestorageplacement group by storage_id having count(*)>1) and location_id=10;
 count
---------
 1082259
(1 row)
#ndcstorage
qedb=# select count(*) from imagestorageplacement where storage_id in (select storage_id from imagestorageplacement group by storage_id having count(*)>1) and location_id=44;
 count
--------
 291433
(1 row)
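
The count queries above only cover placements whose storage_id also has at least one other placement. As a hedged illustration, the sketch below runs the same query against a toy in-memory SQLite table and adds a hypothetical companion query for blobs whose only placement is at the given location. The three-column schema, the sample rows and the function names are assumptions made for the example, not the real Quay schema.

```python
import sqlite3


def make_toy_db():
    """Build a toy imagestorageplacement table (simplified, assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE imagestorageplacement ("
        " id INTEGER PRIMARY KEY, storage_id INTEGER, location_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO imagestorageplacement (storage_id, location_id) VALUES (?, ?)",
        # storage 1: default + apacstorage, storage 2: apacstorage only,
        # storage 3: default + cloudian,    storage 4: cloudian only
        [(1, 9), (1, 10), (2, 10), (3, 9), (3, 45), (4, 45)],
    )
    conn.commit()
    return conn


def _count(conn, location_id, having):
    # 'having' is a fixed string chosen below, never user input.
    sql = (
        "SELECT count(*) FROM imagestorageplacement WHERE storage_id IN ("
        " SELECT storage_id FROM imagestorageplacement"
        " GROUP BY storage_id HAVING " + having + ") AND location_id = ?"
    )
    return conn.execute(sql, (location_id,)).fetchone()[0]


def count_duplicated_placements(conn, location_id):
    """Same condition as the ticket's query: the storage has more than one placement."""
    return _count(conn, location_id, "count(*) > 1")


def count_sole_placements(conn, location_id):
    """Hypothetical companion: the placement is the storage's only copy."""
    return _count(conn, location_id, "count(*) = 1")


conn = make_toy_db()
for loc in (10, 45):
    print(loc, count_duplicated_placements(conn, loc), count_sole_placements(conn, loc))
```

If the second count is non-zero for a removed bucket on the real database, those blobs would have no other recorded placement to fall back on.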

Expectation: We need a way to migrate data in apacstorage to default(emea) and ensure Quay is able to point to default for all layers which previously existed in apacstorage.

Suggested recovery steps:

Remove all storage IDs whose location_id is one of 11, 44, 45, 46, 47.
Recursively copy all data from apacstorage to the emea storage.
Remove apacstorage from the config.yaml file, remove georeplication, and restart Quay with the new config.
Delete the duplicate storage_id entries that have location_id set to apacstorage but are present in both emea and APAC:

 delete from imagestorageplacement where id IN (select id from (select id from imagestorageplacement where storage_id in (select storage_id from imagestorageplacement group by storage_id having count(*)>1) and location_id=10) AS C);

Now update location_id to default (emea) for every storage_id whose location_id is set to apacstorage:

update imagestorageplacement SET location_id=9 where location_id=10;

Ensure every storage_id points to location_id default.

We need engineering to verify that the above steps are safe for decommissioning one of the storage locations and completing the storage migration.
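
Until engineering confirms, the DELETE and UPDATE statements can be rehearsed on a throwaway copy. The sketch below is a minimal illustration on a toy in-memory SQLite table, not a tested procedure for the production PostgreSQL database. The simplified schema, the unique (storage_id, location_id) constraint, the sample rows and the function names are assumptions made for the example.

```python
import sqlite3

DEFAULT_ID, APAC_ID = 9, 10  # ids taken from the imagestoragelocation listing


def make_toy_db():
    """Toy table; the UNIQUE constraint is an assumption about the real schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE imagestorageplacement ("
        " id INTEGER PRIMARY KEY, storage_id INTEGER, location_id INTEGER,"
        " UNIQUE (storage_id, location_id))"
    )
    conn.executemany(
        "INSERT INTO imagestorageplacement (storage_id, location_id) VALUES (?, ?)",
        [(1, DEFAULT_ID), (1, APAC_ID), (2, APAC_ID), (3, DEFAULT_ID)],
    )
    conn.commit()
    return conn


def migrate_apac_to_default(conn):
    """Run the ticket's DELETE, then its UPDATE, inside one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "DELETE FROM imagestorageplacement WHERE id IN ("
            " SELECT id FROM ("
            "  SELECT id FROM imagestorageplacement WHERE storage_id IN ("
            "   SELECT storage_id FROM imagestorageplacement"
            "   GROUP BY storage_id HAVING count(*) > 1)"
            "  AND location_id = ?) AS c)",
            (APAC_ID,),
        )
        conn.execute(
            "UPDATE imagestorageplacement SET location_id = ? WHERE location_id = ?",
            (DEFAULT_ID, APAC_ID),
        )
        leftover = conn.execute(
            "SELECT count(*) FROM imagestorageplacement WHERE location_id = ?",
            (APAC_ID,),
        ).fetchone()[0]
        if leftover:
            raise RuntimeError("placements still point at apacstorage")


def placements(conn):
    """Return (storage_id, location_id) pairs for the final verification step."""
    return conn.execute(
        "SELECT storage_id, location_id FROM imagestorageplacement"
        " ORDER BY storage_id, location_id"
    ).fetchall()


conn = make_toy_db()
migrate_apac_to_default(conn)
print(placements(conn))
```

In this toy setup the order matters: running the UPDATE first would try to create a second (storage_id, default) row for storage 1 and violate the assumed unique constraint, which is why the duplicates are deleted before the remaining rows are repointed.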

People

Assignee: Unassigned

Reporter: Dixit Gangaiah (Inactive)

Votes: 0

Watchers: 3

Dates

Created: 2020/09/18 5:15 AM

Updated: 2023/12/15 9:24 PM

Resolved: 2022/07/06 4:17 PM