  1. Project Quay
  2. PROJQUAY-1097

Backfill migration fails


Details

    • Bug
    • Resolution: Obsolete
    • Critical
    • None
    • None
    • quay
    • False
    • False
    • Quay Enterprise
    • Undefined
    • 0

    Description

      Issue: The customer has two buckets, default (emea) and apacstorage, and apacstorage needs to be decommissioned. However, util/backfillreplication.py is unable to queue jobs for replication and gets killed after a while.

       

      sh-4.2$ scl enable python27 bash
      bash-4.2$ python -m util.backfillreplication
      No handlers could be found for logger "util.config.provider.baseprovider"
      data/secscan_model/__init__.py:22: DeprecationWarning: Call to deprecated class V2SecurityScanner. (Will be replaced by a V4 API security scanner soon)
       self._model = V2SecurityScanner(app, instance_keys, storage)
      Enqueueing image storage fa0dee00-87e1-4148-8a71-4ccc59cde327 to be replicated
      Enqueueing image storage e6c60f41-0aa5-4c59-9044-f67311d194a9 to be replicated
      Enqueueing image storage 23e199b2-1eb5-4e74-9bfc-675c571d4bb2 to be replicated
      Enqueueing image storage 9bc4c59a-646c-4180-8f8b-f605d87a4fd2 to be replicated
      Killed 

       

       We then added more debugging steps and could see that it fails at a different repository every time, and that it tries to replicate layers from nonexistent storage buckets.

       image vehicle-service-admin-center-gateway
       namespace %s 461
       locations %s set([])
       default location ['default', 'apacstorage']
       locations_required %s set(['default', 'apacstorage'])
       existing_locations set([u'default', u'europestorage', u'ndcstorage', u'cloudian', u'apacstorage'])
       locations_missing################# set([])
      image legal-structure
       Killed
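
      The debug output above is consistent with a set-difference check: a storage is enqueued only when a required location is absent from its existing placements. A minimal Python sketch of that check follows, assuming this is how the script decides. The function name `locations_missing` is hypothetical and the values are copied from the output above.

```python
def locations_missing(namespace_locations, default_locations, existing_locations):
    """Return the required locations that have no placement yet.

    Hypothetical helper: it mirrors the names printed in the debug output
    and is not copied from util/backfillreplication.py.
    """
    locations_required = set(namespace_locations) | set(default_locations)
    return locations_required - set(existing_locations)


# Values from the debug output for image vehicle-service-admin-center-gateway.
missing = locations_missing(
    namespace_locations=set(),
    default_locations=["default", "apacstorage"],
    existing_locations={"default", "europestorage", "ndcstorage", "cloudian", "apacstorage"},
)
print(missing)  # empty set: nothing to enqueue for this image
```

      Under this assumption, placements in locations that are no longer configured (europestorage, ndcstorage, cloudian) do not change the result; only the required locations matter.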

      Root cause: the customer added and removed multiple storage locations in the past, some of the blobs were not copied over correctly, and images are still pointing to older S3 buckets.

      qedb=# select * from imagestoragelocation;
       id |       name
      ----+-------------------
        1 | s3_us_east_1
        2 | s3_eu_west_1
        3 | s3_ap_southeast_1
        4 | s3_ap_southeast_2
        5 | s3_ap_northeast_1
        6 | s3_sa_east_1
        7 | local
        8 | s3_us_west_1
        9 | default
       10 | apacstorage
       11 | europestorage
       44 | ndcstorage
       45 | cloudian
       46 | oldbuck
       47 | ecs
      (15 rows)
          

      The following numbers of records from removed buckets are still present in the DB.

       

      #Cloudian
      select count(*) from imagestorageplacement where storage_id in (select storage_id from imagestorageplacement group by storage_id having count(*)>1) and location_id=45;
       count
      --------
       443233
      (1 row)
      #europestorage
      qedb=# select count(*) from imagestorageplacement where storage_id in (select storage_id from imagestorageplacement group by storage_id having count(*)>1) and location_id=11;
       count
      -------
       290
      #apacstorage
      qedb=# select count(*) from imagestorageplacement where storage_id in (select storage_id from imagestorageplacement group by storage_id having count(*)>1) and location_id=10;
       count
      ---------
       1082259
      (1 row)
      #ndcstorage
      qedb=# select count(*) from imagestorageplacement where storage_id in (select storage_id from imagestorageplacement group by storage_id having count(*)>1) and location_id=44;
       count
      --------
       291433
      (1 row) 
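
      The per-location counts above can be collected in one pass by grouping on location_id. The sketch below runs that grouped form of the query against an in-memory SQLite table with toy rows (not customer data). The three-column schema is a simplified assumption, not the full Quay table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE imagestorageplacement (
        id INTEGER PRIMARY KEY,
        storage_id INTEGER NOT NULL,
        location_id INTEGER NOT NULL
    );
    -- Toy rows: storage 1 is in default (9) and apacstorage (10),
    -- storage 2 is only in apacstorage, storage 3 is in default and cloudian (45).
    INSERT INTO imagestorageplacement (storage_id, location_id) VALUES
        (1, 9), (1, 10), (2, 10), (3, 9), (3, 45);
""")

# Count placements per location, restricted to storages that have
# more than one placement (same filter as the queries above).
rows = conn.execute("""
    SELECT location_id, COUNT(*)
    FROM imagestorageplacement
    WHERE storage_id IN (
        SELECT storage_id FROM imagestorageplacement
        GROUP BY storage_id HAVING COUNT(*) > 1
    )
    GROUP BY location_id
    ORDER BY location_id
""").fetchall()

counts = dict(rows)
print(counts)
```

      Storage 2 is excluded from the counts because it has a single placement; rows like that are the ones that would lose their only copy if the location were dropped without copying the data first.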
      

      Expectation: We need a way to migrate data in apacstorage to default(emea) and ensure Quay is able to point to default for all layers which previously existed in apacstorage.

       

      Suggested recovery steps:

      • Remove all storage IDs whose location_id is one of 11, 44, 45, 46, 47
      • Recursively copy all data from apacstorage to emea storage
      • Remove apacstorage from the config.yaml file, remove georeplication, and restart Quay with the new config
      • Delete the duplicate placement rows that have location_id apacstorage for every storage_id present in both emea and APAC.
       delete from imagestorageplacement where id IN (select id from (select id from imagestorageplacement where storage_id in (select storage_id from imagestorageplacement group by storage_id having count(*)>1) and location_id=10) AS C);
       
      
      • Now update location_id to default (emea) for every storage_id whose location_id is still set to apacstorage
      update imagestorageplacement SET location_id=9 where location_id=10;
      
      • Ensure all storage_ids point to location_id default
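
      The delete-then-update part of the steps above can be rehearsed on a toy table before touching the real database. This is a sketch under stated assumptions: an in-memory SQLite table, a simplified schema, and a UNIQUE (storage_id, location_id) constraint that is assumed here rather than confirmed from the customer's schema. Both statements run in one transaction so a failure leaves the table unchanged.

```python
import sqlite3

APAC, DEFAULT = 10, 9  # location ids from the imagestoragelocation table

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE imagestorageplacement (
        id INTEGER PRIMARY KEY,
        storage_id INTEGER NOT NULL,
        location_id INTEGER NOT NULL,
        UNIQUE (storage_id, location_id)  -- assumed constraint
    );
    -- Toy rows: storage 1 is in both locations, storage 2 only in
    -- apacstorage, storage 3 only in default.
    INSERT INTO imagestorageplacement (storage_id, location_id) VALUES
        (1, 9), (1, 10), (2, 10), (3, 9);
""")

with conn:  # commits on success, rolls back if any statement raises
    # Drop apacstorage rows for storages that also have another placement.
    conn.execute("""
        DELETE FROM imagestorageplacement
        WHERE location_id = ? AND storage_id IN (
            SELECT storage_id FROM imagestorageplacement
            GROUP BY storage_id HAVING COUNT(*) > 1
        )
    """, (APAC,))
    # Repoint the remaining apacstorage-only rows to default.
    conn.execute(
        "UPDATE imagestorageplacement SET location_id = ? WHERE location_id = ?",
        (DEFAULT, APAC),
    )

remaining = conn.execute(
    "SELECT storage_id, location_id FROM imagestorageplacement ORDER BY storage_id"
).fetchall()
print(remaining)
```

      With the assumed unique constraint, running the update first would fail for storage 1, which already has a default row; that is why the delete comes first. Note that the `count(*)>1` filter matches any storage with more than one placement, so it only means "present in both emea and APAC" once the rows for the other locations have been removed.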

       

       We need engineering to verify whether the above steps are safe for deprecating one of the storage locations and completing the storage migration.

       


          People

            Unassigned Unassigned
            rhn-support-dgangaia Dixit Gangaiah (Inactive)