Project Quay / PROJQUAY-10257

[Worker] Implement Architecture-Filtered Mirroring

    • Icon: Story Story
    • Resolution: Unresolved
    • Icon: Undefined Undefined
    • Security & Compliance
      Summary

      Update the repository mirror worker to filter manifests by configured architectures. This modifies the skopeo integration to copy only selected architectures instead of using the --all flag, while preserving the original manifest list digest for OpenShift compatibility.

      Acceptance Criteria

      • [ ] Mirror worker respects architecture_filter configuration from RepoMirrorConfig
      • [ ] When architecture filter is set, only specified architectures are copied
      • [ ] When architecture filter is empty/null, all architectures are copied (backwards compatible)
      • [ ] Original manifest list digest is preserved when uploading to Quay
      • [ ] Individual architecture manifests are copied for each filtered architecture
      • [ ] Mirror sync logs indicate which architectures were mirrored
      • [ ] Failed architecture copies are handled gracefully with proper error reporting
      • [ ] Existing mirrors without architecture config continue to work unchanged

      Technical Requirements

      Mirror Worker Changes

      File: workers/repomirrorworker/__init__.py

      Modify the mirror execution flow to support architecture filtering:

      1. Before mirroring a tag, check for multi-arch images:

        def perform_mirror(skopeo, mirror):
            # ... existing setup ...

            architecture_filter = mirror.architecture_filter or []

            for tag in tags_to_mirror(skopeo, mirror):
                if architecture_filter:
                    # Use architecture-filtered copy
                    copy_filtered_architectures(skopeo, mirror, tag, architecture_filter)
                else:
                    # Use existing --all copy behavior
                    src_image = get_source_image_ref(mirror, tag)
                    dest_image = get_dest_image_ref(mirror, tag)
                    skopeo.copy(src_image, dest_image, all_tags=True)

      2. New function for filtered architecture copying:
          ```python
          def copy_filtered_architectures(skopeo, mirror, tag, architectures):
              """
              Copy only specified architectures from a multi-arch image.

              1. Fetch manifest list from source
              2. Parse and identify matching architecture manifests
              3. Copy each matching architecture manifest individually
              4. Push the original manifest list (sparse) to destination
              """
              src_image = get_source_image_ref(mirror, tag)
              dest_image = get_dest_image_ref(mirror, tag)

              # Step 1: Inspect manifest list
              manifest_list = skopeo.inspect_manifest(src_image)

              # Step 2: Filter to requested architectures
              matching_manifests = filter_manifests_by_arch(manifest_list, architectures)

              # Step 3: Copy each architecture manifest
              for manifest_ref in matching_manifests:
                  skopeo.copy_by_digest(src_image, dest_image, manifest_ref.digest)

              # Step 4: Push original manifest list (sparse)
              # Arguments match the push_sparse_manifest_list signature defined below.
              push_sparse_manifest_list(
                  skopeo, dest_image, tag, manifest_list.raw_bytes, manifest_list.media_type
              )
          ```

      Skopeo Integration Changes

      File: util/repomirror/skopeomirror.py

      Add new methods to support filtered copying:

      def inspect_manifest(self, image_ref):
          """
          Inspect and return the manifest for an image reference.
      
          Uses: skopeo inspect --raw docker://image
          Returns: Parsed manifest (list or single) with raw bytes
          """
          args = ["/usr/bin/skopeo", "inspect", "--raw"]
          args.append(f"docker://{image_ref}")
      
          result = self.run_skopeo(args)
          if not result.success:
              raise RepoMirrorSkopeoException(f"Failed to inspect {image_ref}")
      
          return ManifestInspectResult(
              raw_bytes=result.stdout,
              parsed=json.loads(result.stdout),
              media_type=detect_manifest_type(result.stdout)
          )
      
      def copy_by_digest(self, src_image, dest_image, digest):
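          """
          Copy a specific manifest by digest.
      
          Uses: skopeo copy docker://image@sha256:xxx docker://dest
      
          src_image is expected to be a repository reference without a tag,
          because the digest is appended to it below.
          """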
          args = ["/usr/bin/skopeo", "copy", "--remove-signatures"]
          args += self._get_auth_args()
          args.append(f"docker://{src_image}@{digest}")
          args.append(f"docker://{dest_image}")
      
          result = self.run_skopeo(args)
          return result.success
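
The argument lists that these two methods hand to skopeo can be sketched in isolation, which also makes them easy to unit test. This is a minimal illustration under stated assumptions, not the Quay implementation: `build_inspect_args`, `build_copy_by_digest_args`, and the sha256-only digest check are hypothetical helpers, and any auth or TLS flags are passed in as an opaque list.

```python
import re

SKOPEO_BIN = "/usr/bin/skopeo"
# Hypothetical validation: only sha256 digests with 64 hex characters are accepted.
_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def build_inspect_args(image_ref):
    """Return argv for `skopeo inspect --raw docker://<image_ref>`."""
    return [SKOPEO_BIN, "inspect", "--raw", f"docker://{image_ref}"]


def build_copy_by_digest_args(src_image, dest_image, digest, auth_args=()):
    """Return argv that copies one manifest, addressed by digest, to dest_image."""
    if not _DIGEST_RE.fullmatch(digest):
        raise ValueError(f"Unexpected digest format: {digest!r}")
    args = [SKOPEO_BIN, "copy", "--remove-signatures"]
    args += list(auth_args)
    # The source is addressed by digest, so src_image carries no tag here.
    args.append(f"docker://{src_image}@{digest}")
    args.append(f"docker://{dest_image}")
    return args
```

Rejecting a malformed digest before invoking skopeo keeps a bad manifest list entry from turning into an opaque subprocess failure.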
      

      Manifest List Parsing

      File: workers/repomirrorworker/manifest_utils.py (new file)

      Create utility functions for parsing manifest lists:

      import json
      
      
      def parse_manifest_list(manifest_bytes):
          """
          Parse manifest list/index and extract architecture information.
      
          Handles both Docker Manifest List and OCI Index formats.
          """
          data = json.loads(manifest_bytes)
      
          if data.get("mediaType") == DOCKER_SCHEMA2_MANIFESTLIST_CONTENT_TYPE:
              return parse_docker_manifest_list(data)
          elif data.get("mediaType") == OCI_IMAGE_INDEX_CONTENT_TYPE:
              return parse_oci_index(data)
          else:
              # Single architecture manifest
              return None
      
      def filter_manifests_by_arch(manifest_list, architectures):
          """
          Filter manifest list entries to only include specified architectures.
      
          Args:
              manifest_list: Parsed manifest list
              architectures: List of architecture strings to include
      
          Returns:
              List of ManifestReference for matching architectures
          """
          matching = []
          for entry in manifest_list.manifests:
              platform = entry.get("platform", {})
              arch = platform.get("architecture")
              if arch in architectures:
                  matching.append(ManifestReference(
                      digest=entry["digest"],
                      size=entry["size"],
                      architecture=arch,
                      os=platform.get("os", "linux")
                  ))
          return matching
      
      def is_manifest_list(manifest_bytes):
          """Check if manifest bytes represent a manifest list/index."""
          try:
              data = json.loads(manifest_bytes)
              media_type = data.get("mediaType", "")
          except (ValueError, TypeError, AttributeError):
              # Not valid JSON, or the top-level JSON value is not an object
              return False
          return media_type in (
              DOCKER_SCHEMA2_MANIFESTLIST_CONTENT_TYPE,
              OCI_IMAGE_INDEX_CONTENT_TYPE,
          )
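
To make the filtering step concrete, here is a small sketch that runs against a hand-written OCI index. It is illustrative only: the digests are placeholders rather than real images, `select_entries` is a hypothetical name, and it returns plain tuples instead of the `ManifestReference` objects described above.

```python
import json

OCI_INDEX = "application/vnd.oci.image.index.v1+json"
OCI_MANIFEST = "application/vnd.oci.image.manifest.v1+json"


def _entry(arch, fill):
    """Build one index entry with a placeholder digest (not a real image)."""
    return {
        "mediaType": OCI_MANIFEST,
        "digest": "sha256:" + fill * 64,
        "size": 1024,
        "platform": {"architecture": arch, "os": "linux"},
    }


SAMPLE_INDEX = json.dumps({
    "schemaVersion": 2,
    "mediaType": OCI_INDEX,
    "manifests": [_entry("amd64", "1"), _entry("arm64", "2"), _entry("s390x", "3")],
}).encode("utf-8")


def select_entries(manifest_bytes, architectures):
    """Return (digest, architecture) pairs whose platform matches the filter."""
    data = json.loads(manifest_bytes)
    wanted = set(architectures)
    selected = []
    for entry in data.get("manifests", []):
        arch = entry.get("platform", {}).get("architecture")
        if arch in wanted:
            selected.append((entry["digest"], arch))
    return selected


selected = select_entries(SAMPLE_INDEX, ["amd64", "arm64"])
```

With the sample above, the `s390x` entry is skipped and the two remaining entries keep their order from the index.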
      

      Sparse Manifest List Upload

      After copying individual architecture manifests, push the original manifest list:

      def push_sparse_manifest_list(skopeo, dest_image, tag, manifest_list_bytes, media_type):
          """
          Push the original manifest list to the destination registry.
      
          This preserves the original digest by uploading the exact bytes.
          The destination registry (Quay) must have FEATURE_SPARSE_INDEX enabled.
          """
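          # Use Quay's v2 API to push manifest directly
          # This requires the robot credentials from the mirror config
          # NOTE: dest_registry, repo, and token are placeholders in this sketch;
          # they are expected to be derived from dest_image and those credentials.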
      
          digest = compute_digest(manifest_list_bytes)
      
          response = requests.put(
              f"{dest_registry}/v2/{repo}/manifests/{tag}",
              data=manifest_list_bytes,
              headers={
                  "Content-Type": media_type,
                  "Authorization": f"Bearer {token}"
              }
          )
      
          if response.status_code not in (201, 202):
              raise RepoMirrorException(f"Failed to push manifest list: {response.text}")
      
          return digest
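
`compute_digest` is referenced above but not defined in this story. A minimal sketch, assuming it is the SHA-256 of the raw manifest bytes in the usual `sha256:<hex>` form, also shows why the exact bytes must be uploaded: re-serializing the same JSON yields different bytes and therefore a different digest.

```python
import hashlib
import json


def compute_digest(manifest_bytes):
    """Return the registry-style digest of the exact bytes given (assumed helper)."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


# The same JSON document, once as received and once after a parse/dump round trip.
original = b'{"schemaVersion": 2,\n  "manifests": []}'
reserialized = json.dumps(json.loads(original)).encode("utf-8")

same_content = json.loads(original) == json.loads(reserialized)
same_digest = compute_digest(original) == compute_digest(reserialized)
```

Here `same_content` is true while `same_digest` is false, which is the reason the worker should keep `raw_bytes` from the inspect step rather than dumping the parsed structure.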
      

      Implementation Notes

      Existing Patterns to Follow

      • Worker flow: See perform_mirror() in workers/repomirrorworker/__init__.py
      • Skopeo execution: See run_skopeo() in util/repomirror/skopeomirror.py
      • Error handling: Follow RepoMirrorSkopeoException pattern

      Key Behaviors

      1. Single architecture images: Copied normally (no manifest list to parse)
      2. Multi-arch with filter: Individual manifests copied, then sparse manifest list pushed
      3. Multi-arch without filter: Use existing --all behavior for backwards compatibility
      4. Missing architecture in source: Log warning but continue (don't fail entire sync)
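
The first three behaviors can be sketched as a small decision function. The names `CopyStrategy` and `choose_copy_strategy` are hypothetical, and the sketch assumes that "copied normally" for single-architecture images means the existing copy path.

```python
from enum import Enum

MANIFEST_LIST_TYPES = frozenset({
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.index.v1+json",
})


class CopyStrategy(Enum):
    COPY_ALL = "copy_all"            # existing --all behavior
    COPY_FILTERED = "copy_filtered"  # per-architecture copy, then sparse manifest list


def choose_copy_strategy(media_type, architecture_filter):
    """Pick a copy strategy from the source media type and the configured filter."""
    if not architecture_filter:
        # Empty or null filter: backwards-compatible path.
        return CopyStrategy.COPY_ALL
    if media_type not in MANIFEST_LIST_TYPES:
        # Single-architecture image: there is no manifest list to filter.
        return CopyStrategy.COPY_ALL
    return CopyStrategy.COPY_FILTERED
```

Keeping the decision in one pure function lets the unit tests cover each behavior without invoking skopeo.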

      Logging

      Add detailed logging for architecture filtering:

      logger.info(
          "Mirroring architectures for %s:%s: %s",
          mirror.external_reference, tag, architectures
      )
      
      logger.info(
          "Copied architecture %s (%s) for %s:%s",
          arch, digest, mirror.external_reference, tag
      )
      
      logger.warning(
          "Architecture %s not found in source for %s:%s",
          arch, mirror.external_reference, tag
      )
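
One way to decide which of these messages to emit is to compare the requested architectures with those present in the source. The helper below is a sketch; `log_architecture_coverage` is a hypothetical name, and it takes plain strings rather than the mirror object.

```python
import logging

logger = logging.getLogger(__name__)


def log_architecture_coverage(reference, tag, requested, available):
    """Log which requested architectures exist in the source; return the missing ones."""
    found = sorted(set(requested) & set(available))
    missing = sorted(set(requested) - set(available))
    logger.info("Mirroring architectures for %s:%s: %s", reference, tag, found)
    for arch in missing:
        # A missing architecture is a warning, not a failure of the whole sync.
        logger.warning(
            "Architecture %s not found in source for %s:%s", arch, reference, tag
        )
    return missing
```

Returning the missing list lets the caller include it in the sync status without parsing log output.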
      

      Transaction Safety

      Ensure atomic behavior:
      1. Copy all architecture manifests first
      2. Only push manifest list after all architectures copied
      3. If any step fails, mark sync as failed with details
      4. Next sync will retry the entire tag
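
The ordering above can be sketched with injected callables so that it runs without a registry. `mirror_tag_atomically` and `CopyFailed` are hypothetical names. The sketch does not roll back manifests that were already copied; it only guarantees that the manifest list is not pushed after a failed copy, and it relies on the next sync to retry the tag.

```python
class CopyFailed(Exception):
    """Hypothetical error type standing in for the worker's own exception."""


def mirror_tag_atomically(digests, copy_one, push_manifest_list):
    """Copy every digest, then push the manifest list; stop at the first failure.

    copy_one(digest) returns True on success; push_manifest_list() takes no arguments.
    """
    copied = []
    for digest in digests:
        if not copy_one(digest):
            raise CopyFailed(f"copy failed for {digest}; copied so far: {copied}")
        copied.append(digest)
    # Reached only when every architecture manifest was copied.
    push_manifest_list()
    return copied
```

Because the push is the last step, a destination tag never points at a manifest list whose filtered architectures failed to copy during that sync.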

      Dependencies

      • Story 01: Database schema for architecture_filter field
      • Story 02: Registry core must accept sparse manifest lists

      Testing Requirements

      Unit Tests

      File: workers/test/test_repomirrorworker.py (extend)

      def test_mirror_with_architecture_filter():
          """Test mirroring only specified architectures."""
      
      def test_mirror_without_architecture_filter():
          """Test mirroring all architectures when no filter set."""
      
      def test_mirror_single_arch_image_with_filter():
          """Test single architecture image is copied normally."""
      
      def test_mirror_missing_architecture_in_source():
          """Test graceful handling when filtered arch not in source."""
      
      def test_manifest_list_parsing():
          """Test parsing Docker and OCI manifest list formats."""
      
      def test_filter_manifests_by_arch():
          """Test filtering manifest entries by architecture."""
      

      Integration Tests

      Test with actual registry interactions:
      1. Set up mirror with architecture filter
      2. Trigger mirror sync
      3. Verify only filtered architectures are stored in destination
      4. Verify manifest list digest matches source

      Definition of Done

      • [ ] Code implemented and follows project conventions
      • [ ] All acceptance criteria met
      • [ ] Unit tests written and passing
      • [ ] Integration tests written and passing
      • [ ] No regressions in existing mirror functionality
      • [ ] Logging added for architecture filtering operations
      • [ ] Code reviewed and approved

              marckok Marcus Kok