PROJQUAY-10826

Quay 3.17 Immutability policy rollback failure leaves tags in inconsistent state

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • quay-v3.17.0
    • quay

      Overview

      When an immutability policy is created and its retroactive application fails partway through (database timeout, connection loss, etc.), the rollback deletes the policy but does not revert tags that were already marked. Some tags are left immutable while others remain mutable, and because the policy itself is gone, the immutable tags are orphaned.

      Discovery: Code review of PROJQUAY-10157 implementation (2026-03-04)
      Severity: MAJOR (Data Consistency Issue)
      Impact: Incomplete retroactive application creates inconsistent immutability state

      Technical Details

      Vulnerable Code Location

      File: data/model/immutability.py:269-311

      Function: create_namespace_immutability_policy()

      Code showing the issue:

      def create_namespace_immutability_policy(orgname: str, policy_config: PolicyConfig):
          _validate_policy(policy_config)
          with db_transaction():
              namespace = get_active_namespace_user_by_username(orgname)
              
              if _is_duplicate_namespace_policy(namespace.id, policy_config):
                  raise DuplicateImmutabilityPolicy(...)
              
              # Policy row created; it is committed when the transaction block exits
              policy = NamespaceImmutabilityPolicyTable.create(
                  namespace=namespace.id, policy=policy_config
              )
              namespace_id = namespace.id
              policy_uuid = policy.uuid
          # Transaction commits here - policy is now permanent
          
          # Retroactive application runs OUTSIDE transaction
          # If this fails, policy rollback is incomplete
          try:
              apply_immutability_policy_to_existing_tags(
                  namespace_id=namespace_id,
                  repository_id=None,
                  tag_pattern=policy_config["tag_pattern"],
                  tag_pattern_matches=policy_config.get("tag_pattern_matches", True),
              )
          except Exception:
              logger.exception("Failed to retroactively apply policy")
              # Attempts to delete policy for rollback
              # BUT: Some tags may already be marked immutable
              try:
                  NamespaceImmutabilityPolicyTable.delete().where(
                      NamespaceImmutabilityPolicyTable.uuid == policy_uuid
                  ).execute()
              except Exception:
                  logger.exception("Failed to rollback policy creation")
              raise
      

      Repository-Level Policy Has Same Issue

      File: data/model/immutability.py:409-470

      Function: create_repository_immutability_policy()

      The same rollback failure exists for repository-level policies.

      Problem Scenario

      Failure During Retroactive Application

      Step-by-step failure scenario:

      Time       Event                                           State
      ----------------------------------------------------------------------------------
      10:00.000  User creates policy (pattern="v.*")
      10:00.100  Policy committed to database                    Policy exists ✓
      10:00.150  Retroactive application starts
                 Namespace has 10,000 existing tags to process
                 
      10:00.500  Batch 1 (tags 1-5000) processed                Tags 1-5000: immutable=True ✓
      10:01.000  Batch 2 starts processing
      10:01.500  Database connection timeout error               ERROR!
                 Retroactive application FAILS
                 
      10:01.600  Exception handler executes
                 Attempts to delete policy for rollback
      10:01.700  Policy deleted from database                    Policy deleted ✓
                 
      FINAL STATE (INCONSISTENT):
      - Tags 1-5000: immutable=True (orphaned - no policy exists)
      - Tags 5001-10000: immutable=False (never processed)
      - Policy: DELETED
      - New tags pushed after: immutable=False (no policy to evaluate)
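
      The timeline above can be reproduced with a small in-memory simulation. This is only a sketch: FakeRegistry, apply_in_batches and create_policy_current_behavior are hypothetical stand-ins for the policy table, the batch updater and the creation function, not Quay code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRegistry:
    """In-memory stand-in for the policy table and the tag table."""
    policies: set = field(default_factory=set)
    immutable: dict = field(default_factory=dict)  # tag id -> immutable flag


def apply_in_batches(registry, tag_ids, batch_size, fail_on_batch):
    """Mark tags immutable batch by batch; raise when the chosen batch starts."""
    for batch_no, start in enumerate(range(0, len(tag_ids), batch_size), start=1):
        if batch_no == fail_on_batch:
            raise TimeoutError("simulated database timeout")
        for tag_id in tag_ids[start:start + batch_size]:
            registry.immutable[tag_id] = True


def create_policy_current_behavior(registry, policy_uuid, tag_ids):
    registry.policies.add(policy_uuid)          # policy is "committed" first
    try:
        apply_in_batches(registry, tag_ids, batch_size=5000, fail_on_batch=2)
    except Exception:
        registry.policies.discard(policy_uuid)  # rollback removes the policy only
        raise


registry = FakeRegistry(immutable={i: False for i in range(1, 10001)})
try:
    create_policy_current_behavior(registry, "policy-1", list(range(1, 10001)))
except TimeoutError:
    pass

# Count tags left immutable although no policy exists any more.
orphaned = sum(1 for flag in registry.immutable.values() if flag)
print(len(registry.policies), orphaned)
```

      After the simulated timeout the policy set is empty while the first batch of 5,000 tags is still flagged, which is the final state listed in the timeline.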
      

      Why This is a Problem

      1. Inconsistent State: Half the tags are immutable, half are not
      2. Orphaned Immutable Tags: Tags marked immutable but no policy exists
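      3. No Recovery Mechanism: If the rollback delete also fails, the policy row remains and a retry is rejected by the duplicate pattern check; if it succeeds, nothing records which tags were already marked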
      4. Unexpected Behavior: Some v.* tags are immutable, others are not
      5. Audit Confusion: Logs show policy was created then deleted

      Impact Assessment

      Data Integrity Impact:

      • Inconsistent immutability state across tag namespace
      • Orphaned immutable tags with no governing policy
      • Cannot determine which tags should be immutable
      • Policy recreation blocked by the duplicate check if the failed rollback leaves the policy row behind

      Operational Impact:

      • Users confused about which tags are protected
      • Cannot cleanly rollback failed policy creation
      • Manual database intervention required to fix state
      • Requires admin to identify and fix orphaned tags

      Security Impact:

      • Some tags that should be immutable are mutable
      • Inconsistent protection across release tags
      • Compliance violations if immutability is required

      Likelihood:

      • LOW under normal conditions (retroactive usually succeeds)
      • MEDIUM for large namespaces (10k+ tags, higher timeout risk)
      • HIGH if database is under heavy load or network issues

      Affected Scenarios:
      1. Large namespace (100k+ tags) with database timeout during retroactive
      2. Network interruption during retroactive processing
      3. Database failover event during policy creation
      4. Worker process killed/restarted during retroactive
      5. Database connection pool exhaustion during batch processing

      Current Behavior vs Expected Behavior

      Current Behavior (Broken)

      CREATE POLICY → COMMIT POLICY → RETROACTIVE APPLY → FAIL → DELETE POLICY
      Result: Partial tags immutable, policy deleted, inconsistent state
      

      Expected Behavior (Atomic)

      CREATE POLICY → RETROACTIVE APPLY → SUCCESS → COMMIT POLICY
      Or:
      CREATE POLICY → RETROACTIVE APPLY → FAIL → ROLLBACK EVERYTHING
      Result: Either all tags immutable with policy, or no changes at all
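
      The all-or-nothing outcome can be illustrated with the standard-library sqlite3 module, which is used here only because it runs anywhere; the ticket targets PostgreSQL and peewee. The table layout and create_policy_atomically are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (uuid TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE tag (id INTEGER PRIMARY KEY, immutable INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO tag (id) VALUES (?)", [(i,) for i in range(1, 11)])
conn.commit()


def create_policy_atomically(conn, policy_uuid, tag_ids, fail_after):
    """Insert the policy and flag the tags inside ONE transaction."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("INSERT INTO policy (uuid) VALUES (?)", (policy_uuid,))
        for count, tag_id in enumerate(tag_ids, start=1):
            if fail_after is not None and count > fail_after:
                raise TimeoutError("simulated failure mid-application")
            conn.execute("UPDATE tag SET immutable = 1 WHERE id = ?", (tag_id,))


def counts(conn):
    """Return (number of policies, number of immutable tags)."""
    policies = conn.execute("SELECT COUNT(*) FROM policy").fetchone()[0]
    immutable = conn.execute(
        "SELECT COUNT(*) FROM tag WHERE immutable = 1"
    ).fetchone()[0]
    return policies, immutable


try:
    create_policy_atomically(conn, "policy-1", range(1, 11), fail_after=5)
except TimeoutError:
    pass
after_failure = counts(conn)   # nothing persisted

create_policy_atomically(conn, "policy-1", range(1, 11), fail_after=None)
after_retry = counts(conn)     # policy and all tags persisted together
```

      The trade-off is that the transaction stays open for the whole retroactive pass, which may be impractical for very large namespaces.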
      

      Root Cause Analysis

      Root Cause 1: Retroactive application outside transaction

      • Policy committed before retroactive completes
      • Cannot atomically rollback policy + tag updates
      • Two separate operations treated as single unit

      Root Cause 2: Incomplete rollback logic

      • Deletes policy but does not undo tag immutability changes
      • No tracking of which tags were updated
      • Cannot reverse partial batch operations
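
      One way to close the tracking gap is to remember which tags a run actually changed and revert exactly those on failure. A minimal sketch, assuming an in-memory dict in place of the tag table; apply_with_tracking is a hypothetical helper, not an existing Quay function.

```python
def apply_with_tracking(tags, matching_ids, batch_size, fail_on_batch=None):
    """Flag matching tags as immutable, reverting touched tags on failure.

    `tags` maps tag id -> immutable flag (a stand-in for the tag table).
    """
    touched = []  # ids this run flipped from mutable to immutable
    try:
        for batch_no, start in enumerate(
            range(0, len(matching_ids), batch_size), start=1
        ):
            if batch_no == fail_on_batch:
                raise TimeoutError("simulated database timeout")
            for tag_id in matching_ids[start:start + batch_size]:
                if not tags[tag_id]:      # leave already-immutable tags alone
                    tags[tag_id] = True
                    touched.append(tag_id)
    except Exception:
        for tag_id in touched:            # compensating action
            tags[tag_id] = False
        raise
    return touched


tags = {1: False, 2: True, 3: False, 4: False}  # tag 2 was immutable beforehand
try:
    apply_with_tracking(tags, [1, 2, 3, 4], batch_size=2, fail_on_batch=2)
except TimeoutError:
    pass
state_after_failure = dict(tags)

touched_on_retry = apply_with_tracking(tags, [1, 2, 3, 4], batch_size=2)
```

      Tag 2 keeps its earlier immutable flag because only tags changed by the failed run are reverted. Against a real database the revert can fail for the same reason the apply did, so the touched ids would likely need to be persisted rather than held in memory.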

      Root Cause 3: No idempotency or recovery

      • Cannot safely retry policy creation (the duplicate check rejects the retry if the rollback delete failed and the policy row remains)
      • No "resume from failure" mechanism
      • Manual intervention required to clean up

      Recommended Fix: Two-Phase Commit with Active Flag

      Add an active field to both policy tables so that a policy is enforced only after retroactive application has succeeded:

      Database Schema Change

      class NamespaceImmutabilityPolicyTable(BaseTable):
          namespace = ForeignKeyField(User)
          policy = JSONField()
          uuid = UUIDField(default=uuid4)
          active = BooleanField(default=False)  # NEW FIELD
          created_date = DateTimeField(default=datetime.utcnow)
      
      class RepositoryImmutabilityPolicyTable(BaseTable):
          repository = ForeignKeyField(Repository)
          namespace = ForeignKeyField(User)
          policy = JSONField()
          uuid = UUIDField(default=uuid4)
          active = BooleanField(default=False)  # NEW FIELD
          created_date = DateTimeField(default=datetime.utcnow)
      

      Updated Policy Creation Logic

      def create_namespace_immutability_policy(orgname: str, policy_config: PolicyConfig):
          _validate_policy(policy_config)
          
          # Phase 1: Create INACTIVE policy (visible but not enforced)
          with db_transaction():
              namespace = get_active_namespace_user_by_username(orgname)
              
              if _is_duplicate_namespace_policy(namespace.id, policy_config):
                  raise DuplicateImmutabilityPolicy(...)
              
              policy = NamespaceImmutabilityPolicyTable.create(
                  namespace=namespace.id,
                  policy=policy_config,
                  active=False  # Not yet active - won't affect new tag creation
              )
              policy_uuid = policy.uuid
              namespace_id = namespace.id
          
          # Phase 2: Apply retroactively to existing tags (outside transaction)
          # New tags pushed during this phase are NOT affected (policy is inactive)
          try:
              apply_immutability_policy_to_existing_tags(
                  namespace_id=namespace_id,
                  repository_id=None,
                  tag_pattern=policy_config["tag_pattern"],
                  tag_pattern_matches=policy_config.get("tag_pattern_matches", True),
              )
          except Exception:
              logger.exception("Failed to retroactively apply policy")
              # Roll back: delete the inactive policy, which was never enforced for new tags
              # NOTE: tags already marked by the failed retroactive pass are NOT reverted here
              try:
                  NamespaceImmutabilityPolicyTable.delete().where(
                      NamespaceImmutabilityPolicyTable.uuid == policy_uuid
                  ).execute()
              except Exception:
                  logger.exception("Failed to rollback inactive policy creation")
              raise
          
          # Phase 3: Activate policy (atomic commit)
          # From this point forward, new tags will check this policy
          with db_transaction():
              updated = NamespaceImmutabilityPolicyTable.update(active=True).where(
                  NamespaceImmutabilityPolicyTable.uuid == policy_uuid,
                  NamespaceImmutabilityPolicyTable.active == False  # Optimistic lock
              ).execute()
              
              if updated != 1:
                  raise DataModelException("Policy was already activated or deleted")
          
          logger.info(
              "Successfully created and activated immutability policy %s for namespace %s",
              policy_uuid, orgname
          )
          
          return policy
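
      The Phase 3 guard relies on the UPDATE reporting how many rows it changed. A minimal sketch of that conditional update using the standard-library sqlite3 module; the policy table and the activate helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policy (uuid TEXT PRIMARY KEY, active INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO policy (uuid) VALUES ('policy-1')")
conn.commit()


def activate(conn, policy_uuid):
    """Flip active from 0 to 1; return True only if this call did the flip."""
    with conn:
        cursor = conn.execute(
            "UPDATE policy SET active = 1 WHERE uuid = ? AND active = 0",
            (policy_uuid,),
        )
    return cursor.rowcount == 1


first = activate(conn, "policy-1")     # row was inactive: activated
second = activate(conn, "policy-1")    # already active: no row matched
missing = activate(conn, "policy-2")   # deleted or never created
```

      A row count other than 1 means another caller activated or deleted the policy in the meantime, which is the case the DataModelException above reports.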
      

      Updated Policy Evaluation Logic

      def evaluate_immutability_policies(repository_id: int, namespace_id: int, tag_name: str) -> bool:
          # Check repository policies (ACTIVE ONLY)
          for row in RepositoryImmutabilityPolicyTable.select().where(
              RepositoryImmutabilityPolicyTable.repository == repository_id,
              RepositoryImmutabilityPolicyTable.active == True  # Only check active policies
          ):
              config = row.policy
              if _matches_policy(
                  tag_name, config.get("tag_pattern"), config.get("tag_pattern_matches", True)
              ):
                  return True
          
          # Check namespace policies (ACTIVE ONLY)
          for row in NamespaceImmutabilityPolicyTable.select().where(
              NamespaceImmutabilityPolicyTable.namespace == namespace_id,
              NamespaceImmutabilityPolicyTable.active == True  # Only check active policies
          ):
              config = row.policy
              if _matches_policy(
                  tag_name, config.get("tag_pattern"), config.get("tag_pattern_matches", True)
              ):
                  return True
          
          return False
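
      The effect of the active filter can be checked without a database. In this sketch Policy, matches_policy and is_immutable are hypothetical stand-ins, and anchoring the pattern at the start of the tag name with re.match is an assumption about how _matches_policy behaves.

```python
import re
from dataclasses import dataclass


@dataclass
class Policy:
    tag_pattern: str
    tag_pattern_matches: bool = True
    active: bool = False


def matches_policy(tag_name, pattern, pattern_matches):
    # Assumption: the pattern is a regex anchored at the start of the tag name.
    found = re.match(pattern, tag_name) is not None
    return found if pattern_matches else not found


def is_immutable(tag_name, policies):
    """Return True if any ACTIVE policy covers the tag."""
    return any(
        p.active and matches_policy(tag_name, p.tag_pattern, p.tag_pattern_matches)
        for p in policies
    )


policy = Policy(tag_pattern="v.*")
before = is_immutable("v1.0.0", [policy])   # inactive policy is ignored
policy.active = True
after = is_immutable("v1.0.0", [policy])    # active policy is enforced
other = is_immutable("latest", [policy])    # non-matching tag stays mutable
```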
      

      Benefits of This Approach

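      1. Atomic Activation: The policy is enforced for new tags only after retroactive application has fully succeeded
      2. Clean Rollback: Deleting the inactive policy removes a policy that was never enforced; tags already marked by a failed retroactive pass still have to be reverted separately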
      3. No Race Conditions: Inactive policies don't affect new tag creation
      4. Idempotent: Can retry failed policy creation safely
      5. No Long Transactions: Retroactive runs outside transaction
      6. Observable State: Can query inactive policies for debugging

      Database Migration

      # Alembic migration: add_immutability_policy_active_flag.py
      
      import sqlalchemy as sa
      from alembic import op
      
      
      def upgrade():
          # Existing policies are already applied, so they default to active.
          # The column is NOT NULL with a server default, so the database
          # backfills every existing row; no separate UPDATE is needed.
          op.add_column('namespaceimmutabilitypolicy',
              sa.Column('active', sa.Boolean(), nullable=False, server_default='true'))
          op.add_column('repositoryimmutabilitypolicy',
              sa.Column('active', sa.Boolean(), nullable=False, server_default='true'))
          
          # Add index for performance (policy evaluation filters by active)
          op.create_index(
              'idx_namespace_immutability_policy_active',
              'namespaceimmutabilitypolicy',
              ['namespace_id', 'active']
          )
          op.create_index(
              'idx_repository_immutability_policy_active',
              'repositoryimmutabilitypolicy',
              ['repository_id', 'active']
          )
      
      def downgrade():
          # table_name is passed explicitly; some backends need it to drop an index
          op.drop_index('idx_repository_immutability_policy_active',
              table_name='repositoryimmutabilitypolicy')
          op.drop_index('idx_namespace_immutability_policy_active',
              table_name='namespaceimmutabilitypolicy')
          op.drop_column('repositoryimmutabilitypolicy', 'active')
          op.drop_column('namespaceimmutabilitypolicy', 'active')
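
      Adding a NOT NULL column with a default gives every existing row that default, so policies created before the migration come out active. The sketch below shows this with the standard-library sqlite3 module and a cut-down table; it is an illustration on SQLite, not a run of the migration against PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE namespaceimmutabilitypolicy (id INTEGER PRIMARY KEY, policy TEXT)"
)
conn.executemany(
    "INSERT INTO namespaceimmutabilitypolicy (policy) VALUES (?)",
    [('{"tag_pattern": "v.*"}',), ('{"tag_pattern": "release-.*"}',)],
)

# Same shape as the migration: NOT NULL column with a default of true.
conn.execute(
    "ALTER TABLE namespaceimmutabilitypolicy "
    "ADD COLUMN active BOOLEAN NOT NULL DEFAULT 1"
)

null_rows = conn.execute(
    "SELECT COUNT(*) FROM namespaceimmutabilitypolicy WHERE active IS NULL"
).fetchone()[0]
active_rows = conn.execute(
    "SELECT COUNT(*) FROM namespaceimmutabilitypolicy WHERE active = 1"
).fetchone()[0]
```

      Both pre-existing rows end up active and none are NULL.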
      

      Testing Requirements

      Unit Tests

      1. Test policy creation with retroactive failure

      • Verify policy is deleted
      • Verify no tags are marked immutable
      • Verify clean rollback

      2. Test policy creation success

      • Verify policy is active
      • Verify all matching tags are immutable
      • Verify new tags are evaluated against policy

      3. Test inactive policy behavior

      • Create inactive policy
      • Push tag matching pattern
      • Verify tag is NOT immutable (policy inactive)
      • Activate policy
      • Push another tag
      • Verify new tag IS immutable

      4. Test policy evaluation performance

      • Verify active flag filter reduces query time
      • Test with 1000+ policies (most inactive)
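
      The first two unit tests could take roughly the following shape. FakeStore and create_policy are hypothetical stand-ins for the data model so that the sketch runs on its own; real tests would call the functions in data/model/immutability.py and inject the failure there.

```python
import unittest


class FakeStore:
    def __init__(self):
        self.policies = {}   # uuid -> {"active": bool}


def create_policy(store, policy_uuid, apply_retroactively):
    """Hypothetical stand-in for the three-phase creation flow."""
    store.policies[policy_uuid] = {"active": False}     # phase 1: inactive row
    try:
        apply_retroactively()                           # phase 2: existing tags
    except Exception:
        store.policies.pop(policy_uuid, None)           # rollback
        raise
    store.policies[policy_uuid]["active"] = True        # phase 3: activate


class PolicyCreationTests(unittest.TestCase):
    def test_failure_removes_inactive_policy(self):
        store = FakeStore()

        def failing_apply():
            raise TimeoutError("simulated database timeout")

        with self.assertRaises(TimeoutError):
            create_policy(store, "policy-1", failing_apply)
        self.assertNotIn("policy-1", store.policies)

    def test_success_activates_policy(self):
        store = FakeStore()
        create_policy(store, "policy-1", lambda: None)
        self.assertTrue(store.policies["policy-1"]["active"])
```

      Run it with python -m unittest against the file that contains the class.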

      Integration Tests

      1. Failure scenarios:

      • Database timeout during retroactive
      • Network interruption during batch processing
      • Worker killed during retroactive
      • Database connection pool exhaustion

      2. Recovery scenarios:

      • Retry policy creation after failure
      • Verify duplicate check works with inactive policies
      • Clean up orphaned inactive policies

      3. Concurrency scenarios:

      • Create policy while pushing tags
      • Multiple concurrent policy creations
      • Policy creation during high traffic

      Regression Tests

      1. Existing immutability tests still pass
      2. Policy evaluation performance not degraded
      3. API responses unchanged
      4. UI shows active/inactive policy status

      Workarounds

      Until fixed, administrators can:

      Detection

      Find orphaned immutable tags (immutable but no policy):

      SELECT t.id, t.name, r.namespace_user_id, r.name as repo_name
      FROM tag t
      JOIN repository r ON t.repository_id = r.id
      WHERE t.immutable = true
        AND NOT EXISTS (
          SELECT 1 FROM repositoryimmutabilitypolicy rip
          WHERE rip.repository_id = r.id
        )
        AND NOT EXISTS (
          SELECT 1 FROM namespaceimmutabilitypolicy nip
          WHERE nip.namespace_id = r.namespace_user_id
        );
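
      The detection query can be tried on a throwaway SQLite database before it is run against production. The schema below is reduced to the columns the query touches and the rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repository (id INTEGER PRIMARY KEY, namespace_user_id INTEGER, name TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT, repository_id INTEGER, immutable INTEGER);
CREATE TABLE repositoryimmutabilitypolicy (id INTEGER PRIMARY KEY, repository_id INTEGER);
CREATE TABLE namespaceimmutabilitypolicy (id INTEGER PRIMARY KEY, namespace_id INTEGER);

INSERT INTO repository VALUES (1, 10, 'orphaned-repo'), (2, 20, 'governed-repo');
INSERT INTO tag VALUES (1, 'v1.0', 1, 1), (2, 'v1.1', 1, 0), (3, 'v2.0', 2, 1);
INSERT INTO namespaceimmutabilitypolicy VALUES (1, 20);
""")

ORPHAN_QUERY = """
SELECT t.id, t.name, r.namespace_user_id, r.name AS repo_name
FROM tag t
JOIN repository r ON t.repository_id = r.id
WHERE t.immutable = 1
  AND NOT EXISTS (
    SELECT 1 FROM repositoryimmutabilitypolicy rip WHERE rip.repository_id = r.id
  )
  AND NOT EXISTS (
    SELECT 1 FROM namespaceimmutabilitypolicy nip
    WHERE nip.namespace_id = r.namespace_user_id
  )
"""

orphans = conn.execute(ORPHAN_QUERY).fetchall()
```

      Only tag v1.0 is reported: v1.1 is mutable and v2.0 sits in a namespace that still has a policy. The query checks only whether any policy exists, so an orphan in a namespace governed by an unrelated policy is missed, and tags that were deliberately made immutable without a policy are reported as well.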
      

      Manual Cleanup

      Resolve orphaned immutable tags by clearing the flag or by recreating the policy:

      # Option 1: Remove immutability from orphaned tags
      curl -X PUT \
        -H "Authorization: Bearer ${TOKEN}" \
        -H "Content-Type: application/json" \
        -d '{"immutable": false}' \
        "https://quay.io/api/v1/repository/${ORG}/${REPO}/tag/${TAG}"
      
      # Option 2: Recreate the policy (if pattern is known)
      curl -X POST \
        -H "Authorization: Bearer ${TOKEN}" \
        -H "Content-Type: application/json" \
        -d '{"tagPattern": "^v.*", "tagPatternMatches": true}' \
        "https://quay.io/api/v1/organization/${ORG}/immutabilitypolicy/"
      

      Prevention

      1. Create policies during low-traffic maintenance windows
      2. Start with small namespaces to test
      3. Monitor database connection pool and timeout settings
      4. Increase database timeout for large namespaces
      5. Monitor Quay logs for policy creation failures

      Affected Code Files

      • data/model/immutability.py:269-311 - create_namespace_immutability_policy()
      • data/model/immutability.py:314-385 - update/delete namespace policy
      • data/model/immutability.py:409-470 - create_repository_immutability_policy()
      • data/model/immutability.py:473-555 - update/delete repository policy
      • data/model/immutability.py:561-592 - evaluate_immutability_policies()
      • data/database.py:2188-2198 - Policy table definitions
      • endpoints/api/immutability_policy.py - API endpoints

      References

      • Code Review Date: 2026-03-04
      • Affected Versions: Quay 3.17+ (all versions with FEATURE_IMMUTABLE_TAGS)
      • Database: PostgreSQL
      • Estimated Fix Effort: 1-2 weeks (schema change + testing)

              rhn-support-bpratt Brady Pratt
              lzha1981 luffy zhang