Project Quay / PROJQUAY-10631

Quay Operator supports cert-manager


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Major
    • quay-operator

      Feature Overview (aka. goal summary)

      Enable the Quay Operator to automatically reference and watch TLS certificates managed by cert-manager, eliminating manual certificate copying and enabling seamless automatic certificate rotation.

      Currently, customers must manually copy certificate and key data from cert-manager Secrets into the Quay config Secret. This process is time-consuming and error-prone, does not scale, and creates operational risk.

      This feature requires the Quay Operator to be refactored to directly reference external TLS Secrets (e.g., from cert-manager) via the QuayRegistry CRD, establishing a Kubernetes-native pattern for external secret management:

      • By default, Quay continues to support embedded certificates in config.yaml (no breaking changes).
      • For automated certificate rotation, Quay will support referencing external Secrets managed by cert-manager, with the Operator watching for changes and automatically applying certificate updates.
      • This establishes architectural patterns that will extend to other external secret integrations (e.g., Vault via CSI, External Secrets Operator).

      Note: This feature addresses RFE-4287 and establishes the foundation for RFE-7974 (CSI Secrets Store Driver) and RFE-7835 (File-based Secrets).

      Goals (aka. expected user outcomes)

      The primary goal is to provide a secure, automated, and GitOps-friendly approach to TLS certificate management that eliminates manual operational overhead.

      This feature addresses compliance requirements and delivers the following outcomes:

      • Automated Certificate Rotation: Leverage cert-manager's automatic certificate renewal without manual intervention—certificates rotate seamlessly when cert-manager renews them.
      • Eliminate Manual Processes: Remove the error-prone manual copying of certificate data between Secrets, reducing operational risk and time spent on certificate management.
      • Kubernetes-Native Integration: Use standard Kubernetes patterns for Secret references (i.e., spec.tls.secretRef) rather than embedding certificate data directly in config files.
      • GitOps Friendly: Enable declarative, version-controlled configuration where TLS Secret references are clearly visible in the QuayRegistry CR rather than buried in opaque config.yaml blobs.
      • Maintain Backward Compatibility: Continue supporting the existing embedded certificate approach (SSL_CERT/SSL_KEY in config.yaml) for users who prefer it or use standalone deployments.
      • Establish Scalable Pattern: Create architectural patterns for external secret management that extend beyond TLS to support database credentials, storage secrets, etc. (future work).

      Background

      Customer Pain Points: Manual Certificate Management

      Customers deploying Quay on OpenShift/Kubernetes frequently use cert-manager to automate certificate issuance and renewal.  However, integrating cert-manager with Quay currently requires manual intervention:

      Current Manual Workflow
      1. cert-manager issues/renews certificate, updates Secret (quay-tls-cert)
      2. Administrator manually copies tls.crt and tls.key from cert-manager Secret
      3. Administrator manually updates Quay config Secret with new certificate data
      4. Administrator manually restarts Quay pods to apply the certificate

      This workflow is:

      • Time-consuming and manual (doesn't scale across multiple Quay instances)
      • Error-prone (copy-paste mistakes, formatting issues, missed renewals)
      • Operationally risky (certificate expiry can cause registry outages)
      • Not GitOps-friendly (requires imperative kubectl commands)

      Customer feedback highlights this as a critical operational burden, especially in environments with multiple Quay registries and frequent certificate rotation policies.

      Broader Context: External Secrets Management Gap

      This feature aims to address a larger architectural challenge with how the Quay Operator handles external secrets:

      RFE      | Problem                                     | Customer Impact
      RFE-4287 | No cert-manager integration                 | Manual certificate rotation is required
      RFE-7974 | Operator overwrites CSI-mounted secrets     | Cannot use Vault/external secret managers
      RFE-7835 | PostgreSQL secrets as environment variables | CIS Kubernetes Benchmark non-compliance

      Root cause: The Quay Operator currently assumes it owns and controls all secrets.  It lacks mechanisms to:

      1. Reference secrets managed by external systems (cert-manager, Vault, CSI)
      2. Watch external secrets without mutating them
      3. Distinguish between operator-managed and externally managed secrets

      What we want: Implement a unified external secrets architecture starting with cert-manager TLS certificates (this feature), designed to extend to other secret sources in future phases.

      Why CRD-based Approach?

      The solution adds Secret references to the QuayRegistry CRD rather than only modifying config.yaml. The comparison below explains why.

      CRD vs config.yaml Comparison

      Option 1: config.yaml Only (not feasible; requires a pivot)

      • Secret-within-Secret anti-pattern (configBundleSecret references another Secret)
      • Operator must parse YAML to extract Secret name, then watch it
      • Not Kubernetes-native (Ingress and Gateway API both reference TLS Secrets from the resource spec)
      • Doesn't align with RFE-7974 (CSI) or RFE-7835 (file-based secrets)
      • Cannot support cross-namespace Secret access cleanly

      Option 2: QuayRegistry CRD (proposed)

      • Standard Kubernetes pattern (like Ingress spec.tls[].secretName)
      • Operator can directly watch the referenced Secret
      • Declarative and GitOps-friendly
      • Establishes a pattern that extends to CSI (RFE-7974) and file-based secrets (RFE-7835)
      • Supports cross-namespace references with clear RBAC
      • Can be validated at admission time (future: validating webhook)
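
      As a rough illustration of Option 2, the sketch below shows how an operator might resolve a Secret reference from a QuayRegistry resource. It is written in Python for brevity and is not the Operator's actual code. Only spec.tls.secretRef comes from this proposal; the name and namespace sub-fields, the rule that an omitted namespace falls back to the QuayRegistry's own namespace, and the resolve_tls_secret_ref helper are assumptions made for the example.

```python
from typing import NamedTuple, Optional


class SecretRef(NamedTuple):
    namespace: str
    name: str


def resolve_tls_secret_ref(quay_registry: dict) -> Optional[SecretRef]:
    """Return the external TLS Secret a QuayRegistry points at, or None.

    Assumed (hypothetical) shape: spec.tls.secretRef = {name, namespace?}.
    """
    metadata = quay_registry.get("metadata") or {}
    spec = quay_registry.get("spec") or {}
    tls = spec.get("tls") or {}
    ref = tls.get("secretRef")
    if not ref:
        # No reference: the registry keeps using embedded certificates.
        return None
    name = ref.get("name")
    if not name:
        raise ValueError("spec.tls.secretRef.name must not be empty")
    # Assumption: an omitted namespace means the QuayRegistry's own namespace.
    namespace = ref.get("namespace") or metadata.get("namespace", "default")
    return SecretRef(namespace=namespace, name=name)


# Example resource; the registry name and namespace are illustrative.
registry = {
    "metadata": {"name": "example-registry", "namespace": "quay"},
    "spec": {"tls": {"secretRef": {"name": "quay-tls-cert"}}},
}
print(resolve_tls_secret_ref(registry))
# prints SecretRef(namespace='quay', name='quay-tls-cert')
```

      Because the reference lives in the resource spec, the Operator would not need to parse config.yaml to learn which Secret to watch.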

      Note on Deployment Models:

      • Operator-based deployments (OpenShift/K8s): Use the CRD approach for external secret references
      • Standalone deployments (RHEL/VM): Continue using config.yaml (no Kubernetes, so no CRDs are available)
      • config.yaml cannot be deprecated as it's required for standalone deployments

      Requirements (aka. acceptance criteria)

      • The QuayRegistry CRD is extended with a spec.tls.secretRef field to reference external TLS Secrets.
      • The Operator supports the standard cert-manager Secret format (kubernetes.io/tls type with tls.crt and tls.key keys).
      • The Operator watches referenced external Secrets and detects certificate updates without mutating the external Secret.
      • The Operator validates certificates before applying them (format, key-certificate match, expiry, chain validation).
      • Upon detecting a valid certificate update, the Operator triggers an automatic rolling restart of Quay pods to apply the new certificate with minimal downtime.
      • The Operator updates QuayRegistry status with TLS certificate information (source, expiry date, last rotation timestamp, validation status).
      • Backward compatibility is maintained: Existing deployments using embedded SSL_CERT/SSL_KEY in config.yaml continue to work without changes.
      • If both spec.tls.secretRef and embedded SSL_CERT/SSL_KEY are present, the Operator detects the configuration conflict and reports it with a clear error message.
      • The Operator supports cross-namespace Secret references (e.g., cert-manager managing secrets in a different namespace).
      • The Operator's ClusterRole is updated with permissions to read Secrets in referenced namespaces.
      • The implementation establishes patterns that will extend to RFE-7974 (CSI Secrets Store Driver) and RFE-7835 (file-based secrets) without architectural rework.
      • Standalone deployments (RHEL/VM) are not affected—they continue using config.yaml as the sole configuration method.
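
      Several of the requirements above (the Secret format check, change detection that does not mutate the Secret, conflict reporting, and deciding when to restart) can be sketched together. The Python below is a minimal sketch, not the Operator's implementation: the function names, the SHA-256 fingerprint approach, and the choice to reject a conflicting configuration outright are all assumptions. Real certificate validation (key-certificate match, expiry, chain) is omitted.

```python
import base64
import hashlib
from typing import Optional

TLS_SECRET_TYPE = "kubernetes.io/tls"
REQUIRED_KEYS = ("tls.crt", "tls.key")


class TLSConfigError(Exception):
    """Raised when the TLS configuration cannot be used as-is."""


def select_tls_source(has_secret_ref: bool, config: dict) -> str:
    """Decide which TLS source applies: 'external', 'embedded', or 'none'."""
    embedded = [k for k in ("SSL_CERT", "SSL_KEY") if config.get(k)]
    if has_secret_ref and embedded:
        # Assumption: a conflict is an error rather than a silent override.
        raise TLSConfigError(
            "spec.tls.secretRef is set but config.yaml also embeds "
            + ", ".join(embedded)
        )
    if has_secret_ref:
        return "external"
    return "embedded" if embedded else "none"


def tls_fingerprint(secret: dict) -> str:
    """Hash the certificate and key of a kubernetes.io/tls Secret.

    The Secret is only read, never modified.
    """
    if secret.get("type") != TLS_SECRET_TYPE:
        raise TLSConfigError(f"Secret type must be {TLS_SECRET_TYPE}")
    data = secret.get("data") or {}
    digest = hashlib.sha256()
    for key in REQUIRED_KEYS:
        if key not in data:
            raise TLSConfigError(f"Secret is missing required key {key}")
        # Include the key name so cert and key bytes cannot blur together.
        digest.update(key.encode())
        # Secret data is base64-encoded in the Kubernetes API.
        digest.update(base64.b64decode(data[key]))
    return digest.hexdigest()


def needs_rollout(last_applied: Optional[str], secret: dict) -> bool:
    """True when the Secret content differs from what the pods last received."""
    return tls_fingerprint(secret) != last_applied
```

      In this sketch, the Operator would store the fingerprint it last rolled out (for example in the QuayRegistry status) and trigger a rolling restart only when needs_rollout returns True, so reconciles that see an unchanged Secret do not restart pods.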

      Use Cases

      • Scenario 1: Automated Certificate Rotation with Let's Encrypt
        • As a: Cluster Administrator,
        • I want to: configure Quay to use cert-manager with Let's Encrypt as the certificate issuer.
        • So that: Let's Encrypt's 90-day certificates renew automatically before expiry without manual intervention, and Quay applies renewed certificates without administrator action or registry downtime.
      • Scenario 2: Enterprise CA with GitOps Workflow
        • As a: Platform Engineer,
        • I want to: manage Quay's TLS certificates through our GitOps workflow using cert-manager and our internal enterprise CA.
        • So that: Certificate configuration is version-controlled, auditable, and follows our standard Kubernetes patterns for TLS management across all platform services.
      • Scenario 3: Multi-Namespace Deployment with Centralized Cert Management
        • As a: Security Operations Engineer,
        • I want to: manage all TLS certificates from a centralized cert-manager namespace.
        • So that: I can enforce certificate policies and monitoring from a single location, while Quay instances in different namespaces reference their certificates without duplicating cert-manager deployments.
      • Scenario 4: Migration from Manual to Automated Certificate Management
        • As a: Quay Administrator,
        • I want to: migrate from manually managing embedded certificates to cert-manager integration.
        • So that: I can reduce operational overhead and eliminate the risk of certificate expiry causing outages, while maintaining the ability to roll back to manual management if needed.
      • Scenario 5: Standalone Deployment with Manual Certificates
        • As a: System Administrator,
        • I want to: run Quay on a RHEL VM without Kubernetes using certbot for certificate renewal.
        • So that: I can manage certificates using standard Linux tools and cron jobs, with Quay reading certificates from config.yaml as it does today (no breaking changes).

      Out of Scope

      • Automatic hot reload without pod restart: The initial implementation triggers a rolling restart. Hot reload (e.g., Nginx SIGHUP) is a potential Phase 2 enhancement.
      • Multi-certificate support (SNI): The initial implementation supports a single certificate per Quay instance. Server Name Indication (SNI) with multiple certificates is future work.
      • Certificate provisioning: This feature assumes cert-manager is already installed and configured. The Quay Operator does not install or manage cert-manager.
      • Admission webhook validation: The initial implementation validates at runtime. A validating webhook that catches errors at CR submission time is a future enhancement.
      • Database and storage credentials: This feature focuses on TLS certificates.  Database and storage secret management from external sources (Vault, CSI) is covered by RFE-7974 (future work building on this architecture).
      • Standalone deployment enhancements: Adding SSL_CERT_FILE/SSL_KEY_FILE config options for file-based certificates in standalone mode is potential future work.

      Documentation Considerations

      • Explain the two configuration approaches:
        • Embedded certificates in config.yaml (current, continues to work)
        • External cert-manager Secret reference via CRD (new, recommended for K8s)
      • Provide a complete setup guide for cert-manager integration:
        • Prerequisites (cert-manager installation)
        • Creating a Certificate resource
        • Configuring QuayRegistry to reference the cert-manager Secret
        • Verification steps
      • Provide a migration guide from embedded to cert-manager certificates:
        • Assessment steps
        • Migration procedure
        • Validation
        • Rollback procedure
      • Document cross-namespace Secret references with RBAC configuration examples.
      • Include a troubleshooting section covering common issues:
        • Referenced Secret not found
        • Certificate validation failures
        • RBAC permission errors
        • Certificate-hostname mismatch
      • Provide complete YAML examples showing:
        • cert-manager Certificate resource
        • QuayRegistry CR with spec.tls.secretRef
        • RBAC for cross-namespace access
      • Include architecture diagrams showing:
        • cert-manager → Secret → QuayRegistry CRD → Operator → Quay Pods flow
        • Certificate rotation trigger flow
        • Deployment model comparison (operator vs standalone)
      • Clearly document deployment model differences:
        • Operator-based (OpenShift/K8s): Can use cert-manager integration
        • Standalone (RHEL/VM): Uses config.yaml with alternative renewal approaches (certbot, manual)

      Assignee: Unassigned
      Reporter: Tony Wu (rhn-coreos-tunwu)