  1. OpenShift Pipelines
  2. SRVKP-10928

Add self-healing for CA bundle configmaps in user namespaces

      As a cluster admin, I want the operator to automatically detect and recreate missing CA bundle configmaps in user namespaces, so that workloads continue to function correctly even if configmaps are accidentally deleted.

      Acceptance Criteria

      • Test that operator detects missing config-trusted-cabundle configmap even when namespace label indicates reconciliation is complete
      • Test that operator detects missing config-service-cabundle configmap even when namespace label indicates reconciliation is complete
      • Verify that operator recreates both configmaps when either is missing
      • Verify that operator logs a warning message when missing configmaps are detected
      • Test that reconciliation behavior matches existing RBAC self-healing (checks RoleBinding existence)
      • Verify that namespace label remains accurate after configmap recreation
      • Test that self-healing works after upgrade, manual deletion, and other scenarios
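
      The criteria above can be exercised against a small in-memory model before writing tests against a real cluster. The sketch below is only an illustration: the namespace struct, the missing and reconcile helpers, and the version strings are hypothetical and are not the operator's actual types. Only the two configmap names come from this issue.

```go
package main

import "fmt"

// caConfigMaps lists the two configmaps named in the acceptance criteria.
var caConfigMaps = []string{"config-trusted-cabundle", "config-service-cabundle"}

// namespace is a hypothetical in-memory model of a user namespace.
type namespace struct {
	label      string          // value of the namespace-trusted-ca-config label
	configMaps map[string]bool // configmaps currently present
}

// missing returns the CA bundle configmaps that are absent from ns.
func missing(ns namespace) []string {
	var gone []string
	for _, name := range caConfigMaps {
		if !ns.configMaps[name] {
			gone = append(gone, name)
		}
	}
	return gone
}

// reconcile models the desired behavior: it acts when the label is stale
// or when any configmap is missing, ensures both configmaps exist, and
// sets the label to the current version. It returns the configmaps that
// were found missing.
func reconcile(ns *namespace, version string) []string {
	gone := missing(*ns)
	if ns.label == version && len(gone) == 0 {
		return nil // nothing to do
	}
	for _, name := range caConfigMaps {
		ns.configMaps[name] = true
	}
	ns.label = version
	return gone
}

func main() {
	scenarios := []struct {
		name string
		ns   namespace
	}{
		{"after upgrade", namespace{"1.19.0", map[string]bool{}}},
		{"one configmap deleted", namespace{"1.20.0", map[string]bool{"config-trusted-cabundle": true}}},
		{"healthy", namespace{"1.20.0", map[string]bool{"config-trusted-cabundle": true, "config-service-cabundle": true}}},
	}
	for _, sc := range scenarios {
		ns := sc.ns
		gone := reconcile(&ns, "1.20.0")
		fmt.Printf("%s: missing=%v label=%s\n", sc.name, gone, ns.label)
	}
}
```

      In this model the label ends up equal to the current version in every scenario, and both configmaps are present after each pass, which is the state the label is meant to describe.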

      Problem Context

      Currently, the operator uses the label operator.tekton.dev/namespace-trusted-ca-config: "X.XX.X" to track which namespaces have CA bundles configured. Once this label matches the current operator version, the operator skips CA bundle reconciliation for that namespace permanently.

      If the configmaps are subsequently deleted (manually, or by external processes), the operator never recreates them because it only checks the label, not the actual existence of the configmaps.

      This differs from RBAC reconciliation, which includes self-healing checks:

      // RBAC has self-healing (rbac.go:312-320)
      if ns.Labels[namespaceVersionLabel] == r.version {
          // Even if label matches, verify RoleBinding exists
          _, err := r.kubeClientSet.RbacV1().RoleBindings(ns.Name).Get(...)
          if errors.IsNotFound(err) {
              needsRBAC = true  // Re-reconcile if missing!
          }
      }
      

      CA bundles lack this verification (rbac.go:332-336):

      // NO self-healing check
      if ns.Labels[namespaceTrustedConfigLabel] != r.version {
          result.CANamespaces = append(result.CANamespaces, ns)
      }
      

      Proposed Implementation

      Add a self-healing check, similar to the RBAC one, in getNamespacesToBeReconciled():

      needsCABundle := false
      if ns.Labels[namespaceTrustedConfigLabel] != r.version {
          needsCABundle = true
      } else {
          // Self-healing: Verify configmaps exist even when label matches
          _, err1 := r.kubeClientSet.CoreV1().ConfigMaps(ns.Name).Get(ctx, trustedCAConfigMapName, metav1.GetOptions{})
          _, err2 := r.kubeClientSet.CoreV1().ConfigMaps(ns.Name).Get(ctx, serviceCAConfigMapName, metav1.GetOptions{})
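          if errors.IsNotFound(err1) || errors.IsNotFound(err2) {
              // Logged at warning level, as the acceptance criteria require
              logger.Warnf("CA bundle configmaps missing in namespace %s, will reconcile", ns.Name)
              needsCABundle = true
          } else if err1 != nil {
              return nil, fmt.Errorf("error checking configmap %s in namespace %s: %w", trustedCAConfigMapName, ns.Name, err1)
          } else if err2 != nil {
              return nil, fmt.Errorf("error checking configmap %s in namespace %s: %w", serviceCAConfigMapName, ns.Name, err2)
          }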
      }
      
      if needsCABundle {
          logger.Debugf("Adding namespace for CA bundle reconciliation: %s", ns.GetName())
          result.CANamespaces = append(result.CANamespaces, ns)
      }
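
      The decision logic can also be checked in isolation, without a cluster. The following is a minimal sketch under stated assumptions: configMapGetter, fakeStore, errNotFound, and needsCABundle are hypothetical stand-ins, not the operator's or client-go's API, and errNotFound takes the place of a Kubernetes NotFound error. Unlike the snippet above, it stops at the first missing configmap rather than fetching both.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// configMapGetter is a hypothetical, minimal stand-in for a ConfigMap client.
type configMapGetter interface {
	Get(namespace, name string) error
}

// fakeStore records which "namespace/name" configmaps exist.
type fakeStore map[string]bool

// Get returns nil when the configmap exists and errNotFound otherwise.
func (s fakeStore) Get(namespace, name string) error {
	if s[namespace+"/"+name] {
		return nil
	}
	return errNotFound
}

// needsCABundle reports whether a namespace needs CA bundle reconciliation:
// either the label does not match the operator version, or the label
// matches but one of the expected configmaps is missing.
func needsCABundle(c configMapGetter, ns, label, version string, names []string) (bool, error) {
	if label != version {
		return true, nil
	}
	for _, name := range names {
		err := c.Get(ns, name)
		if errors.Is(err, errNotFound) {
			return true, nil // self-healing path
		}
		if err != nil {
			return false, fmt.Errorf("checking configmap %s in namespace %s: %w", name, ns, err)
		}
	}
	return false, nil
}

func main() {
	names := []string{"config-trusted-cabundle", "config-service-cabundle"}
	store := fakeStore{
		"team-a/config-trusted-cabundle": true,
		"team-a/config-service-cabundle": true,
	}

	before, _ := needsCABundle(store, "team-a", "1.20.0", "1.20.0", names)
	fmt.Println("label matches, configmaps present:", before)

	// Simulate an accidental deletion while the label still matches.
	delete(store, "team-a/config-service-cabundle")
	after, _ := needsCABundle(store, "team-a", "1.20.0", "1.20.0", names)
	fmt.Println("label matches, one configmap deleted:", after)
}
```

      Keeping the lookup behind a small interface is one way to unit-test the label-match, missing-configmap, and lookup-error paths separately. Whether the real reconciler should be structured this way is a design choice this issue does not settle.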
      

      Customer Impact

      A customer reported missing configmaps after upgrading from 1.19 to 1.20. Must-gather analysis shows:

      • The operator logged "No namespaces need reconciliation" 4,687 times over 42.5 hours
      • No namespace-processing activity appears in 65,778 log lines
      • All namespaces have the label set, but the configmaps are missing
      • No errors or warnings indicate the problem

      Current workaround: Remove namespace label to force reconciliation:

      oc label namespace [namespace-name] operator.tekton.dev/namespace-trusted-ca-config-
      

      Files to Modify

      • pkg/reconciler/openshift/tektonconfig/rbac.go - Add self-healing in getNamespacesToBeReconciled() method (lines 332-336)

              rh-ee-abghosh Abhishek Ghosh
              jkhelil abdeljawed khelil