OpenShift Bugs / OCPBUGS-702

The caBundle field of alertmanagerconfigs.monitoring.coreos.com crd is getting removed


    • Type: Bug
    • Resolution: Done
    • Priority: Major

      I am using OCP 4.11.0:

      $ oc version
      Client Version: 4.10.25
      Server Version: 4.11.0
      Kubernetes Version: v1.24.0+9546431
      

      I added my private CA certificate to the CA bundle as described in the documentation ("Updating the CA bundle").

      After that I can see an intermittent error:

      $ oc get kubeapiservers.operator.openshift.io cluster -o yaml
      …
                  lastTransitionTime: "2022-08-27T20:32:39Z"
                  message: "alertmanagerconfigs.monitoring.coreos.com: x509: certificate signed by unknown authority"
                  reason: WebhookServiceConnectionError
                  status: "True"
                  type: CRDConversionWebhookConfigurationError
      …
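
To pull just this condition out of the operator status instead of scanning the full YAML, a JSONPath filter can be used. This is a minimal sketch: the function name `crd_webhook_error` is hypothetical, and it assumes the condition is listed under `.status.conditions`, which the truncated output above does not show explicitly.

```shell
# Hypothetical helper: print the message of the
# CRDConversionWebhookConfigurationError condition, if it is set.
# Assumption: the condition lives under .status.conditions.
crd_webhook_error() {
  oc get kubeapiservers.operator.openshift.io cluster \
    -o jsonpath='{.status.conditions[?(@.type=="CRDConversionWebhookConfigurationError")].message}'
}
```

Calling `crd_webhook_error` repeatedly should show the message appearing and disappearing if the error is intermittent, as described here.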
      

      In the Kubernetes audit logs, I can see that two controllers (cluster-version-operator and service-ca) keep overwriting each other's changes to the alertmanagerconfigs.monitoring.coreos.com CRD:

      $ oc get crd alertmanagerconfigs.monitoring.coreos.com -o yaml
      apiVersion: apiextensions.k8s.io/v1
      kind: CustomResourceDefinition
      metadata:
        …
        name: alertmanagerconfigs.monitoring.coreos.com
        …
      spec:
        conversion:
          strategy: Webhook
          webhook:
            clientConfig:
              caBundle: LS0tLS1CRUdJTi …
              service:
      …
      
      

      The service-ca controller adds the caBundle field to the CRD, the cluster-version-operator removes it again, and the cycle repeats periodically.
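
The flapping can also be observed without audit logs by polling the live CRD. The following is a rough sketch, not a definitive diagnostic: the function names `cabundle_state` and `watch_cabundle` and the default 5-second interval are assumptions made for illustration.

```shell
# Hypothetical helpers; names and polling interval are illustrative only.
CRD=alertmanagerconfigs.monitoring.coreos.com

# Print "present" if the live CRD currently carries a caBundle, else "missing".
cabundle_state() {
  bundle=$(oc get crd "$CRD" \
    -o jsonpath='{.spec.conversion.webhook.clientConfig.caBundle}')
  if [ -n "$bundle" ]; then echo present; else echo missing; fi
}

# Poll every $1 seconds (default 5) and log each state change with a UTC timestamp.
watch_cabundle() {
  interval=${1:-5}
  last=
  while :; do
    state=$(cabundle_state)
    if [ "$state" != "$last" ]; then
      echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) caBundle $state"
      last=$state
    fi
    sleep "$interval"
  done
}
```

Running `watch_cabundle 10` and leaving it open for a while should log alternating "present" and "missing" lines if the two controllers are fighting over the field.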

      I reviewed the CRD manifest from inside the cluster-version-operator container:

      $ oc rsh -n openshift-cluster-version cluster-version-operator-796d5bc86b-52qjw
      $ cat /release-manifests/0000_50_cluster-monitoring-operator_00_0alertmanager-config-custom-resource-definition.yaml
      apiVersion: apiextensions.k8s.io/v1
      kind: CustomResourceDefinition
      metadata:
        annotations:
          controller-gen.kubebuilder.io/version: v0.8.0
          include.release.openshift.io/ibm-cloud-managed: "true"
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
          service.beta.openshift.io/inject-cabundle: "true"
        creationTimestamp: null
        name: alertmanagerconfigs.monitoring.coreos.com
      spec:
        conversion:
          strategy: Webhook
          webhook:
            clientConfig:
              service:
                name: prometheus-operator-admission-webhook
                namespace: openshift-monitoring
                path: /convert
                port: 8443
            conversionReviewVersions:
            - v1beta1
            - v1alpha1
        group: monitoring.coreos.com
        names:
      …
      

      The manifest above sets spec.conversion.webhook.clientConfig but contains no caBundle. This is probably why the cluster-version-operator, when it reconciles the CRD against this manifest, overwrites the caBundle injected by the service-ca controller.
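
A quick way to check whether a given release manifest sets caBundle at all is a plain text search. This is a minimal sketch under stated assumptions: `manifest_has_cabundle` is a hypothetical helper, and it only looks for a `caBundle:` key anywhere in the file rather than parsing the YAML.

```shell
# Hypothetical helper: report whether a CRD manifest file sets a caBundle key.
# It matches lines that start with optional whitespace followed by "caBundle:",
# so the inject-cabundle annotation is not counted.
manifest_has_cabundle() {
  if grep -Eq '^[[:space:]]*caBundle:' "$1"; then echo yes; else echo no; fi
}
```

It would be run inside the cluster-version-operator container with the manifest path as its only argument. If it prints "no" while the live CRD carries a caBundle, the manifest and the service-ca injection disagree about that field.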

      Note that I filed a similar bug report here: https://issues.redhat.com/browse/PSAP-889

              dhurta@redhat.com David Hurta
              anosek@redhat.com Ales Nosek
              Evgeni Vakhonin Evgeni Vakhonin