OCPBUGS-46434

IBM Fusion operator upgrade is blocked with the error: "error validating existing CRs against new CRD's schema"

    • Critical
    • None
    • Charmander OLM Sprint 263, Diglett OLM Sprint 264
    • 2
    • Rejected
    • False
      Users began hitting spurious CRD incompatibility errors when updating operators. The failure is in the OLM code that validates existing CRs against incoming CRDs, which was recently updated in https://github.com/operator-framework/operator-lifecycle-manager/pull/3387.

      This manifested in the `InstallPlan` `.status.Message` as something like:

      {code:none}
      retrying execution due to error: error validating existing CRs against new CRD's schema for \"pgadmins.postgres-operator.crunchydata.com\": error validating postgres-operator.crunchydata.com/v1beta1, Kind=PGAdmin \"openshift-operators/example-pgadmin\": updated validation is too restrictive: [].spec.tolerations[0].tolerationSeconds: Invalid value: \"number\": spec.tolerations[0].tolerationSeconds in body must be of type integer: \"number\"
      {code:none}

      The difference between the predecessor calling convention and the one introduced in #3387 appears to be that one is a pointer and the other is concrete.

      old
      {code:none}
      unstructured.Unstructured{Object:map[string]interface...
      {code:none}

      new
      {code:none}
      &unstructured.Unstructured{Object:map[string]interface...
      {code:none}

      So it would seem that merely type-asserting the value and dereferencing it would yield the appropriate result, but doing so instead effectively disables all CR vs CRD reconciliation checks (as evidenced by multiple unit test failures).

      But k8s already dereferences pointer parameters [here|https://github.com/kubernetes/kube-openapi/blob/master/pkg/validation/validate/schema.go#L139-L141] during validation. So that isn't it.

      And the `validation.ValidateCustomResource` signature is terrifyingly permissive in accepting `customResource` as `interface{}` [here|https://pkg.go.dev/k8s.io/apiextensions-apiserver@v0.31.3/pkg/apiserver/validation#ValidateCustomResource]. So we cannot derive guidance from it.

      Taking a page from k8s' own use of the validation API, which calls `UnstructuredContent()` to convert the `unstructured.Unstructured` into a `map[string]interface{}` [here|https://github.com/kubernetes/kubernetes/blob/1504f10e7946f95a8b1da35e28e4c7453ff62775/staging/src/k8s.io/apiextensions-apiserver/pkg/registry/customresource/validator.go#L54], we achieve the desired results.
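The struct / pointer / map distinction discussed above can be sketched with stand-in types. This is only an illustration, not OLM or Kubernetes code: `Unstructured` and `describe` are hypothetical stand-ins for `unstructured.Unstructured` and for a callee that accepts `interface{}`. It shows that such a parameter compiles with any of the three forms, so the compiler cannot tell the caller which one the callee expects.

```go
package main

import "fmt"

// Unstructured is a hypothetical stand-in for k8s' unstructured.Unstructured,
// reduced to the one field and one method discussed in this issue.
type Unstructured struct {
	Object map[string]interface{}
}

// UnstructuredContent returns the underlying map, or an empty map when
// there is none (assumed here to match the real method's behaviour).
func (u *Unstructured) UnstructuredContent() map[string]interface{} {
	if u == nil || u.Object == nil {
		return map[string]interface{}{}
	}
	return u.Object
}

// describe is a hypothetical callee with the same permissive parameter
// type as the validation entry point. It reports the dynamic type it got.
func describe(customResource interface{}) string {
	switch customResource.(type) {
	case map[string]interface{}:
		return "map"
	case Unstructured:
		return "struct"
	case *Unstructured:
		return "pointer"
	default:
		return "other"
	}
}

func main() {
	cr := &Unstructured{Object: map[string]interface{}{
		"spec": map[string]interface{}{"replicas": int64(1)},
	}}
	fmt.Println(describe(*cr))                      // struct
	fmt.Println(describe(cr))                       // pointer
	fmt.Println(describe(cr.UnstructuredContent())) // map
}
```

All three calls compile, which is why the `interface{}` signature alone gives no guidance; passing the plain map is the form the upstream validator code linked above uses.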
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-46054. The following is the description of the original issue:
      ---
      This is a clone of issue OCPBUGS-46018. The following is the description of the original issue:
      ---
      This is a clone of issue OCPBUGS-42815. The following is the description of the original issue:
      ---
      Description of problem:

          While upgrading the Fusion operator, the IBM team is hitting the following error in the operator's subscription:
      error validating existing CRs against new CRD's schema for "fusionserviceinstances.service.isf.ibm.com": error validating service.isf.ibm.com/v1, Kind=FusionServiceInstance "ibm-spectrum-fusion-ns/odfmanager": updated validation is too restrictive: [].status.triggerCatSrcCreateStartTime: Invalid value: "number": status.triggerCatSrcCreateStartTime in body must be of type integer: "number"

      The question here: "triggerCatSrcCreateStartTime" has been present in the operator for the past few releases, and its datatype (integer) has not changed in the latest release either. One "FusionServiceInstance" CR was present in the cluster when this issue was hit, and its "triggerCatSrcCreateStartTime" field had the value "1726856593000774400".
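As background for the "must be of type integer: number" wording, here is a minimal sketch of one general Go behaviour that can make an integer value arrive as a floating-point number. Whether this is the mechanism at work in OLM is an assumption; the report does not say. The helper names (`decodeDefault`, `decodeUseNumber`, `kind`) are hypothetical, and only the standard library's `encoding/json` is used.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// decodeDefault decodes JSON the default way: every number becomes float64.
func decodeDefault(data []byte) (map[string]interface{}, error) {
	var out map[string]interface{}
	err := json.Unmarshal(data, &out)
	return out, err
}

// decodeUseNumber keeps numbers as json.Number so integers survive intact.
func decodeUseNumber(data []byte) (map[string]interface{}, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.UseNumber()
	var out map[string]interface{}
	err := dec.Decode(&out)
	return out, err
}

// kind is a hypothetical classifier that echoes the error's vocabulary:
// "integer" for values held as integers, "number" for floating point.
func kind(v interface{}) string {
	switch n := v.(type) {
	case int64:
		return "integer"
	case float64:
		return "number"
	case json.Number:
		if _, err := n.Int64(); err == nil {
			return "integer"
		}
		return "number"
	default:
		return "other"
	}
}

func main() {
	// Field name and value are taken from the report above.
	doc := []byte(`{"triggerCatSrcCreateStartTime": 1726856593000774400}`)

	a, err := decodeDefault(doc)
	if err != nil {
		panic(err)
	}
	b, err := decodeUseNumber(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(kind(a["triggerCatSrcCreateStartTime"])) // number
	fmt.Println(kind(b["triggerCatSrcCreateStartTime"])) // integer
}
```

The same JSON document yields a different Go type depending on how it was decoded, so a schema that has not changed can still reject a value if the object reaches the validator in a different in-memory form.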

      Version-Release number of selected component (if applicable):

          It is impacting OCP versions between 4.16.7 and 4.16.14.

      How reproducible:

          Always

      Steps to Reproduce:

          1. Upgrade the Fusion operator (OCP version 4.16.7 to OCP 4.16.14).

      Actual results:

          Upgrade fails with error in description

      Expected results:

          Upgrade should not fail.

      Additional info:

          


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (OpenShift Container Platform 4.16.28 bug fix update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHBA-2024:11502


            Xia Zhao added a comment -

            verify

            xzha@xzha1-mac openshift-tests-private % oc exec olm-operator-8b5cc4697-mz97t  -- olm --version
            OLM version: 0.0.0-c8eeb315a5c13b18aa378868a0e7093109855033
            git commit: c8eeb315a5c13b18aa378868a0e7093109855033
            xzha@xzha1-mac openshift-tests-private % oc get clusterversion
            NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
            version   4.16.0-0.test-2024-12-16-094055-ci-ln-556cf9k-latest   True        False         9m10s   Cluster version is 4.16.0-0.test-2024-12-16-094055-ci-ln-556cf9k-latest
            
            
            1, create sub
            xzha@xzha1-mac OCPBUGS-42815 % cat sub.yaml 
            apiVersion: operators.coreos.com/v1alpha1
            kind: Subscription
            metadata:
              name: postgresql
              namespace: test-3
            spec:
              channel: v5
              installPlanApproval: Automatic
              name: postgresql
              source: community-operators
              sourceNamespace: openshift-marketplace
              startingCSV: postgresoperator.v5.7.0
            
            xzha@xzha1-mac OCPBUGS-42815 % oc get csv
            NAME                      DISPLAY                           VERSION   REPLACES                  PHASE
            postgresoperator.v5.7.0   Crunchy Postgres for Kubernetes   5.7.0     postgresoperator.v5.6.1   Succeeded
            
            2, create cr
            xzha@xzha1-mac OCPBUGS-42815 % oc get csv
            NAME                      DISPLAY                           VERSION   REPLACES                  PHASE
            postgresoperator.v5.7.0   Crunchy Postgres for Kubernetes   5.7.0     postgresoperator.v5.6.1   Succeeded
            
            3, delete sub/csv
            xzha@xzha1-mac OCPBUGS-42815 % oc delete sub postgresql
            subscription.operators.coreos.com "postgresql" deleted
            xzha@xzha1-mac OCPBUGS-42815 % oc delete csv postgresoperator.v5.7.0
            clusterserviceversion.operators.coreos.com "postgresoperator.v5.7.0" deleted
            
            4, re-create sub
            xzha@xzha1-mac OCPBUGS-42815 % oc apply -f sub.yaml 
            subscription.operators.coreos.com/postgresql created 
            
            xzha@xzha1-mac OCPBUGS-42815 % oc get csv 
            NAME                      DISPLAY                           VERSION   REPLACES                  PHASE
            postgresoperator.v5.7.0   Crunchy Postgres for Kubernetes   5.7.0     postgresoperator.v5.6.1   Succeeded

            LGTM, verified


              rh-ee-jkeister Jordan Keister
              openshift-crt-jira-prow OpenShift Prow Bot
              Xia Zhao Xia Zhao