  1. OpenShift Bugs
  2. OCPBUGS-46434

IBM Fusion operator upgrade is blocked with the error: "error validating existing CRs against new CRD's schema"


    • Critical
    • None
    • Charmander OLM Sprint 263, Diglett OLM Sprint 264
    • 2
    • Rejected
    • False
    • Users began hitting spurious CRD incompatibility errors when updating operators. The cause is a failure in the OLM code that validates existing CRs against incoming CRDs, which was recently changed in https://github.com/operator-framework/operator-lifecycle-manager/pull/3387.

      This surfaced in the `InstallPlan` `.status.Message` as something like:

      {code:none}
      retrying execution due to error: error validating existing CRs against new CRD's schema for \"pgadmins.postgres-operator.crunchydata.com\": error validating postgres-operator.crunchydata.com/v1beta1, Kind=PGAdmin \"openshift-operators/example-pgadmin\": updated validation is too restrictive: [].spec.tolerations[0].tolerationSeconds: Invalid value: \"number\": spec.tolerations[0].tolerationSeconds in body must be of type integer: \"number\"
      {code:none}

      The difference between the predecessor calling convention and the one introduced in #3387 appears to be that one is a pointer and the other is concrete.

      old
      {code:none}
      unstructured.Unstructured{Object:map[string]interface...
      {code:none}

      new
      {code:none}
      &unstructured.Unstructured{Object:map[string]interface...
      {code:none}

      So it would seem that merely type-asserting the value and dereferencing it would yield the appropriate result, but doing so instead appears to effectively disable all CR vs CRD reconciliation checks (evidenced by multiple unit test failures).

      But k8s already dereferences pointer parameters [here|https://github.com/kubernetes/kube-openapi/blob/master/pkg/validation/validate/schema.go#L139-L141] during validation. So that isn't it.

      And the `validation.ValidateCustomResource` signature is terrifyingly permissive, accepting `customResource` as `interface{}` [here|https://pkg.go.dev/k8s.io/apiextensions-apiserver@v0.31.3/pkg/apiserver/validation#ValidateCustomResource]. So we cannot derive guidance from it.

      Taking a page from k8s' own use of the validation API, which calls `unstructured.UnstructuredContent()` to convert the `unstructured.Unstructured` into a `map[string]interface{}` [here|https://github.com/kubernetes/kubernetes/blob/1504f10e7946f95a8b1da35e28e4c7453ff62775/staging/src/k8s.io/apiextensions-apiserver/pkg/registry/customresource/validator.go#L54], achieves the desired results.
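
The idea of normalising the CR to a plain `map[string]interface{}` before handing it to an `interface{}`-typed validator can be sketched as follows. This is a minimal illustration, not the OLM or kube-openapi code: the `Unstructured` type here is a local stand-in for the apimachinery type, and `hypotheticalValidate` and the `spec.replicas` field are invented for the example and only understand map input.

```go
package main

import "fmt"

// Unstructured is a local stand-in for the apimachinery type of the same
// name, reduced to the one field and one method this sketch needs.
type Unstructured struct {
	Object map[string]interface{}
}

// UnstructuredContent returns the underlying map, mirroring the method the
// ticket refers to.
func (u *Unstructured) UnstructuredContent() map[string]interface{} {
	if u.Object == nil {
		return map[string]interface{}{}
	}
	return u.Object
}

// hypotheticalValidate is NOT the real validator. It is a toy that accepts
// interface{} (like the permissive signature discussed above), only
// understands map[string]interface{}, and requires spec.replicas to be int64.
func hypotheticalValidate(customResource interface{}) error {
	obj, ok := customResource.(map[string]interface{})
	if !ok {
		return fmt.Errorf("unsupported input type %T", customResource)
	}
	spec, _ := obj["spec"].(map[string]interface{})
	if _, isInt := spec["replicas"].(int64); !isInt {
		return fmt.Errorf("spec.replicas must be of type integer")
	}
	return nil
}

func main() {
	cr := &Unstructured{Object: map[string]interface{}{
		"spec": map[string]interface{}{"replicas": int64(3)},
	}}

	// The compiler accepts all three calls because the parameter is
	// interface{}; only the dynamic type differs.
	fmt.Println("concrete:", hypotheticalValidate(*cr))
	fmt.Println("pointer: ", hypotheticalValidate(cr))
	fmt.Println("map:     ", hypotheticalValidate(cr.UnstructuredContent()))
}
```

Because the parameter type gives no compile-time guidance, converting explicitly at the call site makes the caller's intent unambiguous regardless of how the validator treats wrapper types.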
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-46054. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-46018. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-42815. The following is the description of the original issue:

      Description of problem:

          While upgrading the Fusion operator, the IBM team is seeing the following error in the operator's subscription:
      error validating existing CRs against new CRD's schema for "fusionserviceinstances.service.isf.ibm.com": error validating service.isf.ibm.com/v1, Kind=FusionServiceInstance "ibm-spectrum-fusion-ns/odfmanager": updated validation is too restrictive: [].status.triggerCatSrcCreateStartTime: Invalid value: "number": status.triggerCatSrcCreateStartTime in body must be of type integer: "number"
      
      
      A question here: "triggerCatSrcCreateStartTime" has been present in the operator for the past few releases, and its data type (integer) has not changed in the latest release either. One "FusionServiceInstance" CR was present in the cluster when this issue was hit, with the "triggerCatSrcCreateStartTime" field set to "1726856593000774400".
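
The ticket does not establish why an unchanged integer field is reported as "number". As an assumption only, one way this can arise in Go is the representation of the decoded value: `encoding/json` decodes numbers into `float64` by default, and the value above is larger than 2^53 - 1, the largest integer a float64-backed consumer can treat as exact, so a type check may classify it as a non-integer number. The sketch below uses only the standard library; the helper names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// maxSafeInteger is 2^53 - 1, the largest integer that float64-backed JSON
// consumers can treat as exact.
const maxSafeInteger = 1<<53 - 1

// fieldName is the status field quoted in the ticket.
const fieldName = "triggerCatSrcCreateStartTime"

// decodeDefault uses encoding/json defaults, so JSON numbers become float64.
func decodeDefault(raw []byte) (interface{}, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal(raw, &obj); err != nil {
		return nil, err
	}
	return obj[fieldName], nil
}

// decodeAsInt64 keeps the number as json.Number and converts it to int64,
// so the value is carried with an integer Go type.
func decodeAsInt64(raw []byte) (int64, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber()
	var obj map[string]interface{}
	if err := dec.Decode(&obj); err != nil {
		return 0, err
	}
	n, ok := obj[fieldName].(json.Number)
	if !ok {
		return 0, fmt.Errorf("%s is not a JSON number", fieldName)
	}
	return n.Int64()
}

func main() {
	raw := []byte(`{"triggerCatSrcCreateStartTime": 1726856593000774400}`)

	v, err := decodeDefault(raw)
	if err != nil {
		panic(err)
	}
	f := v.(float64)
	fmt.Printf("default decode: %T, beyond 2^53-1: %t\n", v, f > maxSafeInteger)

	n, err := decodeAsInt64(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("UseNumber decode: %T = %d\n", n, n)
}
```

Whether the validation path in question actually sees a `float64` for this field is not confirmed here; the sketch only shows that the Go type of the decoded value depends on how the CR was decoded.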

      Version-Release number of selected component (if applicable):

          The issue is seen between OCP versions 4.16.7 and 4.16.14.

      How reproducible:

          Always

      Steps to Reproduce:

          1. Upgrade the Fusion operator (OCP version 4.16.7 to OCP 4.16.14).

      Actual results:

          The upgrade fails with the error shown in the description.

      Expected results:

          The upgrade should not fail.

      Additional info:

          

              rh-ee-jkeister Jordan Keister
              openshift-crt-jira-prow OpenShift Prow Bot
              Xia Zhao Xia Zhao