  1. OpenShift Bugs
  2. OCPBUGS-63208

OLM operator upgrade failing in 4.14, after API version removal


    • Bug
    • Resolution: Unresolved
    • Undefined
    • 4.14
    • OLM
    • False
    • Rejected

      Description of problem:

      When upgrading the netobserv operator from the current release (1.9.3) to the next release candidate (1.10.0), the upgrade fails with an error message such as:
      
      error validating existing CRs against new CRD's schema for "flowcollectors.flows.netobserv.io": error validating flows.netobserv.io/v1beta1, Kind=FlowCollector "cluster": updated validation is too restrictive: [].spec.loki.mode: Required value
      
      This only happens on clusters running 4.14 or earlier. Tested on 4.15, the upgrade works fine.
      This has been discussed on Slack: https://redhat-internal.slack.com/archives/C3VS0LV41/p1760598830037629
      
      This error is unexpected: at this point, no FlowCollector CR should exist in version v1beta1, which was already deprecated, removed from storage, etc. The new release of the operator (1.10, which we are trying to install) removes v1beta1.
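One way to check that v1beta1 is really out of use is to look at the CRD's `spec.versions` and `status.storedVersions` fields. The sketch below is a minimal helper, assuming the CRD has been exported as JSON (for example with `oc get crd flowcollectors.flows.netobserv.io -o json`) and loaded into a dict. The function name and the sample data are hypothetical and not taken from a real cluster.

```python
def summarize_crd_versions(crd: dict) -> dict:
    """Summarize which versions of a CRD are served, used for storage, and recorded as stored."""
    versions = crd.get("spec", {}).get("versions", [])
    return {
        # Versions the API server still exposes to clients.
        "served": [v["name"] for v in versions if v.get("served")],
        # The single version currently used to persist objects.
        "storage": [v["name"] for v in versions if v.get("storage")],
        # Versions that objects may have been persisted in at some point.
        "stored": crd.get("status", {}).get("storedVersions", []),
    }


# Hypothetical sample shaped like a CustomResourceDefinition object.
# The version names and flags are assumptions for illustration only.
SAMPLE_CRD = {
    "spec": {
        "versions": [
            {"name": "v1beta1", "served": True, "storage": False},
            {"name": "v1beta2", "served": True, "storage": True},
        ]
    },
    "status": {"storedVersions": ["v1beta2"]},
}

summary = summarize_crd_versions(SAMPLE_CRD)
print(summary)
```

In this sample, v1beta1 is absent from `stored` but still listed in `served`, which is the kind of state the description refers to.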
      
      It seems like this problem was fixed in 4.15 through this commit: https://github.com/operator-framework/operator-lifecycle-manager/commit/242f63fe21d23f51c13f31cf6ca184a97f6ee28b#diff-a1760d9b7ac1e93734eea675d8d8938c96a50e995434b163c6f77c91bace9990L2093-L2105
      
      As the diff shows, the old algorithm validated the CR against all versions of the *OLD* CRD, which matches exactly what we are seeing. The commit changes that to validate the CR against the versions of the *NEW* CRD, which indeed makes more sense.
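The difference between the two behaviours can be illustrated with a toy model. This is not OLM's code: it only models which CRD's version list drives the validation loop, as described above. The `served` filter is an assumption inferred from the workaround mentioned in this report, and the CRD contents (including the `v1beta2` name) are hypothetical.

```python
def versions_to_validate(crd: dict) -> list:
    """Return the served version names of a CRD, i.e. the versions existing CRs would be checked in."""
    return [v["name"] for v in crd["spec"]["versions"] if v.get("served", False)]


# Hypothetical CRD as shipped by the currently installed operator (assumed contents).
OLD_CRD = {
    "spec": {
        "versions": [
            {"name": "v1beta1", "served": True, "storage": False},
            {"name": "v1beta2", "served": True, "storage": True},
        ]
    }
}

# Hypothetical CRD as shipped by the new operator, with v1beta1 removed.
NEW_CRD = {
    "spec": {
        "versions": [
            {"name": "v1beta2", "served": True, "storage": True},
        ]
    }
}

# Pre-fix behaviour as described in this report: the loop is driven by the OLD CRD.
pre_fix = versions_to_validate(OLD_CRD)
# Post-fix behaviour: the loop is driven by the NEW CRD.
post_fix = versions_to_validate(NEW_CRD)
# Versions that only the pre-fix loop would try to validate.
only_pre_fix = [name for name in pre_fix if name not in post_fix]

print("pre-fix:", pre_fix)
print("post-fix:", post_fix)
print("validated only before the fix:", only_pre_fix)
```

Under these assumptions, only the pre-fix loop touches v1beta1, which is consistent with the error message naming `flows.netobserv.io/v1beta1`.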

      Version-Release number of selected component (if applicable):

      openshift 4.14.z
      netobserv 1.9.3

      How reproducible:

      Always

      Steps to Reproduce:

      Note: at the time of writing, the latest netobserv version is 1.9.3; 1.10 is currently a release candidate, so installing it requires a few extra steps.
      
          1. On an OCP 4.14 cluster, install the Network Observability operator 1.9.3
          2. Create a FlowCollector resource (e.g. with all default values)
          3. To upgrade to the release candidate 1.10, apply the attached resources ImageDigestMirrorSet and CatalogSource, then run:
      
      oc -n openshift-netobserv-operator patch subscription netobserv-operator --type='json' -p "[{'op': 'replace', 'path': '/spec/source', 'value': 'netobserv-konflux'}]"

      image-digest-mirror-set.txt catalog-source.txt

      Actual results:

      After a minute or two, the installation fails with the error message mentioned above (or a similar one).

      Expected results:

      Successful upgrade

      Additional info:

      As mentioned above, I think I've identified the commit that fixes it in 4.15: https://github.com/operator-framework/operator-lifecycle-manager/commit/242f63fe21d23f51c13f31cf6ca184a97f6ee28b#diff-a1760d9b7ac1e93734eea675d8d8938c96a50e995434b163c6f77c91bace9990L2093-L2105
      
      A possible workaround is to ask customers to turn off the "served" flag on the CRD version being removed (v1beta1) prior to its deletion. That sort of makes sense, but more recent versions of OLM are not strict enough to require it and do not fail the upgrade for that reason.
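As a sketch of that workaround, the helper below builds a JSON Patch that sets `served` to false for one version of a CRD. It assumes the CRD has been loaded as a dict (for example from `oc get crd flowcollectors.flows.netobserv.io -o json`). The function name and the sample CRD are hypothetical, and whether applying this patch is safe on a given cluster has not been verified here.

```python
import json


def build_unserve_patch(crd: dict, version_name: str) -> list:
    """Build a JSON Patch (RFC 6902) that sets served=false for the named CRD version."""
    for index, version in enumerate(crd["spec"]["versions"]):
        if version["name"] == version_name:
            return [
                {
                    "op": "replace",
                    # JSON Patch addresses list items by position, so the index must be looked up.
                    "path": f"/spec/versions/{index}/served",
                    "value": False,
                }
            ]
    raise ValueError(f"version {version_name!r} not found in CRD")


# Hypothetical sample CRD; the real version list should be read from the cluster.
SAMPLE_CRD = {
    "spec": {
        "versions": [
            {"name": "v1beta1", "served": True, "storage": False},
            {"name": "v1beta2", "served": True, "storage": True},
        ]
    }
}

patch = build_unserve_patch(SAMPLE_CRD, "v1beta1")
patch_json = json.dumps(patch)
print(patch_json)
```

The printed JSON could then be passed to `oc patch crd flowcollectors.flows.netobserv.io --type=json -p '<patch>'` before triggering the upgrade.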

              rh-ee-cchantse Catherine Chan-Tse
              jtakvori Joel Takvorian
              Jian Zhang