OpenShift Bugs / OCPBUGS-39539

CVO wedges while reconciling a CRD with rogue owner references


      Description of problem:

      Reported via Slack: a customer's update from 4.15.21 to 4.16.8 wedged with Failing=True and this message:

      Could not update customresourcedefinition "clusterserviceversions.operators.coreos.com" (649 of 903): the object is invalid, possibly due to local cluster configuration
      

      CVO log contained more information:

      2024-09-03T14:11:19.672587865Z I0903 14:11:19.672572       1 sync_worker.go:1171] Update error 649 of 903: UpdatePayloadResourceInvalid Could not update customresourcedefinition "clusterserviceversions.operators.coreos.com" (649 of 903): the object is invalid, possibly due to local cluster configuration (*errors.StatusError: CustomResourceDefinition.apiextensions.k8s.io "clusterserviceversions.operators.coreos.com" is invalid: metadata.ownerReferences: Invalid value: []v1.OwnerReference{v1.OwnerReference{APIVersion:"config.openshift.io/v1", Kind:"ClusterServiceVersion", Name:"rhsso-operator.7.6.9-opr-002", UID:"00f0a902-a305-40bd-b277-2de22dca78ba", Controller:(*bool)(0xc1014fb039), BlockOwnerDeletion:(*bool)(nil)}, v1.OwnerReference{APIVersion:"config.openshift.io/v1", Kind:"ClusterVersion", Name:"version", UID:"6412f9f6-7ecf-4bfc-8277-813c9a4ef48d", Controller:(*bool)(0xc1014fb03a), BlockOwnerDeletion:(*bool)(nil)}}: Only one reference can have Controller set to true. Found "true" in references for ClusterServiceVersion/rhsso-operator.7.6.9-opr-002 and ClusterVersion/version)
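
The rejection in this log line comes from the API server rule that at most one owner reference on an object may set controller to true. Below is a minimal Python sketch of that check, fed with the two references quoted in the log. The function names are hypothetical, and this is neither the API server's nor the CVO's actual code.

```python
from typing import Any, Dict, List


def controller_owners(owner_refs: List[Dict[str, Any]]) -> List[str]:
    """Return "Kind/name" for every owner reference with controller: true."""
    return [
        f"{ref['kind']}/{ref['name']}"
        for ref in owner_refs
        if ref.get("controller") is True
    ]


def has_controller_conflict(owner_refs: List[Dict[str, Any]]) -> bool:
    """True when more than one owner reference claims to be the controller."""
    return len(controller_owners(owner_refs)) > 1


# The two owner references from the rejected update, as quoted in the CVO log.
rejected_refs = [
    {
        "apiVersion": "config.openshift.io/v1",
        "kind": "ClusterServiceVersion",
        "name": "rhsso-operator.7.6.9-opr-002",
        "uid": "00f0a902-a305-40bd-b277-2de22dca78ba",
        "controller": True,
    },
    {
        "apiVersion": "config.openshift.io/v1",
        "kind": "ClusterVersion",
        "name": "version",
        "uid": "6412f9f6-7ecf-4bfc-8277-813c9a4ef48d",
        "controller": True,
    },
]

if __name__ == "__main__":
    print(has_controller_conflict(rejected_refs))
    print(controller_owners(rejected_refs))
```

The conflict only appears once the CVO adds its own ClusterVersion controller reference next to the rogue one; the rogue reference alone passes this check.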
      

      The culprit was found to be a rogue controller ownerReference on the ClusterServiceVersion CRD:

      $ oc --context mg get crd clusterserviceversions.operators.coreos.com -o yaml | yq .metadata.ownerReferences
      - apiVersion: config.openshift.io/v1
        controller: true
        kind: ClusterServiceVersion
        name: rhsso-operator.7.6.9-opr-002
        uid: 00f0a902-a305-40bd-b277-2de22dca78ba
      

      No matter what put it there, CVO should just stomp it instead of wedging on it.

      Version-Release number of selected component (if applicable):

      Seen on an update from 4.15.21 to 4.16.8, but master is likely affected too

      How reproducible:

      Not tried yet, but likely deterministic

      Steps to Reproduce:

      1. Manually add an ownerReference with controller: true to a CRD owned by the CVO (such as the CSV one). This likely does not even need to happen during an update.
      2. Eventually the CVO should choke on it and report Failing=True

      Actual results:

      Could not update customresourcedefinition "clusterserviceversions.operators.coreos.com" (649 of 903): the object is invalid, possibly due to local cluster configuration
      

      Expected results:

      CVO overwrites the manual change with whatever is in the payload
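
One way to picture the expected behavior is a merge in which the payload's controller reference always wins. The Python sketch below is an assumption about how such "stomping" could work, not the CVO's actual merge logic: `stomp_owner_references` is a hypothetical helper, and dropping (rather than demoting) a foreign controller reference is just one possible design choice.

```python
from typing import Any, Dict, List

OwnerRef = Dict[str, Any]


def stomp_owner_references(
    existing: List[OwnerRef], required: List[OwnerRef]
) -> List[OwnerRef]:
    """Merge owner references so that the payload's controller always wins.

    Hypothetical helper for illustration only.
    - Every reference required by the payload is kept as-is.
    - Existing references that do not claim controller are preserved.
    - Existing references that claim controller are dropped when the
      payload already supplies a controller reference.
    """
    required_uids = {ref["uid"] for ref in required}
    payload_has_controller = any(
        ref.get("controller") is True for ref in required
    )

    merged = [dict(ref) for ref in required]
    for ref in existing:
        if ref["uid"] in required_uids:
            continue  # already covered by the payload's version
        if payload_has_controller and ref.get("controller") is True:
            continue  # rogue controller reference: stomp it
        merged.append(dict(ref))
    return merged


# The rogue reference found on the CRD in this report.
rogue = {
    "apiVersion": "config.openshift.io/v1",
    "kind": "ClusterServiceVersion",
    "name": "rhsso-operator.7.6.9-opr-002",
    "uid": "00f0a902-a305-40bd-b277-2de22dca78ba",
    "controller": True,
}
# The ClusterVersion reference the CVO tried to set, per its log.
cluster_version = {
    "apiVersion": "config.openshift.io/v1",
    "kind": "ClusterVersion",
    "name": "version",
    "uid": "6412f9f6-7ecf-4bfc-8277-813c9a4ef48d",
    "controller": True,
}

if __name__ == "__main__":
    for ref in stomp_owner_references([rogue], [cluster_version]):
        print(ref["kind"], ref["name"])
```

With the rogue reference as the existing state, the merged list contains only the ClusterVersion reference, so the update would no longer trip the single-controller validation.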

              Unassigned
              Petr Muller (afri@afri.cz)
              Jia Liu