Description of problem:
When updating OpenShift Container Platform 4.11 to 4.12 with the NFD Operator installed, the NFD Operator update gets stuck and fails with the error below, reported in the Subscription.

$ oc get Subscription -n openshift-nfd nfd -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  creationTimestamp: "2023-03-08T15:05:16Z"
  generation: 1
  labels:
    operators.coreos.com/nfd.openshift-nfd: ""
  name: nfd
  namespace: openshift-nfd
  resourceVersion: "7149045"
  uid: 0d744118-8568-4c31-8984-cbdcb4cce971
spec:
  channel: stable
  installPlanApproval: Automatic
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: nfd.4.12.0-202302280915
status:
  catalogHealth:
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: certified-operators
      namespace: openshift-marketplace
      resourceVersion: "7143592"
      uid: 52d3b288-4042-43a4-9d4d-0bd0dd59b203
    healthy: true
    lastUpdated: "2023-03-08T15:05:17Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: community-operators
      namespace: openshift-marketplace
      resourceVersion: "7145481"
      uid: c303d9f0-7856-4394-bae1-807c4a8972a3
    healthy: true
    lastUpdated: "2023-03-08T15:05:17Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: redhat-marketplace
      namespace: openshift-marketplace
      resourceVersion: "7143591"
      uid: 6a373335-8fea-4323-a872-83af6ea21322
    healthy: true
    lastUpdated: "2023-03-08T15:05:17Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: redhat-operators
      namespace: openshift-marketplace
      resourceVersion: "7141365"
      uid: c4f2102b-cd2f-4e32-8793-f409c7e3ad04
    healthy: true
    lastUpdated: "2023-03-08T15:05:17Z"
  conditions:
  - lastTransitionTime: "2023-03-08T15:05:17Z"
    message: all available catalogsources are healthy
    reason: AllCatalogSourcesHealthy
    status: "False"
    type: CatalogSourcesUnhealthy
  - lastTransitionTime: "2023-03-08T15:06:22Z"
    message: 'error validating existing CRs against new CRD''s schema for "nodefeaturediscoveries.nfd.openshift.io": error validating custom resource against new schema for NodeFeatureDiscovery openshift-nfd/nfd-instance: [[].status.conditions[0].message: Required value, [].status.conditions[0].reason: Required value, [].status.conditions[1].message: Required value, [].status.conditions[1].reason: Required value, [].status.conditions[2].message: Required value, [].status.conditions[2].reason: Required value, [].status.conditions[3].message: Required value, [].status.conditions[3].reason: Required value]'
    reason: InstallComponentFailed
    status: "True"
    type: InstallPlanFailed
  currentCSV: nfd.4.12.0-202302280915
  installPlanGeneration: 1
  installPlanRef:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-xlnk6
    namespace: openshift-nfd
    resourceVersion: "7148053"
    uid: 881d42d8-1846-45ef-9e43-a5a64771ca4f
  installedCSV: nfd.4.12.0-202302280915
  installplan:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-xlnk6
    uuid: 881d42d8-1846-45ef-9e43-a5a64771ca4f
  lastUpdated: "2023-03-08T15:06:22Z"
  state: AtLatestKnown

Removing and re-installing the NFD Operator does not help, as the CRD validation continues to fail. Removing the affected CRD is therefore probably the only way to recover, but that could impact the workload and is not something we recommend. It is key to understand why this is happening and to have a fix that does not require customers to remove their custom resources and custom resource definitions.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4.11 and NFD Operator from Red Hat
2. Update to OpenShift Container Platform 4.12 and see how the NFD Operator update is stuck
Actual results:
error validating existing CRs against new CRD's schema for "nodefeaturediscoveries.nfd.openshift.io": error validating custom resource against new schema for NodeFeatureDiscovery openshift-nfd/nfd-instance: [[].status.conditions[0].message: Required value, [].status.conditions[0].reason: Required value, [].status.conditions[1].message: Required value, [].status.conditions[1].reason: Required value, [].status.conditions[2].message: Required value, [].status.conditions[2].reason: Required value, [].status.conditions[3].message: Required value, [].status.conditions[3].reason: Required value]
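
The failing fields in the error above can be listed for any exported NodeFeatureDiscovery resource with a small check. This is only a minimal sketch and not the validation that OLM itself runs: it assumes the resource was exported as JSON (for example with `oc get nodefeaturediscoveries.nfd.openshift.io nfd-instance -n openshift-nfd -o json`), and it assumes that "Required value" means the key is absent from the condition. The function name `missing_condition_fields` and the sample condition values are hypothetical.

```python
import json

# Fields the new CRD schema reports as "Required value" in the error above.
REQUIRED_FIELDS = ("message", "reason")


def missing_condition_fields(cr):
    """Return paths such as 'status.conditions[0].message' for every
    status condition that lacks one of REQUIRED_FIELDS.

    Assumption: a field counts as missing only when its key is absent.
    """
    problems = []
    conditions = (cr.get("status") or {}).get("conditions") or []
    for index, condition in enumerate(conditions):
        for field in REQUIRED_FIELDS:
            if field not in condition:
                problems.append(f"status.conditions[{index}].{field}")
    return problems


# Hypothetical sample shaped like a NodeFeatureDiscovery resource; the
# condition types and values are illustrative, not taken from a cluster.
sample = json.loads("""
{
  "kind": "NodeFeatureDiscovery",
  "metadata": {"name": "nfd-instance", "namespace": "openshift-nfd"},
  "status": {
    "conditions": [
      {"type": "Available", "status": "True"},
      {"type": "Degraded", "status": "False",
       "reason": "Example", "message": "example message"}
    ]
  }
}
""")

for path in missing_condition_fields(sample):
    print(f"{path}: Required value")
```

Running the check against the resource before the update would show whether the conditions written by the 4.11 operator already lack the fields that the 4.12 schema requires.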
Expected results:
The update should just work, with no manual intervention required from the platform engineer.
Additional info:
- depends on: OCPBUGS-13671 Node Feature Discovery Operator is failing to update from OpenShift Container Platform 4.11 to 4.12 (Closed)
- is cloned by: OCPBUGS-13671 Node Feature Discovery Operator is failing to update from OpenShift Container Platform 4.11 to 4.12 (Closed)