Description of problem:
When updating OpenShift Container Platform 4.11 to 4.12 with the NFD Operator installed, the NFD Operator update gets stuck and fails with the error below, which is reported in the Subscription.
$ oc get Subscription -n openshift-nfd nfd -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  creationTimestamp: "2023-03-08T15:05:16Z"
  generation: 1
  labels:
    operators.coreos.com/nfd.openshift-nfd: ""
  name: nfd
  namespace: openshift-nfd
  resourceVersion: "7149045"
  uid: 0d744118-8568-4c31-8984-cbdcb4cce971
spec:
  channel: stable
  installPlanApproval: Automatic
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: nfd.4.12.0-202302280915
status:
  catalogHealth:
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: certified-operators
      namespace: openshift-marketplace
      resourceVersion: "7143592"
      uid: 52d3b288-4042-43a4-9d4d-0bd0dd59b203
    healthy: true
    lastUpdated: "2023-03-08T15:05:17Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: community-operators
      namespace: openshift-marketplace
      resourceVersion: "7145481"
      uid: c303d9f0-7856-4394-bae1-807c4a8972a3
    healthy: true
    lastUpdated: "2023-03-08T15:05:17Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: redhat-marketplace
      namespace: openshift-marketplace
      resourceVersion: "7143591"
      uid: 6a373335-8fea-4323-a872-83af6ea21322
    healthy: true
    lastUpdated: "2023-03-08T15:05:17Z"
  - catalogSourceRef:
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: redhat-operators
      namespace: openshift-marketplace
      resourceVersion: "7141365"
      uid: c4f2102b-cd2f-4e32-8793-f409c7e3ad04
    healthy: true
    lastUpdated: "2023-03-08T15:05:17Z"
  conditions:
  - lastTransitionTime: "2023-03-08T15:05:17Z"
    message: all available catalogsources are healthy
    reason: AllCatalogSourcesHealthy
    status: "False"
    type: CatalogSourcesUnhealthy
  - lastTransitionTime: "2023-03-08T15:06:22Z"
    message: 'error validating existing CRs against new CRD''s schema for "nodefeaturediscoveries.nfd.openshift.io":
      error validating custom resource against new schema for NodeFeatureDiscovery
      openshift-nfd/nfd-instance: [[].status.conditions[0].message: Required value,
      [].status.conditions[0].reason: Required value, [].status.conditions[1].message:
      Required value, [].status.conditions[1].reason: Required value, [].status.conditions[2].message:
      Required value, [].status.conditions[2].reason: Required value, [].status.conditions[3].message:
      Required value, [].status.conditions[3].reason: Required value]'
    reason: InstallComponentFailed
    status: "True"
    type: InstallPlanFailed
  currentCSV: nfd.4.12.0-202302280915
  installPlanGeneration: 1
  installPlanRef:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-xlnk6
    namespace: openshift-nfd
    resourceVersion: "7148053"
    uid: 881d42d8-1846-45ef-9e43-a5a64771ca4f
  installedCSV: nfd.4.12.0-202302280915
  installplan:
    apiVersion: operators.coreos.com/v1alpha1
    kind: InstallPlan
    name: install-xlnk6
    uuid: 881d42d8-1846-45ef-9e43-a5a64771ca4f
  lastUpdated: "2023-03-08T15:06:22Z"
  state: AtLatestKnown
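In a long Subscription dump the relevant condition is easy to miss. The following is a minimal Python sketch, not part of the original report, that picks the failed install condition out of a Subscription that has already been parsed into a dict (for example from `-o json` output). The function name `failed_install_conditions` and the trimmed `SAMPLE_SUBSCRIPTION` data are illustrative assumptions; the condition types and reasons are copied from the output above, with the long message shortened.

```python
# Trimmed copy of the Subscription status shown above (message shortened).
SAMPLE_SUBSCRIPTION = {
    "status": {
        "conditions": [
            {
                "type": "CatalogSourcesUnhealthy",
                "status": "False",
                "reason": "AllCatalogSourcesHealthy",
                "message": "all available catalogsources are healthy",
            },
            {
                "type": "InstallPlanFailed",
                "status": "True",
                "reason": "InstallComponentFailed",
                "message": "error validating existing CRs against new CRD's schema (shortened)",
            },
        ]
    }
}


def failed_install_conditions(subscription):
    """Return the conditions of type InstallPlanFailed whose status is "True".

    `subscription` is a dict in the shape of a parsed Subscription object.
    Missing `status` or `conditions` keys are treated as "no conditions".
    """
    conditions = subscription.get("status", {}).get("conditions", [])
    return [
        c
        for c in conditions
        if c.get("type") == "InstallPlanFailed" and c.get("status") == "True"
    ]


if __name__ == "__main__":
    for condition in failed_install_conditions(SAMPLE_SUBSCRIPTION):
        print(f'{condition["reason"]}: {condition["message"]}')
```

To run this against a live cluster, the sample dict could be replaced with the result of `json.load` on the JSON form of the Subscription.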
Removing and re-installing the NFD Operator does not help, because the CRD validation continues to fail.
Removing the affected CRD is therefore probably the only way to recover, but that could impact the workload and is not something we recommend.
It is key to understand why this is happening and to have a fix that does not require customers to remove their custom resources and custom resource definitions.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.12
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift Container Platform 4.11 and the NFD Operator from Red Hat
2. Update to OpenShift Container Platform 4.12 and see how the NFD Operator update is stuck
Actual results:
error validating existing CRs against new CRD's schema for "nodefeaturediscoveries.nfd.openshift.io": error validating custom resource against new schema for NodeFeatureDiscovery openshift-nfd/nfd-instance: [[].status.conditions[0].message: Required value, [].status.conditions[0].reason: Required value, [].status.conditions[1].message: Required value, [].status.conditions[1].reason: Required value, [].status.conditions[2].message: Required value, [].status.conditions[2].reason: Required value, [].status.conditions[3].message: Required value, [].status.conditions[3].reason: Required value]
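The error says that each of the four existing status conditions on `openshift-nfd/nfd-instance` lacks the `message` and `reason` fields that the new CRD schema requires. The following is a minimal Python sketch, under stated assumptions, of how one might list the same missing fields from a NodeFeatureDiscovery resource parsed into a dict. The helper name `missing_condition_fields` and the sample resource are hypothetical: the report does not show the actual conditions, so the `type` and `status` values below are placeholders.

```python
# The two fields the error reports as "Required value".
REQUIRED_FIELDS = ("message", "reason")

# Hypothetical custom resource: four conditions, none carrying message/reason.
# The condition types and statuses are placeholders, not taken from the report.
SAMPLE_CR = {
    "metadata": {"name": "nfd-instance", "namespace": "openshift-nfd"},
    "status": {
        "conditions": [
            {"type": "Available", "status": "True"},
            {"type": "Upgradeable", "status": "True"},
            {"type": "Progressing", "status": "False"},
            {"type": "Degraded", "status": "False"},
        ]
    },
}


def missing_condition_fields(custom_resource):
    """List the required fields that are absent from each status condition.

    Returns paths such as "status.conditions[0].message", in the same order
    in which the validation error lists them.
    """
    problems = []
    conditions = custom_resource.get("status", {}).get("conditions", [])
    for index, condition in enumerate(conditions):
        for field in REQUIRED_FIELDS:
            if field not in condition:
                problems.append(f"status.conditions[{index}].{field}")
    return problems


if __name__ == "__main__":
    for path in missing_condition_fields(SAMPLE_CR):
        print(f"{path}: Required value")
```

For the sample resource this yields eight paths, two per condition, which matches the shape of the error above. It only mirrors the check for absent fields and is not a substitute for the API server's schema validation.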
Expected results:
The update works without manual intervention by the platform engineer.
Additional info:
- depends on: OCPBUGS-13671 Node Feature Discovery Operator is failing to update from OpenShift Container Platform 4.11 to 4.12 (Closed)
- is cloned by: OCPBUGS-13671 Node Feature Discovery Operator is failing to update from OpenShift Container Platform 4.11 to 4.12 (Closed)