-
Bug
-
Resolution: Done-Errata
-
Normal
-
4.15.0
-
No
-
False
-
-
N/A: same version fix of API typing
-
Release Note Not Required
-
In Progress
Description of problem:
Operator condition Available=False during upgrade (4.14.0-rc.7 to 4.15.0-0.nightly-2023-10-24-230302) with the error status below:

  conditions:
  - lastTransitionTime: "2023-10-25T02:05:07Z"
    message: Working towards 4.15.0-0.nightly-2023-10-24-230302
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-10-25T02:06:36Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-10-25T02:05:54Z"
    message: 'Cluster not available for [{operator 4.14.0-rc.7}]: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [status.controllerCertificates[0].notAfter: Required value, status.controllerCertificates[0].notBefore: Required value, status.controllerCertificates[1].notAfter: Required value, status.controllerCertificates[1].notBefore: Required value, status.controllerCertificates[2].notAfter: Required value, status.controllerCertificates[2].notBefore: Required value, status.controllerCertificates[3].notAfter: Required value, status.controllerCertificates[3].notBefore: Required value, status.controllerCertificates[4].notAfter: Required value, status.controllerCertificates[4].notBefore: Required value, status.controllerCertificates[5].notAfter: Required value, status.controllerCertificates[5].notBefore: Required value, status.controllerCertificates[6].notAfter: Required value, status.controllerCertificates[6].notBefore: Required value, status.controllerCertificates[7].notAfter: Required value, status.controllerCertificates[7].notBefore: Required value, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]'
    reason: MachineConfigControllerFailed
    status: "False"
    type: Available
  - lastTransitionTime: "2023-10-25T02:06:43Z"
    message: One or more machine config pools are degraded, please see `oc get mcp` for further details and resolve before upgrading
    reason: DegradedPool
    status: "False"
    type: Upgradeable
  extension:
    master: 'pool is degraded because nodes fail with "2 nodes are reporting degraded status on sync": "Node ip-10-0-68-187.us-east-2.compute.internal is reporting: \"open /etc/docker/certs.d: no such file or directory\", Node ip-10-0-51-97.us-east-2.compute.internal is reporting: \"open /etc/docker/certs.d: no such file or directory\""'
    worker: all 2 nodes are at latest configuration rendered-worker-f9ff345da4f6740868e7238d5f820b65

The upgrade eventually completes successfully.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-10-24-230302
How reproducible:
upgrade cluster from 4.14 to 4.15 nightly
Steps to Reproduce:
1. Set up a 4.14 cluster.
2. Upgrade the cluster to a 4.15 nightly.
3. During the upgrade, check the operator status.
Actual results:
Condition Available=False with validation errors on status.controllerCertificates (notBefore and notAfter reported as "Required value")
Expected results:
No validation error
Additional info:
Struct of ControllerCertificate in the 4.14 branch:

  type ControllerCertificate struct {
      // subject is the cert subject
      Subject string `json:"subject"`
      // signer is the cert Issuer
      Signer string `json:"signer"`
      // bundleFile is the larger bundle a cert comes from
      BundleFile string `json:"bundleFile"`
  }

Struct of ControllerCertificate in the master branch (4.15):

  type ControllerCertificate struct {
      // subject is the cert subject
      // +kubebuilder:validation:Required
      Subject string `json:"subject"`
      // signer is the cert Issuer
      // +kubebuilder:validation:Required
      Signer string `json:"signer"`
      // notBefore is the lower boundary for validity
      // +kubebuilder:validation:Required
      NotBefore *metav1.Time `json:"notBefore"`
      // notAfter is the upper boundary for validity
      // +kubebuilder:validation:Required
      NotAfter *metav1.Time `json:"notAfter"`
      // bundleFile is the larger bundle a cert comes from
      // +kubebuilder:validation:Required
      BundleFile string `json:"bundleFile"`
  }
related PR https://github.com/openshift/machine-config-operator/pull/3747
must-gather: https://drive.google.com/file/d/1L3pu8Jj3zGBx7m3elve5WzOvGRPYugnA/view?usp=drive_link
- is related to
-
OCPBUGS-24228 machine-config ClusterOperator should not blip Available=False on brief missing HTTP content-type
- Closed
- relates to
-
OTA-362 CI: fail update suite if any ClusterOperator go Available=False
- Closed
-
TRT-1334 Machine Config Operator controllerCertificates Validation Error
- Closed
- links to
-
RHSA-2023:7198 OpenShift Container Platform 4.15 security update